Rich Felker

@bedast Then don't call it AI. Call it speech to text. But if it uses a language model to more effectively predict words based on context rather than doing an analyzable mechanical local transformation, it is at least partly the "bad kind of AI" - it has the capacity to introduce biases from training data making output that "sounds right" but means the wrong thing, which is much worse than substituting nonsensical homophones now and then (which the reader will immediately recognize as mistakes). Same principle as why autocorrected text is worse than text with typos.

Rich Felker

@bedast Enthusiastically calling new functionality "AI" signals to your audience that you're aligned with the scams and makes them distrust you.

This is not hard.

If you have privacy respecting, on-device, non-plagiarized, ethically built statistical model based processing, DON'T CALL IT "AI".


@dalias @bedast I agree. This is why "AI" transcription is a downgrade from previous technologies. It's contributing to the plausible-sounding disinformation slop we're still drowning in.

I think automated captions have a place, but I'm wary of using generative AI to do it.

A.V.

@dalias @bedast Speech recognition has used language models for decades now. It was one of the original applications of language models, way before they scaled up to aping Shakespeare.

But even without language models, the act of transcription is very close to generative AI, as it's the task of predicting the next text token given the previous tokens and an encoded audio sequence.

Rich Felker

@varavs @bedast Then don't call it "AI".

But also, question what harms are coming out of the predictive models. The more they force the output to sound natural and fix misrecognitions, the greater the chance they're altering meaning. Same as autocorrect vs typed text with typos and misspellings.

Rich Felker

@varavs @bedast Also ask if the model is ethically and legally sound. Was it produced from professional training material with compatible license terms? Or stolen from millions of movies or YouTube videos?

LisPi
@dalias @bedast @varavs Aren't basically all the embeddable models that don't have absurd spec requirements sourced & produced by university projects?
LisPi
@dalias @bedast Didn't mathematical/rule-based language modeling start showing massively diminishing returns back like... two~three decades ago or is my information wrong?

As far as I'm aware it would be preferable to start from a rule-based language model, and then be able to specifically train a small model on a different captioned sample set of the speaker(s) to eliminate its flakiness.