@zanagb @bedast VLC will do it on-device, not sending anything anywhere.
Whisper models are terrible at transcribing casual conversations between doctors and patients, because the training data doesn't reflect that kind of speech or those environments. But they excel at transcribing movies etc., because a lot of their training data is closed captions. So this would actually work reasonably well. You can feed it some text with the names of characters, places, etc. as context, and that makes it transcribe those names very well. (source: I've been using Whisper models at work, and occasionally I've pointed the mic at the speaker during some show I'm watching to test) (also: I haven't sent any data to OpenAI nor paid them anything)
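For anyone curious, the context trick is roughly this: the open-source `openai-whisper` package takes an `initial_prompt` argument that seeds the decoder with text before transcription starts. A minimal sketch, run entirely locally; the model size, audio file name, and the character/place names are just placeholders:

```python
import whisper

# Runs fully on-device; nothing is sent to OpenAI.
# "small" is an example model size; larger models transcribe better.
model = whisper.load_model("small")

# initial_prompt biases the decoder toward these spellings, so
# character and place names come out right instead of as gibberish.
result = model.transcribe(
    "episode.wav",  # placeholder audio file
    initial_prompt="Characters: Geralt, Yennefer, Ciri. Places: Kaer Morhen, Cintra.",
)
print(result["text"])
```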
@starsider @bedast the CES demo makes it clear the transcription is **off-device**, i.e., siphoning data. And besides, there are already many built-in tools for that on macOS and Linux.
If I wanted fucked-up nonsense on my videos I would watch a raunchy YouTube Poop from the early 2010s.
I'd rather have a phoneme-based system, where at least you can tell where the gibberish came from, tell it's an error, and even reconstruct the sentence.
We do not need this.