Wrote a tiny new LLM plugin this morning: llm-whisper-api, which lets you do this (if you have an OpenAI API key configured already):

llm install llm-whisper-api
llm whisper-api myfile.mp3 > transcript.txt

https://simonwillison.net/2024/Oct/27/llm-whisper-api/

27 October at 21:18 | Open on fedi.simonwillison.net

9 comments

Felix 🇺🇦🚴‍♂️

@simon In principle, human transcription as a job or service is also dead.

I've always wished for something like this to turn podcasts into texts.

I had played around with open source tools like cmusphinx a long time ago, but the quality wasn't particularly good then and it was also very slow. It's impressive how far this has developed.

27 October at 22:37 | Open on norden.social

Ame

@simon Now we just also need a plugin for the groq whisper api, which can be used for free!

1 November at 15:33 | Open on breta.moe

Simon Willison

@ame I got Claude to port my Whisper API plugin to use Groq instead! It seems to work, though I've only released it as an alpha since I've not yet added automated tests or manually QA'd all of the options: https://github.com/simonw/llm-groq-whisper

1 November at 19:46 | Open on fedi.simonwillison.net

Ame

@simon Amazing, that works for me!
Thank you!

I think it would be pretty easy to get an LLM to write code so the plugin can turn the JSON response into an SRT file too 👀

1 November at 21:37 | Open on breta.moe

Simon Willison

@ame I got Claude to make an artifact for that instead, looks like it might work OK https://gistpreview.github.io/?e29943852f371f638c9a3ae1dcc4784e

Claude transcript: https://gist.github.com/simonw/49b52ce2a7b5796edf4e0e2e2152db41

Screenshot of a web interface titled "Whisper JSON to SRT/VTT Converter" showing input JSON data with timestamps and parameters, and output SRT format with two subtitle entries: "Hey everyone, welcome back." (00:00:00,000 --> 00:00:01,379) and "You ever find yourself wading through mountains of data trying to pluck out the juicy bits?" (00:00:01,580 --> 00:00:06,419)

1 November at 22:48 | Open on fedi.simonwillison.net
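
For readers who want to see roughly what such a conversion involves, here is a minimal Python sketch. It is not the code from the Claude artifact linked above. It assumes the transcription JSON has a top-level `segments` list whose entries carry `start` and `end` times in seconds plus a `text` string; the function names `format_timestamp` and `whisper_json_to_srt` are made up for illustration. The sample data reuses the two subtitle entries visible in the screenshot.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def whisper_json_to_srt(data: dict) -> str:
    """Build SRT text from a dict with an assumed `segments` list.

    Each segment is assumed to have `start`, `end` (seconds) and `text`.
    """
    blocks = []
    for index, seg in enumerate(data["segments"], start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # An SRT cue is: sequence number, time range, text, blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Sample input modelled on the two entries shown in the screenshot.
sample = {
    "segments": [
        {"start": 0.0, "end": 1.379, "text": " Hey everyone, welcome back."},
        {
            "start": 1.58,
            "end": 6.419,
            "text": " You ever find yourself wading through mountains of "
                    "data trying to pluck out the juicy bits?",
        },
    ]
}

if __name__ == "__main__":
    print(whisper_json_to_srt(sample))
```

Redirecting the printed output to a `.srt` file gives something a player such as mpv can load alongside the audio.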

Simon Willison

@ame let me know if it works, I'm not sure how best to test it!

1 November at 23:06 | Open on fedi.simonwillison.net

Ame

@simon thank you, will test it tomorrow!

1 November at 23:42 | Open on breta.moe

Ame

@simon This works for me, thank you!
I just tested it by transcribing an mp3 file, converting the result to SRT and playing the audio together with the subs in mpv.

3 November at 9:54 | Open on breta.moe

Ame

@simon I want to reiterate how amazing this is, this effectively enables anyone to transcribe audio for free, regardless of their hardware!

1 November at 21:47 | Open on breta.moe
