Simon Willison

I interviewed Rajiv Sinclair about his team's new project, VERDAD - an outstanding piece of data journalism that tracks 48 US talk radio stations (many in Spanish), archives their audio, transcribes it and uses Gemini 1.5 to help identify potential snippets of misinformation - then presents the results in a UI for human review

simonwillison.net/2024/Nov/7/p

Simon Willison

I'm hoping to turn this into a series of YouTube interviews with people building cool data projects where we nerd out about what they've built and how they built it, so I'm optimistically thinking of this as episode one! youtube.com/watch?v=t_S-loWDGE

Simon Willison

The VERDAD prompts are pretty complex - Rajiv shared this example of a conversation he had with Claude 3.5 Sonnet to further iterate on the existing prompt used with Gemini 1.5 Pro gist.github.com/rajivsinclair/

Simon Willison

A YouTube comment asked about the price difference between Gemini 1.5 Flash and OpenAI's Whisper:

Whisper API is $0.006 / minute, so an hour of audio = 36 cents

Gemini 1.5 Flash is $0.075 per 1 million tokens, and audio comes to 25 tokens/second, so an hour is 0.675 cents

Over 50x cheaper!
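Spelling that arithmetic out - a quick sketch using only the prices quoted above, and ignoring Gemini's output-token cost, which is small for a transcript:

```python
# Back-of-the-envelope cost comparison using the prices quoted above.
WHISPER_PER_MINUTE = 0.006      # dollars per minute of audio (Whisper API)
GEMINI_PER_MTOK = 0.075         # dollars per million input tokens (Gemini 1.5 Flash)
AUDIO_TOKENS_PER_SECOND = 25    # audio tokens per second of input

whisper_per_hour = WHISPER_PER_MINUTE * 60                              # $0.36
gemini_tokens_per_hour = AUDIO_TOKENS_PER_SECOND * 60 * 60              # 90,000 tokens
gemini_per_hour = gemini_tokens_per_hour / 1_000_000 * GEMINI_PER_MTOK  # $0.00675

print(f"Whisper:          ${whisper_per_hour:.2f} per hour")
print(f"Gemini 1.5 Flash: ${gemini_per_hour:.5f} per hour")
print(f"Gemini is about {whisper_per_hour / gemini_per_hour:.0f}x cheaper")
```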

Daniel Erenrich

@simon that Whisper price quote isn't competitive - groq.com/pricing/ - and I'd be curious about the accuracy differential

Simon Willison

@derenrich problem with Groq is they haven't actually launched their billed API yet, so you're stuck with whatever their free tier will let you do

[Screenshot of Groq's pricing page: a "Developer" tier described as "Scale up and pay as you go" with pay-per-token pricing, high rate limits and priority support, marked "Coming Soon".]
Xing Shi Cai

@simon Is the speech-to-text quality of Gemini and Whisper on the same level, though?

Simon Willison

@xsc from what I've seen so far they do feel similar in quality - and Gemini can do extra tricks like diarization and tone-of-voice analysis that Whisper can't

I remain paranoid about the risk of Gemini accidentally acting on instructions within the audio, but I've not (yet) seen that happen - so possibly more of a risk with deliberately malicious audio
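For anyone wanting to try that, here's a rough sketch of asking Gemini for a diarized transcript via the google-generativeai Python SDK - the file name and prompt wording are mine, not what VERDAD uses:

```python
# Rough sketch: diarized transcription of an audio clip with Gemini.
# The file name and prompt wording are illustrative, not VERDAD's.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-1.5-flash")
audio = genai.upload_file("radio_clip.mp3")  # any local audio file

response = model.generate_content([
    audio,
    "Transcribe this audio. Label each distinct speaker (Speaker 1, Speaker 2, ...) "
    "and add a brief note on the tone of voice for each segment.",
])
print(response.text)
```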

Jay Nakrani

@simon That is a superb use of LLMs. I've seen that a lot of text-classification tasks that previously required expensive model training can now be done rather cheaply using LLMs + engineered prompts. Cost and development velocity have both improved quite a bit with this LLM-as-rater approach compared to the previous approach of training custom models.

The next bottleneck is human evals, but I guess we can't completely remove them until LLMs stop making mistakes.
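A minimal sketch of that LLM-as-rater pattern - the labels and prompt below are hypothetical, not the ones VERDAD uses:

```python
# Minimal sketch of prompt-based classification replacing a custom-trained model.
# Labels and prompt wording are hypothetical, not taken from VERDAD.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def rate_snippet(snippet: str) -> str:
    """Ask the model to rate a transcript snippet instead of training a classifier."""
    prompt = (
        "You are reviewing a talk radio transcript snippet for potential misinformation.\n"
        "Reply with exactly one label: LIKELY_MISINFORMATION, UNCLEAR, or NOT_MISINFORMATION.\n\n"
        f"Snippet: {snippet}"
    )
    return model.generate_content(prompt).text.strip()

print(rate_snippet("Officials admitted the vote totals were changed after polls closed."))
```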
