Simon Willison

I'm hoping to turn this into a series of YouTube interviews with people building cool data projects where we nerd out about what they've built and how they built it, so I'm optimistically thinking of this as episode one! youtube.com/watch?v=t_S-loWDGE

9 comments
Simon Willison

The VERDAD prompts are pretty complex - Rajiv shared this example of a conversation he had with Claude 3.5 Sonnet to further iterate on the existing prompt used with Gemini 1.5 Pro: gist.github.com/rajivsinclair/

Simon Willison

A YouTube comment asked about the price difference between Gemini 1.5 Flash and OpenAI's Whisper

Whisper API is $0.006 / minute, so an hour of audio = 36 cents

Gemini 1.5 Flash is $0.075 per 1 million tokens, and audio is 25 tokens/second, so an hour is 90,000 tokens = 0.675 cents

Over 50x cheaper!
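
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The prices and the 25 tokens/second rate are the figures quoted above, not values checked against a current price list. The function names are made up for illustration, and the Gemini figure only counts audio input tokens, so any output-token charge for the transcript itself is ignored.

```python
# Figures as quoted in the post above (assumptions, not current list prices).
WHISPER_USD_PER_MINUTE = 0.006
GEMINI_FLASH_USD_PER_MILLION_TOKENS = 0.075
AUDIO_TOKENS_PER_SECOND = 25


def whisper_cost_usd(seconds: float) -> float:
    """Cost of transcribing `seconds` of audio at the quoted per-minute price."""
    return seconds / 60 * WHISPER_USD_PER_MINUTE


def gemini_flash_cost_usd(seconds: float) -> float:
    """Input cost of `seconds` of audio at the quoted token rate and price.

    Only audio input tokens are counted; output tokens are ignored.
    """
    tokens = seconds * AUDIO_TOKENS_PER_SECOND
    return tokens / 1_000_000 * GEMINI_FLASH_USD_PER_MILLION_TOKENS


if __name__ == "__main__":
    one_hour = 60 * 60  # seconds
    whisper = whisper_cost_usd(one_hour)
    gemini = gemini_flash_cost_usd(one_hour)
    print(f"Whisper:          {whisper * 100:.3f} cents per hour")
    print(f"Gemini 1.5 Flash: {gemini * 100:.3f} cents per hour")
    print(f"Ratio:            {whisper / gemini:.1f}x")
```

Changing `one_hour` to the length of a real recording gives a rough per-file estimate under the same assumptions.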

Daniel Erenrich

@simon that Whisper price quote isn't competitive (see groq.com/pricing/), and I'd be curious about the accuracy differential

Simon Willison

@derenrich problem with Groq is they haven't actually launched their billed API yet, so you're stuck with whatever their free tier will let you do

[Image: Groq pricing page, Developer tier: "Scale up and pay as you go" · Pay per Token · Coming Soon · High Rate Limits · Priority Support]
Xing Shi Cai

@simon Is the speech-to-text quality of Gemini and Whisper on the same level though?

Simon Willison

@xsc from what I've seen so far they do feel similar in quality - and Gemini can do extra tricks like diarization and tone-of-voice analysis that Whisper can't

I remain paranoid about the risk of Gemini accidentally acting on instructions within the audio, but I've not (yet) seen that happen - so possibly more of a risk with deliberately malicious audio
