Simon Willison

I interviewed Rajiv Sinclair about his team's new project, VERDAD - an outstanding piece of data journalism that tracks 48 US talk radio stations (many in Spanish), archives their audio, transcribes it and uses Gemini 1.5 to help identify potential snippets of misinformation - then presents the results in a UI for human review

simonwillison.net/2024/Nov/7/p

Simon Willison

I'm hoping to turn this into a series of YouTube interviews with people building cool data projects where we nerd out about what they've built and how they built it, so I'm optimistically thinking of this as episode one! youtube.com/watch?v=t_S-loWDGE

Simon Willison

The VERDAD prompts are pretty complex - Rajiv shared this example of a conversation he had with Claude 3.5 Sonnet to further iterate on the existing prompt used with Gemini 1.5 Pro gist.github.com/rajivsinclair/

Simon Willison

A YouTube comment asked about the price difference between Gemini 1.5 Flash and OpenAI's Whisper:

Whisper API is $0.006 / minute, so an hour of audio = 36 cents

Gemini 1.5 Flash is $0.075 per 1 million tokens, and audio comes to 25 tokens/second, so an hour is 0.675 cents

Over 50x cheaper!
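Spelling that arithmetic out - a quick sketch using only the prices quoted above, and ignoring Gemini's output-token cost, which is small for a transcript:

```python
# Back-of-the-envelope cost comparison using the prices quoted above.
WHISPER_PER_MINUTE = 0.006      # dollars per minute of audio (Whisper API)
GEMINI_PER_MTOK = 0.075         # dollars per million input tokens (Gemini 1.5 Flash)
AUDIO_TOKENS_PER_SECOND = 25    # audio tokens per second of input

whisper_per_hour = WHISPER_PER_MINUTE * 60                              # $0.36
gemini_tokens_per_hour = AUDIO_TOKENS_PER_SECOND * 60 * 60              # 90,000 tokens
gemini_per_hour = gemini_tokens_per_hour / 1_000_000 * GEMINI_PER_MTOK  # $0.00675

print(f"Whisper:          ${whisper_per_hour:.2f} per hour")
print(f"Gemini 1.5 Flash: ${gemini_per_hour:.5f} per hour")
print(f"Gemini is about {whisper_per_hour / gemini_per_hour:.0f}x cheaper")
```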

Daniel Erenrich

@simon that Whisper price quote isn't competitive - groq.com/pricing/ - and I'd be curious about the accuracy differential

Simon Willison

@derenrich problem with Groq is they haven't actually launched their billed API yet, so you're stuck with whatever their free tier will let you do

[Screenshot of Groq's pricing page: a "Developer" tier described as "Scale up and pay as you go" with pay-per-token pricing, high rate limits and priority support, marked "Coming Soon".]
Xing Shi Cai

@simon Is the speech-to-text quality of Gemini and Whisper on the same level, though?

Simon Willison

@xsc from what I've seen so far they do feel similar in quality - and Gemini can do extra tricks like diarization and tone-of-voice analysis that Whisper can't

I remain paranoid about the risk of Gemini accidentally acting on instructions within the audio, but I've not (yet) seen that happen - so possibly more of a risk with deliberately malicious audio
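For anyone wanting to try that, here's a rough sketch of asking Gemini for a diarized transcript via the google-generativeai Python SDK - the file name and prompt wording are mine, not what VERDAD uses:

```python
# Rough sketch: diarized transcription of an audio clip with Gemini.
# The file name and prompt wording are illustrative, not VERDAD's.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-1.5-flash")
audio = genai.upload_file("radio_clip.mp3")  # any local audio file

response = model.generate_content([
    audio,
    "Transcribe this audio. Label each distinct speaker (Speaker 1, Speaker 2, ...) "
    "and add a brief note on the tone of voice for each segment.",
])
print(response.text)
```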

Jay Nakrani

@simon That is a superb use of LLMs. I've seen that a lot of text-classification tasks that previously required expensive model training can now be done rather cheaply using LLMs + engineered prompts. Cost and development velocity have both improved quite a bit with this LLM-as-rater approach compared to the previous approach of training custom models.

The next bottleneck is human evals, but I guess we can't completely remove them until LLMs stop making mistakes.
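A minimal sketch of that LLM-as-rater pattern - the labels and prompt below are hypothetical, not the ones VERDAD uses:

```python
# Minimal sketch of prompt-based classification replacing a custom-trained model.
# Labels and prompt wording are hypothetical, not taken from VERDAD.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def rate_snippet(snippet: str) -> str:
    """Ask the model to rate a transcript snippet instead of training a classifier."""
    prompt = (
        "You are reviewing a talk radio transcript snippet for potential misinformation.\n"
        "Reply with exactly one label: LIKELY_MISINFORMATION, UNCLEAR, or NOT_MISINFORMATION.\n\n"
        f"Snippet: {snippet}"
    )
    return model.generate_content(prompt).text.strip()

print(rate_snippet("Officials admitted the vote totals were changed after polls closed."))
```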
