🗣️ Free-range archivist Jason Scott (@textfiles) takes Whisper, the open source speech recognition project, out for a spin on 5+ years' worth of his weekly podcasts, preserved on archive.org ▶️ https://blog.archive.org/2024/04/28/taking-the-words-out-of-my-mouth-with-ai/
@internetarchive @textfiles I can confirm that Whisper is useful for generating podcast transcripts that are 80% “good enough” for local text searching with non-LLM #FLOSS tools such as Recoll https://www.recoll.org/ https://packages.debian.org/bookworm/recoll #search
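A minimal sketch of that kind of workflow. It assumes the `openai-whisper` Python package and a local folder of MP3 episodes. The folder layout, the `base` model choice, and the helper names (`format_timestamp`, `segments_to_text`, `transcribe_folder`) are hypothetical. The idea is to write one timestamped plain-text file per episode, so that an ordinary full-text hit in a desktop indexer points back to roughly the right spot in the audio.

```python
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Render a segment start time as HH:MM:SS so a search hit can be located in the audio."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments) -> str:
    """Turn Whisper-style segments (dicts with "start" and "text") into one timestamped line each."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if text:  # skip empty or whitespace-only segments
            lines.append(f"[{format_timestamp(seg['start'])}] {text}")
    return "\n".join(lines) + "\n"


def transcribe_folder(audio_dir: str, out_dir: str, model_name: str = "base") -> None:
    """Write one .txt transcript per MP3 in audio_dir into out_dir (hypothetical layout)."""
    # Assumption: the openai-whisper package is installed. It is imported here
    # so the pure-text helpers above still work without it.
    import whisper

    model = whisper.load_model(model_name)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for audio in sorted(Path(audio_dir).glob("*.mp3")):
        result = model.transcribe(str(audio))
        text = segments_to_text(result["segments"])
        (out / f"{audio.stem}.txt").write_text(text, encoding="utf-8")
```

Plain text files keep the search side entirely non-LLM. Once the transcripts exist, running `recollindex` should pick them up, provided the output folder sits under one of the directories listed in Recoll's `topdirs` setting.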
LLM audio transcription is like a bulldozer tunneling a straight line into an otherwise inaccessible jungle; sure, stuff gets mangled, but it gives you quick access to biomes too costly to survey on foot.