I added multi-modal (image, audio, video) support to my LLM command-line tool and Python library, so now you can use it to run all sorts of content through LLMs such as GPT-4o, Claude and Google Gemini.
Cost to transcribe 7 minutes of audio with Gemini 1.5 Flash 8B? 1/10th of a cent.
@simon Since you have the tokens readily available, what do you think about including the pricing calculator directly inside `llm`?
aider is doing a similar thing by directly showing you how much you were billed for each response.
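For reference, the arithmetic behind that kind of per-response cost display is small. This is only a rough sketch, not how `llm` or aider actually do it: the function name `estimate_cost_usd`, the token counts, and the per-million-token prices below are all made-up placeholders, not real rates for any model.

```python
from decimal import Decimal


def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_million, output_price_per_million):
    """Estimate the cost in USD of one response from its token counts.

    Prices are given in USD per million tokens. Decimal is used so that
    tiny fractions of a cent are not distorted by float rounding.
    """
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * Decimal(str(input_price_per_million))
    output_cost = Decimal(output_tokens) * Decimal(str(output_price_per_million))
    return (input_cost + output_cost) / million


# Placeholder numbers for illustration only (not real pricing):
# 10,000 input tokens at $0.05/M and 1,000 output tokens at $0.20/M.
cost = estimate_cost_usd(10_000, 1_000, "0.05", "0.20")
print(f"${cost} USD, or {cost * 100:.4f} cents")
```

With those placeholder values the total works out to 0.07 cents. A real version would need a maintained table of current prices per model, which is the part that goes stale.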