If you are still LLM-skeptical but haven't spent much time thinking about or experimenting with these multi-modal variants, I'd encourage you to take a look at them.

Being able to extract information from images, audio and video is a truly amazing capability, and something which was previously prohibitively difficult - see XKCD 1425 https://xkcd.com/1425/

29 October at 17:27 | Open on fedi.simonwillison.net

7 comments

Matthew Martin

@simon Half a century later, it is a solved problem.

29 October at 17:29 | Open on mastodon.social

Simon Willison

The LLM Python library supports attachments now as well https://llm.datasette.io/en/stable/python-api.html#attachments

Models that accept multi-modal input (images, audio, video etc) can be passed attachments using the attachments= keyword argument. This accepts a list of llm.Attachment() instances.

This example shows two attachments - one from a file path and one from a URL:

import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt(
    "Describe these images",
    attachments=[
        llm.Attachment(path="pelican.jpg"),
        llm.Attachment(url="https://static.simonwillison.net/static/2024/pelicans.jpg"),
    ]
)

Use llm.Attachment(content=b"binary image content here") to pass binary content directly.
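As a rough sketch of the binary-content route, the snippet below reads a local file into bytes and passes it via content=. The helper names load_image_bytes and describe_image_bytes are made up for illustration and are not part of the library; the sketch also assumes the llm package is installed, an API key is configured for the chosen model, and response.text() is used to read the reply.

```python
from pathlib import Path


def load_image_bytes(path):
    """Read a file from disk and return its raw bytes (hypothetical helper)."""
    return Path(path).read_bytes()


def describe_image_bytes(data, model_id="gpt-4o-mini"):
    """Send in-memory image bytes to a multi-modal model and return the reply text.

    Hypothetical helper: needs the llm package and a configured API key.
    """
    import llm  # imported lazily so load_image_bytes works without llm installed

    model = llm.get_model(model_id)
    response = model.prompt(
        "Describe this image",
        # Pass the bytes directly instead of a path or URL
        attachments=[llm.Attachment(content=data)],
    )
    return response.text()
```

Calling describe_image_bytes(load_image_bytes("pelican.jpg")) would then behave much like the path= example above, which can be handy when the image comes from an upload or a database rather than a file on disk.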

29 October at 18:15 | Open on fedi.simonwillison.net

Prem Kumar Aparanji 👶🤖🐘

@simon neat!

Where can I look at the code behind this function?

29 October at 18:22 | Open on mastodon.social

Simon Willison

@prem_k more docs here: https://llm.datasette.io/en/stable/plugins/advanced-model-plugins.html#attachments-for-multi-modal-models

Implementations are spread out across different plugins, eg https://github.com/simonw/llm/blob/a44ba49c21f8d4ac30c8e41bfa5599c258ce53cc/llm/default_plugins/openai_models.py#L338 and https://github.com/simonw/llm-gemini/blob/ce82727a6950c7769a8e40bf030591d0e6f83e5e/llm_gemini.py#L135

29 October at 18:25 | Open on fedi.simonwillison.net

steve ulrich

@simon oh slick. thanks!

31 October at 15:04 | Open on botwerks.social

Daniel

@simon Note that some of us are skeptical for reasons such as the exploitation of creative folks, the copyright infringements at scale, the hype cycle created by venture capital, the impact it has on misinformation and the ads space, and so on. Some of the tech is cool no doubt.

29 October at 18:33 | Open on chaos.social

Simon Willison

@djh those are all very valid reasons to be skeptical!

The only reason I'll consistently push back on is the idea that these things aren't useful at all.

29 October at 18:44 | Open on fedi.simonwillison.net
