Thanks to the combo of Ollama and the llm-ollama plugin you can now run Meta's Llama 3.2 Vision image model (7.9GB) on a Mac and use it to run prompts against images https://simonwillison.net/2024/Nov/13/ollama-llama-vision/
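A minimal sketch of the workflow the post describes, assuming the llm CLI is installed via pipx (any install method works) and using a placeholder image path; the llama3.2-vision model alias comes from Ollama's library:

    # Pull the 11B vision model (~7.9GB download) via Ollama
    ollama pull llama3.2-vision

    # Install the llm CLI and the llm-ollama plugin
    pipx install llm
    llm install llm-ollama

    # Run a prompt against an image; -a attaches a file (needs llm 0.17+)
    llm -m llama3.2-vision:latest 'describe this image' -a path/to/image.jpg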
@simon Curious how you're running Ollama - is it just on your laptop, or do you have some beefy server running it?

@jmalonzo Just on my laptop. It's an M2 Max with 64GB of RAM, so it's pretty good for small and medium-sized models. Any model that needs more than 32GB of RAM tends to mean I can't easily run other RAM-hungry apps like VS Code and Firefox.

@simon The 90B (55GB) might confuse people. You need ~88GB of RAM, not counting your context window, just to run the 90B model. So 128GB of RAM, or else you are going to get 1 token per 30 to 45 seconds or more of output while everything swaps around. The small model is going to run very, very well on any M-series Mac with enough RAM.

@webology Thanks, updated that to say "Or the larger 90B model (55GB download, likely needs ~88GB of RAM) like this:"
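For the larger model the thread discusses, a hedged sketch assuming Ollama publishes this size under the :90b tag (check the model library for the current name):

    # Larger 90B vision model: ~55GB download, roughly 88GB of RAM for the
    # weights alone, so a 128GB machine is the comfortable minimum
    ollama pull llama3.2-vision:90b
    llm -m llama3.2-vision:90b 'describe this image' -a path/to/image.jpg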