Simon Willison

Wrote up some notes on the new Qwen2.5-Coder-32B model, which is the first model I've run on my own Mac (64GB M2) that appears to be highly competent at writing code.
simonwillison.net/2024/Nov/12/

Simon Willison

So far I've run Qwen2.5-Coder-32B successfully in two different ways: once via Ollama (and the llm-ollama plugin) and once using Apple's MLX framework and mlx-lm - details on how I ran both of those are in my article.
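For the Ollama route, the steps look roughly like this (a sketch - the final prompt is just a placeholder):

ollama pull qwen2.5-coder:32b
llm install llm-ollama
llm -m qwen2.5-coder:32b 'write me a python function that reverses a string'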

Simon Willison

Here's a one-liner that should work for you if you run uv on a Mac with 64GB of RAM (it will download ~32GB of model the first time you run it):

uv run --with mlx-lm \
  mlx_lm.generate \
  --model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
  --max-tokens 4000 \
  --prompt 'write me a python function that renders a mandelbrot fractal as wide as the current terminal'

David Edmiston

@simon Your post mentioned a ~20GB quantized file via Ollama; did that take up 20GB of RAM or 32?

I’m waiting on delivery of a 48GB M4 Pro this week or early next, which is why I'm kinda curious.

Simon Willison

@edmistond I just tried running a prompt through the Ollama qwen2.5-coder:32b model and, to my surprise, it appeared to peak at just 2GB of RAM usage while using 95% of my GPU.

I thought GPU and system RAM were shared on macOS, so I don't entirely understand what happened there; I would have expected more like 20GB of RAM use.
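One way to sanity-check this (a sketch, assuming an Ollama recent enough to have the ps subcommand):

ollama ps

That reports the loaded model's size and its CPU/GPU split. My guess is that model weights served to the GPU via Metal don't show up as per-process RAM in Activity Monitor, which would explain the low number.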

David Edmiston

@simon Interesting, thanks for checking! Either way, since I currently work on a 16GB M1 with no problems, I know I should have enough RAM to run my normal day-to-day tools plus that model for experimentation. 🙂

Robert Atkins

@simon What’s the speed difference between Ollama and MLX?

Simon Willison

Added an example showing Qwen 2.5 Coder's performance on my "pelican on a bicycle" benchmark:

llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle'

It's not the *worst* I've seen! simonwillison.net/2024/Oct/25/


[Image: a jumble of shapes. The pelican has a yellow body, a black head and a weird proboscis kind of thing. The bicycle is several brown overlapping shapes that look a bit like a tractor.]
Jeff Triplett

@simon there is something to be said for generating bad SVG graphics for things. With a different color palette, I have seen worse art on paper cups and hanging in offices.

Could easily work for project release artwork.

Stefano Pacifico 🧬 🇺🇦

@simon besides offline use and privacy, did you notice any other advantages to running locally?

Simon Willison

@stefpac sadly not; I'm probably going to continue mostly using the best hosted models, because then I don't have to sacrifice half my system RAM.

Drew Breunig

@simon Did you notice a speed difference between MLX and Ollama?

Simon Willison

@dbreunig I haven't measured it properly, but MLX feels a bit faster to me.
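A rough way to compare (a sketch - reusing the pelican prompt from earlier in this thread): mlx_lm.generate prints a tokens-per-second figure when it finishes, and the Ollama version can be wrapped in time:

time llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle'

uv run --with mlx-lm \
  mlx_lm.generate \
  --model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
  --max-tokens 4000 \
  --prompt 'Generate an SVG of a pelican riding a bicycle'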

balloob

@simon Qwen is amazing. It’s the best-performing local model in the Home Assistant AI benchmarks. github.com/allenporter/home-as
