Wrote up some notes on the new Qwen2.5-Coder-32B model, which is the first model I've run on my own Mac (64GB M2) that appears to be highly competent at writing code
https://simonwillison.net/2024/Nov/12/qwen25-coder/
Here's a one-liner that should work for you if you run uv on a Mac with 64GB of RAM (it will download ~32GB of model the first time you run it):

uv run --with mlx-lm \

@simon Your post mentioned a ~20GB quantized file via Ollama; did that take up 20GB of RAM or 32? I'm waiting on delivery of a 48GB M4 Pro this week or early next, which is why I'm kinda curious.

@edmistond I just tried running a prompt through the Ollama qwen2.5-coder:32b model and to my surprise it appeared to peak at just 2GB of RAM usage, but it was using 95% of my GPU. I thought GPU and system RAM were shared on macOS, so I don't entirely understand what happened there; I would have expected more like 20GB of RAM use.

@simon Interesting, thanks for checking! Either way, since I currently work on a 16GB M1 with no problems for my day-to-day tools, I know I should have enough RAM to run my normal tools plus that for experimentation. 🙂

Added an example showing Qwen 2.5 Coder's performance on my "pelican on a bicycle" benchmark:

llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle'

It's not the *worst* I've seen! https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/

@simon There is something to be said for generating bad SVG graphics for things. With a different color palette, I have seen worse art on paper cups and hanging in offices. Could easily work for project release artwork.

@simon Besides offline use and, additionally, privacy, did you detect any other advantage to running locally?

@stefpac Sadly not; I'm probably going to continue mostly using the best hosted ones because then I don't have to sacrifice half my system RAM.

@simon Qwen is amazing. It's the best performing local model in the Home Assistant AI benchmarks. https://github.com/allenporter/home-assistant-datasets/tree/main/reports
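The uv one-liner in the thread above was cut off; going by mlx-lm's command-line interface, the full invocation presumably looks something like the sketch below. The quantized model name and the prompt here are my own illustrative guesses, not copied from the post or the article.

  # model name and prompt are illustrative guesses, not from the original post
  uv run --with mlx-lm \
    mlx_lm.generate \
    --model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
    --max-tokens 4000 \
    --prompt 'write a Python function that renders a Mandelbrot fractal in the terminal'

uv spins up a throwaway environment with mlx-lm installed and runs the mlx_lm.generate entry point inside it, so nothing needs to be installed globally.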
So far I've run Qwen2.5-Coder-32B successfully in two different ways: once via Ollama (and the llm-ollama plugin) and once using Apple's MLX framework and mlx-lm; details on how I ran both of those are in my article.
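For the Ollama route, a minimal sketch of the steps would look roughly like this, assuming Ollama itself is already installed; the final prompt is just an example, not one from the article.

  ollama pull qwen2.5-coder:32b      # fetches the ~20GB quantized model mentioned above
  llm install llm-ollama             # plugin that lets the llm CLI talk to Ollama models
  llm -m qwen2.5-coder:32b 'write a Python function to merge two sorted lists'

Once the model is pulled, the same llm -m qwen2.5-coder:32b invocation works for any prompt, including the pelican benchmark shown in the thread.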