Wrote up some notes on the new Qwen2.5-Coder-32B model, which is the first model I've run on my own Mac (64GB M2) that appears to be highly competent at writing code
https://simonwillison.net/2024/Nov/12/qwen25-coder/
Here's a one-liner that should work for you if you run uv on a Mac with 64GB of RAM (it will download ~32GB of model the first time you run it):

uv run --with mlx-lm \

@simon Your post mentioned a ~20GB quantized file via Ollama; did that take up 20GB of RAM or 32? I'm waiting on delivery of a 48GB M4 Pro this week or early next, which is why I'm kinda curious.

@edmistond I just tried running a prompt through the Ollama qwen2.5-coder:32b model and to my surprise it appeared to peak at just 2GB of RAM usage, but it was using 95% of my GPU. I thought GPU and system RAM were shared on macOS, so I don't entirely understand what happened there; I would have expected more like 20GB of RAM use.

@simon Interesting, thanks for checking! Either way, since I currently work on a 16GB M1 with no problems for my day-to-day tools, I know I should have enough RAM to run my normal tools plus that for experimentation. 🙂

Added an example showing Qwen 2.5 Coder's performance on my "pelican on a bicycle" benchmark:

llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle'

It's not the *worst* I've seen! https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/

@simon There is something to be said for generating bad SVG graphics for things. With a different color palette, I have seen worse art on paper cups and hanging in offices. Could easily work for project release artwork.

@simon Besides offline use and, additionally, privacy, did you detect any other advantage to running locally?

@stefpac Sadly not; I'm probably going to continue mostly using the best hosted ones because then I don't have to sacrifice half my system RAM.

@simon Qwen is amazing. It's the best performing local model in the Home Assistant AI benchmarks. https://github.com/allenporter/home-assistant-datasets/tree/main/reports
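The uv one-liner in the thread above was cut off; going by mlx-lm's command-line interface, the full invocation presumably looks something like the sketch below. The quantized model name and the prompt here are my own illustrative guesses, not copied from the post or the article.

  # model name and prompt are illustrative guesses, not from the original post
  uv run --with mlx-lm \
    mlx_lm.generate \
    --model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
    --max-tokens 4000 \
    --prompt 'write a Python function that renders a Mandelbrot fractal in the terminal'

uv spins up a throwaway environment with mlx-lm installed and runs the mlx_lm.generate entry point inside it, so nothing needs to be installed globally.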
So far I've run Qwen2.5-Coder-32B successfully in two different ways: once via Ollama (and the llm-ollama plugin) and once using Apple's MLX framework and mlx-lm; details on how I ran both of those are in my article.
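For the Ollama route, a minimal sketch of the steps would look roughly like this, assuming Ollama itself is already installed; the final prompt is just an example, not one from the article.

  ollama pull qwen2.5-coder:32b      # fetches the ~20GB quantized model mentioned above
  llm install llm-ollama             # plugin that lets the llm CLI talk to Ollama models
  llm -m qwen2.5-coder:32b 'write a Python function to merge two sorted lists'

Once the model is pulled, the same llm -m qwen2.5-coder:32b invocation works for any prompt, including the pelican benchmark shown in the thread.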