Here's a one-liner that should work for you if you have uv installed on a Mac with 64GB of RAM (it will download ~32GB of model the first time you run it):
uv run --with mlx-lm \
mlx_lm.generate \
--model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
--max-tokens 4000 \
--prompt 'write me a python function that renders a mandelbrot fractal as wide as the current terminal'
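
For context, the prompt is asking the model to produce something along these lines. This is just a hand-written sketch of that kind of function, not the model's actual output:

import os

def mandelbrot(height=24, max_iter=30):
    # Use the current terminal width so the fractal fills the window.
    width = os.get_terminal_size().columns
    chars = " .:-=+*#%@"
    for row in range(height):
        line = []
        for col in range(width):
            # Map this character cell to a point in the complex plane.
            c = complex(-2.0 + 3.0 * col / width, -1.2 + 2.4 * row / height)
            z = 0
            i = 0
            while abs(z) <= 2 and i < max_iter:
                z = z * z + c
                i += 1
            # Pick a denser character the longer the point takes to escape.
            line.append(chars[i * (len(chars) - 1) // max_iter])
        print("".join(line))

if __name__ == "__main__":
    mandelbrot()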
@simon Your post mentioned a ~20GB quantized file via Ollama; did that take up 20GB of RAM or 32?
I’m waiting on delivery of a 48GB M4 Pro this week or early next, which is why I'm kinda curious.