@simon Your post mentioned a ~20GB quantized file via Ollama; did that take up 20GB of RAM or 32?
I'm waiting on delivery of a 48GB M4 Pro this week or early next, which is why I'm kinda curious.
@edmistond I just tried running a prompt through the Ollama qwen2.5-coder:32b model and to my surprise it appeared to peak at just 2GB of RAM usage, but it was using 95% of my GPU.
I thought GPU and system RAM were shared on macOS, so I don't entirely understand what happened there; I would have expected more like 20GB of RAM use.
@simon Interesting, thanks for checking! Either way, since I currently work on a 16GB M1 with no problems for my day-to-day tools, I know I should have enough RAM to run my normal tools plus that for experimentation. 🙂
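For anyone who wants to reproduce this, here is a minimal Python sketch (assumptions: a local Ollama server on its default port 11434, and that qwen2.5-coder:32b has already been pulled) that loads the model and then asks Ollama's /api/ps endpoint how much of it is resident in GPU memory:

    # Minimal sketch: check how Ollama splits a loaded model between
    # GPU memory and CPU RAM. Assumes a local Ollama server on its
    # default port 11434 and that qwen2.5-coder:32b is already pulled.
    import json
    import urllib.request

    BASE = "http://localhost:11434"
    GB = 1024 ** 3

    # Send a short prompt so Ollama loads the model into memory.
    req = urllib.request.Request(
        BASE + "/api/generate",
        data=json.dumps({
            "model": "qwen2.5-coder:32b",
            "prompt": "Say hi",
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()

    # /api/ps lists loaded models, including how many bytes of each
    # are resident in GPU memory (size_vram) vs. in total (size).
    with urllib.request.urlopen(BASE + "/api/ps") as resp:
        for m in json.load(resp)["models"]:
            print("%s: %.1f GB total, %.1f GB on GPU"
                  % (m["name"], m["size"] / GB, m["size_vram"] / GB))

Since Apple Silicon uses unified memory, weights allocated for the GPU via Metal are likely attributed to wired/GPU memory rather than to the ollama process itself, which would explain the surprisingly low per-process figure.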