@simon Your post mentioned a ~20GB quantized file via Ollama; did that take up 20GB of RAM or 32?
I'm waiting on delivery of a 48GB M4 Pro this week or early next, which is why I'm kinda curious.
@edmistond I just tried running a prompt through the Ollama qwen2.5-coder:32b model and to my surprise it appeared to peak at just 2GB of RAM usage, but it was using 95% of my GPU.
I thought GPU and system RAM were shared on macOS, so I don't entirely understand what happened there; I would have expected more like 20GB of RAM use.
@simon Interesting, thanks for checking! Either way, since I currently work on a 16GB M1 with no problems for my day-to-day tools, I know I should have enough RAM to run my normal tools plus that for experimentation. 🙂
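For anyone who wants to reproduce this, here is a minimal Python sketch (assumptions: a local Ollama server on its default port 11434, and that qwen2.5-coder:32b has already been pulled) that loads the model and then asks Ollama's /api/ps endpoint how much of it is resident in GPU memory:

    # Minimal sketch: check how Ollama splits a loaded model between
    # GPU memory and CPU RAM. Assumes a local Ollama server on its
    # default port 11434 and that qwen2.5-coder:32b is already pulled.
    import json
    import urllib.request

    BASE = "http://localhost:11434"
    GB = 1024 ** 3

    # Send a short prompt so Ollama loads the model into memory.
    req = urllib.request.Request(
        BASE + "/api/generate",
        data=json.dumps({
            "model": "qwen2.5-coder:32b",
            "prompt": "Say hi",
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()

    # /api/ps lists loaded models, including how many bytes of each
    # are resident in GPU memory (size_vram) vs. in total (size).
    with urllib.request.urlopen(BASE + "/api/ps") as resp:
        for m in json.load(resp)["models"]:
            print("%s: %.1f GB total, %.1f GB on GPU"
                  % (m["name"], m["size"] / GB, m["size_vram"] / GB))

Since Apple Silicon uses unified memory, weights allocated for the GPU via Metal are likely attributed to wired/GPU memory rather than to the ollama process itself, which would explain the surprisingly low per-process figure.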