Here are all of the experiments, with full transcripts: https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc
The other major Chinese AI lab, DeepSeek, just dropped their own last-minute entry into the 2024 model race: DeepSeek v3 is a HUGE model (685B parameters) which showed up, mostly undocumented, on Hugging Face this morning. My notes so far: https://simonwillison.net/2024/Dec/25/deepseek-v3/

@simon The split of 256 experts is interesting, as the compute of 8 per token will, I'm assuming, be ~20B params (plus a router model, I guess?), which is pretty lightweight for the performance in Aider. Having all experts in memory is a very high bar though.

The DeepSeek v3 paper came out this morning; I added a few notes about that here: https://simonwillison.net/2024/Dec/26/deepseek-v3/
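The ~20B figure in that reply is back-of-envelope arithmetic; a minimal sketch of it, assuming the 685B parameters are split evenly across the 256 routed experts (which ignores shared layers, attention weights, and the router itself):

```python
# Rough estimate of active parameters per token in a mixture-of-experts
# model, using the figures from the thread: 685B total parameters,
# 256 experts, 8 experts activated per token. This is an approximation
# only -- it pretends every parameter lives inside a routed expert.
total_params = 685e9
num_experts = 256
active_experts = 8

params_per_expert = total_params / num_experts
active_params = params_per_expert * active_experts
print(f"{active_params / 1e9:.1f}B active parameters per token")
# → 21.4B active parameters per token
```

That lands close to the ~20B guess; the true number for DeepSeek v3 differs because a meaningful share of the parameters are shared (attention, embeddings, shared experts) rather than routed.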
I got QvQ running on my (M2 64GB) laptop!
uv run --with 'numpy<2.0' --with mlx-vlm python \
-m mlx_vlm.generate \
--model mlx-community/QVQ-72B-Preview-4bit \
--max-tokens 10000 \
--temp 0.0 \
--prompt "describe this" \
--image pelicans-on-bicycles-veo2.jpg
https://simonwillison.net/2024/Dec/24/qvq/#with-mlx-vlm