The other major Chinese AI lab, DeepSeek, just dropped their own last-minute entry into the 2024 model race: DeepSeek v3 is a HUGE model (685B parameters) which showed up, mostly undocumented, on Hugging Face this morning. My notes so far: https://simonwillison.net/2024/Dec/25/deepseek-v3/
@simon The split of 256 experts is interesting: with 8 active per token, I'm assuming the compute will be ~20B params (plus the router model, I guess?), which is pretty lightweight for the performance on Aider. Having all experts in memory is a very high bar, though.
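A quick back-of-envelope check of that estimate, as a minimal sketch. It assumes the 685B parameters are spread (almost) entirely across 256 equal-sized experts and ignores attention layers, embeddings, shared experts, and the router, none of which are confirmed specs here:

```python
# Back-of-envelope: active parameters per token in a mixture-of-experts model.
# All numbers are taken from the thread above; the even-split assumption is mine.

TOTAL_PARAMS = 685e9      # reported total size of DeepSeek v3
NUM_EXPERTS = 256         # experts per MoE layer (from the reply)
ACTIVE_EXPERTS = 8        # experts routed per token (from the reply)

# Naive estimate: every expert is the same size, and non-expert
# parameters (attention, embeddings, router) are negligible.
params_per_expert = TOTAL_PARAMS / NUM_EXPERTS
active_params = params_per_expert * ACTIVE_EXPERTS

print(f"~{params_per_expert / 1e9:.1f}B params per expert")
print(f"~{active_params / 1e9:.1f}B active params per token")
# -> ~2.7B params per expert
# -> ~21.4B active params per token, in the ballpark of the ~20B guess above
```

Under those assumptions the math lands at ~21B active params per token, consistent with the ~20B figure in the reply; the memory point stands either way, since all 685B parameters still have to be resident to serve any token.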