@leo_song It could be that 175B parameters are used during training but not during actual usage (inference).