@dalias @bedast Didn't mathematical/rule-based language modeling start showing massively diminishing returns like... two or three decades ago, or is my information wrong?

As far as I'm aware, it would be preferable to start from a rule-based language model, and then train a small model specifically on a separate captioned sample set from the speaker(s) to eliminate the flakiness.