The principles embodied by #LLM (Language) models are actually used in other modalities of AI.
There are voice, image and video modalities e.g. #sora that are amazing for early versions.
Simplifying.
To traverse from State A to State C you have to traverse through B.
And if you feed enough learning data into the model, you will get amazing results.
Modality is immaterial.