@liaizon @JPEG @nightpool well, in the context of a mobile app it's a bit easier: you'd download a chunk from the beginning of the file and feed that to the system-provided parser. Probably something from AVFoundation.framework, but I'm not very good with iOS.
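
Rough sketch of the "download the beginning of the file" step, in Python rather than actual iOS code. The helper names and the 64 KiB default are made up for illustration. It also assumes the server honours HTTP Range requests and that the container keeps its metadata near the start of the file, which isn't guaranteed.

```python
import urllib.request


def build_head_request(url: str, max_bytes: int = 64 * 1024) -> urllib.request.Request:
    """Build a GET request for only the first max_bytes bytes of a remote file.

    Hypothetical helper; max_bytes is an arbitrary guess, not a known-good size.
    """
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{max_bytes - 1}"})


def fetch_head(url: str, max_bytes: int = 64 * 1024) -> bytes:
    """Download the start of the file, to be handed to whatever media parser is available."""
    with urllib.request.urlopen(build_head_request(url, max_bytes)) as resp:
        # A server that ignores Range answers 200 with the whole body,
        # so cap the read instead of trusting the response size.
        return resp.read(max_bytes)
```

The bytes you get back would then go to the platform's parser; that part isn't shown because it depends entirely on the OS.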

But then the API that apps use actually does distinguish between these:
https://docs.joinmastodon.org/api/entities/#attachment
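
For example, a client could branch on the attachment's `type` field instead of sniffing the file itself. A minimal sketch in Python, assuming the distinction in question is `gifv` vs `video`; the function name and the sample dicts are made up, only the `type` values come from the Attachment entity docs.

```python
# Values of Attachment.type as listed in the Mastodon API docs.
KNOWN_TYPES = {"unknown", "image", "gifv", "video", "audio"}


def attachment_kind(attachment: dict) -> str:
    """Return the attachment's declared type, or "unknown" if it's missing or unrecognised."""
    kind = attachment.get("type")
    return kind if kind in KNOWN_TYPES else "unknown"


def should_autoplay_muted(attachment: dict) -> bool:
    """Hypothetical client policy: loop gifv silently, treat real video as a normal player."""
    return attachment_kind(attachment) == "gifv"


# Made-up sample payloads, trimmed to the one field used here.
samples = [{"type": "gifv"}, {"type": "video"}, {"type": "image"}, {}]
for sample in samples:
    print(attachment_kind(sample), should_autoplay_muted(sample))
```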