@mhoye wait it works on a dataset in Pinyin?! Damn

And it's 14 lines of Python

And it's not even looking at the contents of the compressed output, it's just comparing the compressed byte lengths? Holy fucking shit that's smart as hell.
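For anyone wondering what "just using the byte lengths" could look like: here's a rough sketch, not the actual 14 lines. It assumes gzip as the compressor and the standard normalized compression distance (NCD) with a nearest-neighbour vote. The names `clen`, `ncd`, and `classify` are made up for illustration.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Byte length of the gzip-compressed UTF-8 text (the bytes themselves are ignored)."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when a and b share a lot of structure."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)  # compress the two texts together
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, labeled: list[tuple[str, str]], k: int = 1) -> str:
    """Majority label among the k labeled texts closest to the query under NCD."""
    ranked = sorted(labeled, key=lambda pair: ncd(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, only to show the call shape.
train = [
    ("the cat sat on the mat and the cat purred at the dog", "pets"),
    ("stocks fell sharply as the central bank raised interest rates", "finance"),
]
print(classify("the dog sat on the mat and the cat purred", train))
```

The only thing that ever leaves the compressor is `len(...)`. If two texts share patterns, compressing them together costs barely more than compressing one alone, and that's the whole signal. It works on raw bytes, so the script or language of the text doesn't need any special handling.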

😍