@mhoye wait it works on a dataset in Pinyin?! Damn
And it's 14 lines of Python
And it's not even looking at the contents of the compressed output, it's just comparing the compressed byte lengths? Holy fucking shit that's smart as hell.
😍
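
For anyone wondering what "just using the byte lengths" looks like, here's a rough sketch. It is not the actual 14 lines, just my guess at the idea, assuming the distance is the usual normalized compression distance and the classifier is plain nearest-neighbour voting. The names `clen`, `ncd`, `classify` and the toy training data are all made up for illustration.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Number of bytes in the gzip-compressed text (the only thing we ever look at)."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance, computed from compressed lengths alone."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)  # similar texts compress better together
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Label the query by majority vote among its k nearest training texts."""
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, only here to show the call shape.
train = [
    ("the striker scored twice in the second half of the match", "sports"),
    ("the central bank raised interest rates to curb inflation", "finance"),
]
print(classify("the goalkeeper saved a penalty late in the match", train))
```

Nothing in there knows about words, tokens, or the language of the text, which would be why a Pinyin dataset is no problem: it's all bytes going into gzip and lengths coming out.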