@mhoye This is awesome, but I'm surprised it wasn't better known. I have vague memories of going to a talk by a researcher in Oxford about 25 years ago on using gzip compression for text analysis. His presentation explained entropy and how compression is prediction, then looked at categorising text by gzipping it. Can't remember the name; some guy doing inference stuff in the psychology department. This is going to bug me now.
@simoncozens From what I can tell, the technique itself wasn't a big secret. The interesting part is the idea that with apparently negligible effort you can outperform tools that are insanely expensive and wildly more complicated.
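
For anyone wondering how little code "categorising text by gzipping it" can take, here is a minimal sketch in Python. It assumes normalized compression distance plus a nearest-neighbour lookup, which is one common way to do this and not necessarily what the Oxford talk or the tool under discussion used. The names `csize`, `ncd`, and `classify`, and the toy training texts, are made up for illustration.

```python
import gzip


def csize(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: smaller means more shared structure."""
    cx, cy = csize(x), csize(y)
    # If y is predictable from x, compressing them together costs little extra.
    cxy = csize(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, labelled: list[tuple[str, str]]) -> str:
    """Return the label of the training text closest to the query (1-NN)."""
    _, label = min(labelled, key=lambda pair: ncd(query, pair[0]))
    return label


# Hypothetical toy data, far too small for real use.
TRAIN = [
    ("the cat sat on the mat and purred at the dog", "animals"),
    ("the stock market fell sharply as interest rates rose", "finance"),
]

if __name__ == "__main__":
    print(classify("the stock market rose as interest rates fell sharply", TRAIN))
```

The whole "model" is the compressor: text that shares long substrings with a training example adds very few bytes when the two are compressed together, so the distance comes out small. On texts this short the gzip header overhead dominates, so treat the numbers as illustrative only.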