Did you ever wake up in the middle of the night wondering what would happen if you applied JPEG-style lossy compression to text?
Well, here's the tool you've been waiting for - The Text Lossifizer: https://lcamtuf.coredump.cx/lossifizer/
@lcamtuf training LLMs on stuff like fully "lossed" "Lq#hrbmmbr,!cq atvjcmehu"bmvmhnabpofa"cmbrr!oecedicbtebyqscqrfbw_pc vsfe!ylui#qrvmohrauiu!up!mbtm!vid jefmrecf^`hliswmf"xke"xiheodqvqnfsiemlso ogp]mas.Tgd ccvegoqy!of arugaker dootrkwuset `n`pt!ng"usea`e," would likely be better for human kind than what we're getting now. Imagine the possibilities.
@lcamtuf I'm sorrz,?as ` large mboguage nodel I do nns?gavf th`s informatinn.
@lcamtuf #Lnss#,?someshmes reeerred tp as "Loss.jpg",\2] is?
@JetForMe @lcamtuf @gsuberland I was particularly amused that at level 7.9 it added a single question mark to the text.
@lcamtuf This is such a cool project!!! Ahhh it's soo nicely self contained and has fun outputs!! Thanks for sharing!!!
@lcamtuf I can't quite figure out what this demo makes me want to measure, but it's something about how many bits the quantized DCT would need to be correctly transmitted somewhere at a given level of "shoddiness". Do you know what question I mean to ask here, and perhaps how to answer it? "View Source" and the accompanying blog post helped me understand what you're doing but not how to reason about its effectiveness as a compression algorithm
@lcamtuf it would be fun to try a keymap that puts letters close to their common typing error neighbors. Lossy compression that can be further improved by applying automatic spellchecking.
@lcamtuf I would say this is a kind of lossy data transmission. Compression would suggest a restricted character set intended to convey the same message.
@lcamtuf is it actually doing dct+quantization or is it just adding noise?
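For anyone weighing the "DCT+quantization or just noise" question, and the earlier one about how many bits the quantized DCT would need, here is a minimal sketch of what block DCT plus uniform quantization over character codes could look like. It is not the Lossifizer's actual code: the block size of 8, the single uniform `step`, the clamp to printable ASCII, the space padding, and every function name below are assumptions made for illustration. The entropy figure is only a rough zeroth-order estimate of bits per coefficient, not a real codec's output size.

```python
import math
from collections import Counter

BLOCK = 8  # assumed block size, borrowed from JPEG's 8-sample rows


def dct(xs):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(xs)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(xs))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(s * scale)
    return out


def idct(cs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(cs)
    out = []
    for i in range(n):
        s = cs[0] * math.sqrt(1 / n)
        s += sum(
            c * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(cs)
            if k > 0
        )
        out.append(s)
    return out


def quantize_text(text, step):
    """Split character codes into blocks, DCT each block, round to multiples of step."""
    codes = [ord(ch) for ch in text]
    codes += [32] * (-len(codes) % BLOCK)  # pad the last block with spaces
    blocks = []
    for start in range(0, len(codes), BLOCK):
        coeffs = dct(codes[start:start + BLOCK])
        blocks.append([round(c / step) for c in coeffs])
    return blocks


def reconstruct(blocks, step, length):
    """Dequantize, inverse-transform, and clamp to printable ASCII (32..126)."""
    chars = []
    for q in blocks:
        for v in idct([c * step for c in q]):
            chars.append(chr(min(126, max(32, round(v)))))
    return "".join(chars)[:length]


def entropy_bits(blocks):
    """Empirical zeroth-order entropy of the quantized values, in bits per coefficient."""
    flat = [c for b in blocks for c in b]
    total = len(flat)
    return -sum(n / total * math.log2(n / total) for n in Counter(flat).values())


if __name__ == "__main__":
    text = "Did you ever wake up in the middle of the night?"
    for step in (0.1, 10, 40):
        q = quantize_text(text, step)
        print(step, round(entropy_bits(q), 2), reconstruct(q, step, len(text)))
```

In this sketch a small `step` reproduces printable-ASCII input exactly, while a `step` large enough to round every coefficient to zero collapses the text to spaces at 0 bits per coefficient. Unlike additive noise, the damage comes entirely from rounding in the frequency domain, so plotting `entropy_bits` against `step` is one rough way to reason about the bits-versus-"shoddiness" trade-off.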