lcamtuf :verified: :verified: :verified:

Did you ever wake up in the middle of the night wondering what would happen if you applied JPEG-style lossy compression to text?

Well, here's the tool you've been waiting for - The Text Lossifizer: lcamtuf.coredump.cx/lossifizer

18 comments
bob

@lcamtuf is it actually doing dct+quantization or is it just adding noise?

Łukasz Bromirski :unverified:

@lcamtuf training LLMs on stuff like fully "lossed" "Lq#hrbmmbr,!cq atvjcmehu"bmvmhnabpofa"cmbrr!oecedicbtebyqscqrfbw_pc vsfe!ylui#qrvmohrauiu!up!mbtm!vid jefmrecf^`hliswmf"xke"xiheodqvqnfsiemlso ogp]mas.Tgd ccvegoqy!of arugaker dootrkwuset `n`pt!ng"usea`e," would likely be better for human kind than what we're getting now. Imagine the possibilities.

David Flanagan

@lcamtuf a nonlinear scale might make the slider more useful…

Graham Spookyland🎃/Polynomial

@lcamtuf I'm sorrz,?as ` large mboguage nodel I do nns?gavf th`s informatinn.

Andrew

@lcamtuf #Lnss#,?someshmes reeerred tp as "Loss.jpg",\2] is??rtrip pvcljthed on Jtne?1+?1008, ay Tim Cvclmey eor hit gaming-relased webcomjc Ctrl+Alt+Dek.?Sfu curing a stprxlime hn whhch tgd?mbjn cibsbdter Etham?amd his fibocéd?Lilah are expecting!uieir!fhqst child, uie?rsqhp―pqesensdd as a?fotr,onel!comic?wisg?no?diakofte—showr?Ethao!enufrhng a hnroit`k+ wherf!he sfes Ljmah weephmg in a gptpital bee aftes tufgfring b misbarriage. Cvdlmfy cited fwfots in?hit!life as intqiratjoo!for the comic-?

Walter Nissen

@JetForMe @lcamtuf @gsuberland I was particularly amused that at level 7.9 it added a single question mark to the text.

Sludge, Ph.D.

@lcamtuf oh nice, this hits my current obsessions perfectly

Sludge, Ph.D.

@lcamtuf (they did surgery on a grape voice) they did DCT on a word

Kiran 🏳️‍⚧️

@lcamtuf This is such a cool project!!! Ahhh it's soo nicely self contained and has fun outputs!! Thanks for sharing!!!

Jamey Sharp

@lcamtuf I can't quite figure out what this demo makes me want to measure, but it's something about how many bits the quantized DCT would need to be correctly transmitted somewhere at a given level of "shoddiness". Do you know what question I mean to ask here, and perhaps how to answer it? "View Source" and the accompanying blog post helped me understand what you're doing but not how to reason about its effectiveness as a compression algorithm.
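
One rough way to put a number on that question is to estimate the order-0 Shannon entropy of the quantized DCT coefficients, which approximates how many bits per character a simple entropy coder would need at a given quantization step. The sketch below is only an illustration under stated assumptions: it uses raw character codes, blocks of 8, and a single uniform `step`, none of which are taken from the Lossifizer's source, and `bits_per_char` is a hypothetical helper name.

```python
import math
from collections import Counter


def dct(x):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(s * scale)
    return out


def bits_per_char(text, step, block=8):
    """Order-0 entropy of the quantized coefficients, in bits per character.

    Assumption: characters are taken as raw code points and every
    coefficient is quantized with the same step size.
    """
    codes = [ord(ch) for ch in text]
    symbols = []
    for start in range(0, len(codes), block):
        chunk = codes[start:start + block]
        # Each block yields as many coefficients as it has characters.
        symbols.extend(round(c / step) for c in dct(chunk))
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


sample = "the quick brown fox jumps over the lazy dog"
for step in (1, 10, 40):
    print(step, bits_per_char(sample, step))
```

Because each block produces one coefficient per character, the entropy per coefficient can be read directly as bits per character. It ignores correlations between coefficients, so treat it as an estimate rather than a tight bound.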

Joby (chaotic good)

@lcamtuf it would be fun to try a keymap that puts letters close to their common typing error neighbors. Lossy compression that can be further improved by applying automatic spellchecking.

Lucent Maven Katanova

@lcamtuf
Compression implies something else.

I would say this is a kind of lossy data transmission.

Compression would suggest a restricted character set intended to convey the same message.

lcamtuf :verified: :verified: :verified:

@katanova Sure, but I think you're being pedantic for the sake of it? What JPEG does is this lossy quantization operation, followed by lossless compression of the coefficients, followed by lossless decompression.

You can write a simulator of JPEG compression artifacts, emulating the degradation from source bitmap to your screen, without actually doing the lossless compression step.

This does precisely that, except for text. It would be fairly pointless to throw in the lossless stage, because... the only novelty here is being able to observe what lossy DCT does to text.
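
A minimal sketch of that simulator idea, for illustration only: transform each block of characters with a DCT, quantize the coefficients, and immediately invert the transform, with no lossless stage in between. The block size of 8, the uniform `step`, the use of raw character codes, and the clamp to printable ASCII are all assumptions here, not details of the actual tool.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(s * scale)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def lossify(text, step, block=8):
    """Quantize the DCT of each block of character codes, then invert.

    Assumptions: raw code points, one uniform step for all coefficients,
    and output clamped to printable ASCII (32..126).
    """
    codes = [ord(ch) for ch in text]
    result = []
    for start in range(0, len(codes), block):
        chunk = codes[start:start + block]
        # The lossy step: snap every coefficient to a multiple of `step`.
        quantized = [round(c / step) * step for c in dct(chunk)]
        restored = idct(quantized)
        result.extend(chr(min(max(round(v), 32), 126)) for v in restored)
    return "".join(result)


print(lossify("lossy compression for text", 1e-9))  # negligible quantization
print(lossify("lossy compression for text", 20))    # visible degradation
```

A larger `step` discards more of each coefficient, so the output drifts further from the input while keeping the same length.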

RealGene ☣️

@lcamtuf
Excuse me, but I don't see it getting any smaller…
