Email or username:

Password:

Forgot your password?
Devine Lu Linvega

So, for a little project I needed to compress a bit of data, and @cancel and I made up a spec that should be possible to implemented on small systems without too much headaches.

I wrote a bit of documentation for it, and I was wondering if anyone wanted to put the docs to the test and see if they could write a toy implementation in a language of their choice. It'd help me see if there's things missing in the docs.

wiki.xxiivv.com/site/ulz_forma

So, think of it as a little puzzle.

42 comments
Renaud Bédard

@neauoire @cancel A couple of things that are tripping me :
- CPY uses a "negative offset plus 1", but it's not clear if it's "-offset + 1" or "-(offset + 1)"
- It's implied that the dictionary pointer is always pointing to the end of the dictionary buffer after the last copy/append operation, right?

Devine Lu Linvega

@renaudbedard @cancel

Example: an offset of 0 means go back by 1 bytes into the history.

And the pointer is the at the end of the dictionary buffer yep.

I'll add a bit about the negative increment to the offset to the docs! thanks for pointing it out

Renaud Bédard

@neauoire @cancel also I know HTML tables are a nightmare, but if you could make the LIT length box appear to take all 7 bits instead of 6 like CPY1... :cooldog:

DELETED

@neauoire @cancel one potential issue I see which may not be an issue is syncronisation. if my “read” alignment is 1 byte, and for whatever reason the read pointer ends up on the second byte of a cpy2 instruction, (perhaps due to input file corruption) it won’t necessarily be obvious that byte is not a lit, cpy1 or the first byte of a cpy2 instruction. it is my understanding that dysyncing is a potential risk of variable byte encodings

Devine Lu Linvega

@zens @cancel is your issue that it's not resilient to data corruption? In the wider system implementation we have a checksum. Do you think would be enough?
git.sr.ht/~rabbits/uxn-utils/t

DELETED

@neauoire @cancel it’s more that those leading bits remind me of the leading bits of utf-8, why utf-8 was designed that way- and that if this were following the utf-8 pattern, cpy2 would have 110 as the leading bits for both of its bytes.

I am not saying you should do that, but have a look at why utf-8 did it that way. (from memory, line transmission corruption can either take out a few characters as long as sync is maintained… or the entire rest of your file if it is not)

Louis Merlin

@neauoire @cancel here is a naive Rust implementation:

play.rust-lang.org/?version=st

@neauoire @cancel here is a naive Rust implementation:

play.rust-lang.org/?version=st

[DATA EXPUNGED]
WimⓂ️

@neauoire This is my interpretation, is it correct?

0 LIT:7 <up to 2^7-1 bytes which are not commands>
10 CPY1:6 < copy up to 2^6-1 bytes from offset; offset is a byte >
11 CPY2:14 < copy up to 2^14-1 bytes from offset; offset is a byte >

@cancel

max22-

@neauoire @cancel I've made a little implementation in Go : github.com/max22-/ulz-go
it was a little bit difficult to understand that you can copy data even if the length goes past the end of the output buffer (i did an equivalent of memcpy, and it didn't work ^^)

Devine Lu Linvega

@maxime_andre @cancel ah! that trips many people, how would you explain this so others aren't tripped by it?

Devine Lu Linvega

Thanks to everyone who answered my puzzle and wrote experimental implementations of our little LZ scheme! You've helped us improved the documentation and see how portable of an algorithm it is across multiple languages!

Nico

@neauoire @cancel Nice! But the encoded data doesn't use CPY2 does it? I expected the example to guide me towards a full implementation, so I was surprised to see the full text after only LIT and CPY1! :)

Devine Lu Linvega

@nicolagi @cancel ah you're right, I chose a segment that's too small. I'll fix the example :) thanks for pointing that out.

Nico

@neauoire No need to update the example! My message was a way of sharing my enthusiasm for working through your puzzle. Just showing appreciation and checking I didn't miss something; not intending to criticize or create work... sorry if it came across differently!

Devine Lu Linvega

@nicolagi nono it's all good :) I've since added a longer example to the repo(as to not clog the documentatio) It's good, I wanted to have a way to benchmark CPY2 as well!

Verwechslungsgefährte 🍿

@neauoire @cancel Is there 2 lines of "Da ba dee da ba di" too many in your deflated example?

Devine Lu Linvega

@dichotomiker @cancel no, there's supposed to be 4, are you only getting two?

cancel

@neauoire @dichotomiker you should put a note that the encoder in uxn-utils is unsafe and the one from Uxn32 should be used for real software

Devine Lu Linvega

@cancel @dichotomiker sure, would you like to make a standalone file in the repo that I can link people to?

Devine Lu Linvega

@cancel I mean with build instructions and main(), so people can build it without too much fussing with it as a library.

cancel

@neauoire hmm. why not just copy uxn_lz.c and uxn_lz.h to your uxn-utils and use them?

cancel replied to Devine Lu Linvega

@neauoire nice :)

The reason I haven't made an example standalone program in C for it in the Uxn32 repo is that none of the Uxn32 code currently uses the C standard library, and doing a cross-platform cmd line program that does file operations would require to suddenly switch to using it. Not the end of the world but… well maybe I'll do it eventually.

Devine Lu Linvega replied to cancel

@cancel ah! I didn't know that. Good to know, yeah I'll bring the ref implementation and make a README for it!

cancel replied to Devine Lu Linvega

@neauoire sorry, I didn't document how to use the streaming decompressor yet. (Having to swap hard drives to work on Uxn32 is turning out to be pretty inconvenient...)

Verwechslungsgefährte 🍿

@neauoire @cancel Thanks. I got it working now.
codeberg.org/spazzpp2/julz/src
(Never do trial and error on index offsets in the middle of the night. Also, know your libraries).

I mostly used the table and the C implementation from wiki.xxiivv.com/site/ulz_forma for reference.

I was a little disappointed when I learned that CPY doesn't copy from the input but from the output array. So, no self-modification. Good for debugging, though.

Would you get more out of it when you assume, the compression always starts with LIT and then simply alternates between CPY and LIT? Their first bit would then be obsolete (twice as much length). You probably need a zero-length LIT as well.

CPY with length > offset repeating over and over again is a really nice idea, especially for ordered dither images!

@neauoire @cancel Thanks. I got it working now.
codeberg.org/spazzpp2/julz/src
(Never do trial and error on index offsets in the middle of the night. Also, know your libraries).

I mostly used the table and the C implementation from wiki.xxiivv.com/site/ulz_forma for reference.

@reiver ⊼ (Charles) :batman:

@neauoire @cancel

Have you considered including 'magic-bytes' with your ULZ file format?

...

The usage of it would be —

If someone doesn't know what the file-name is, they could still determine the type of the file.

...

It could be something as simple as the first by 7 bytes of the file format being:

55 4C 5A 2F 31 0D 0A

Which, if interpreted as ASCII or Unicode UTF-8 would be:

"ULZ/1\r\n"

...

Here are magic bytes for other file formats:

en.wikipedia.org/wiki/List_of_

.

Devine Lu Linvega

@reiver @cancel it will be used as parts of other formats, which might have identifiers yes :)

matt sephton

@neauoire @cancel really enjoyed this! when a video of it popped up on YouTube I read the wiki

Go Up