this is the way to get the correct tags:
(on mac i needed to install gnu grep with homebrew, `brew install grep`, and then use `ggrep`)
will follow up with dataset tomorrow.
https://twitter.com/horsemankukka/status/1486268962119761924?s=20

26 Jan 2022 at 9:32 | Open on social.coop

8 comments

jonny

of course there's smarter watermarking, but the metadata is notable because you could scan billions of pdfs fast. this comment on HN got me thinking about this PDF /OpenAction I couldn't make sense of earlier: on open, access metadata, so something with sizes and layout...

26 Jan 2022 at 9:47 | Open on social.coop

jonny replied to jonny

updated the above gist with correctly extracted tags, and included python code to extract your own, feel free to add them in the comments. since we don't know what they contain yet, i'm not adding other metadata. definitely patterned, not a hash, but idk yet.
https://twitter.com/json_dirs/status/1486289288115359747?t=QwmBvbOgh2fCkjSOZSh3Fw&s=19

26 Jan 2022 at 11:00 | Open on social.coop

jonny replied to jonny

you go to school to study "the brain" and then the next thing you know you're learning how to debug surveillance in PDF rendering to understand how publishers have so contorted the practice of science for profit. how can there be "normal science" when this is normal?

26 Jan 2022 at 11:34 | Open on social.coop

jonny replied to jonny

follow-up: there does not appear to be any further watermarking. taking two files with different identifying tags, stripping metadata, and relinearizing with qpdf's --deterministic-id flag yields PDFs that diff as identical, i.e. no differentiating watermark (but plz check my work)

27 Jan 2022 at 3:15 | Open on social.coop
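
for anyone who wants to check that last step, here is a rough python sketch of the comparison. it is not the code from the gist. the post doesn't say which tool did the metadata stripping, so `exiftool -all=` is an assumption here, and the helper names (`build_commands`, `normalize`, `same_bytes`) and file names are made up for illustration. it needs `exiftool` and `qpdf` on your PATH.

```python
import hashlib
import os
import shutil
import subprocess
from pathlib import Path


def build_commands(work: str, out: str) -> list[list[str]]:
    """Commands that strip metadata from `work`, then rewrite it to `out`."""
    return [
        # assumption: exiftool is the stripping tool (the post doesn't name one)
        ["exiftool", "-all=", "-overwrite_original", work],
        # qpdf rewrites the whole file; --deterministic-id derives the /ID
        # from the content, so two runs on equal content give equal bytes
        ["qpdf", "--linearize", "--deterministic-id", work, out],
    ]


def normalize(src: str, out: str) -> None:
    """Write a metadata-stripped, relinearized copy of `src` to `out`."""
    work = out + ".work.pdf"
    shutil.copyfile(src, work)  # never touch the original download
    try:
        for cmd in build_commands(work, out):
            subprocess.run(cmd, check=True)
    finally:
        if os.path.exists(work):
            os.remove(work)


def sha256_of(path: str) -> str:
    """Hex SHA-256 of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_bytes(a: str, b: str) -> bool:
    """True if the two files are byte-for-byte identical."""
    return sha256_of(a) == sha256_of(b)
```

usage would look like `normalize("copy_a.pdf", "a.norm.pdf")`, the same for `copy_b.pdf`, then `same_bytes("a.norm.pdf", "b.norm.pdf")`. if that returns True for two downloads that carried different tags, nothing else in the file differed after normalizing.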