Email or username:

Password:

Forgot your password?
9 comments
jonny

this is the way to get the correct tags:
(on mac i needed to install gnu grep with homebrew `brew install grep` and then use `ggrep` )
will follow up with dataset tomorrow.
twitter.com/horsemankukka/stat

jonny

of course there's smarter watermarking, the metadata is notable because you could scan billions of pdfs fast. this comment on HN got me thinking about this PDF /OpenAction I couldn't make sense of earlier, on open, access metadata, so something with sizes and layout...

jonny replied to jonny

updated the above gist with correctly extracted tags, and included python code to extract your own, feel free to add them in the comments. since we don't know what they contain yet not adding other metadata. definitely patterned, not a hash, but idk yet.
twitter.com/json_dirs/status/1

jonny replied to jonny

you go to school to study "the brain" and then the next thing you know you're learning how to debug surveillance in PDF rendering to understand how publishers have so contorted the practice of science for profit. how can there be "normal science" when this is normal?

jonny replied to jonny

follow-up: there does not appear to be any further watermarking: taking two files with different identifying tags, stripping metadata, and relinearizing with qpdf's --deterministic-id flag yields PDFs identical with a diff, ie. no differentiating watermark (but plz check my work)

jonny replied to jonny

which is surprising to me, so I'm a little hesitant to make that as a general claim

Nick Astley replied to jonny

@jonny

It's a couple things:

a) Elsevier's vendor's tool only has to be good enough to impress Elsevier

b) Deterrence being more efficient than prevention

shusha replied to jonny

@jonny for the normativity of science see the discourse of STS (science and technology studien), great field!

jonny replied to shusha

@shusha
yes definitely, love it and spend basically all my time reading it nowadays ❤️

Go Up