Simon Willison

OK, help me out here. Is there some aspect of human society that I'm not understanding, where information is considered more official/trustworthy if it's presented as a PDF report and not as a web page?

Today's frustration is this report from @osi about Delayed Open Source Publication - a fascinating document, but why is it a PDF?

opensource.org/delayed-open-so

21 comments
Simon Willison

Here's my first attempt at a Markdown conversion of that document (using Gemini 1.5 Pro against an image of each of the pages in the PDF), created so that I can copy and paste from it and read it on my phone: gist.github.com/simonw/7b913aa

Simon Willison

@mseri I used my own little in-development JavaScript app that sends each image to Gemini in turn for conversion to markdown - it's an extension of tools.simonwillison.net/ocr

James King

@simon That is my main use-case for Readwise.io Reader. It generally does a reasonable job of extracting text from PDFs that I can read in the app.

Christopher Neugebauer

@simon

people are still significantly more confident doing print layout than they are doing web layout 🙃

Timothy

@simon @osi because you can’t change a PDF

I think this was even an Adobe marketing slogan at one point, “a PDF is as good as a promise”

(This is a joke, I’m joking)

Martin

@simon @osi from experience with customers that wanted PDF reports (vs web pages) - probably.

The "PDF positive" was the easy-to-share part (by mail, by putting it on a shared drive, folder, etc).

Yes, I know, URLs work too.

Permanence may be an aspect too (links tend to break eventually)

aœ

@simon @osi Perhaps it is the possibility to protect PDF files against unauthorized changes?

alexwlchan

@simon siderea wrote about how the dynamic numbering of HTML’s <ol> tag makes it fundamentally unsuitable for a variety of documents where exact references are important, which drives a lot of people towards PDF: siderea.dreamwidth.org/1819759

That’s clearly not the sole reason (this report doesn’t seem to have significant numbering) but I think there’s something to it.

Tech Singer

@simon Nothing about human society, in my view, everything about human laziness and desire not to do work. PDFs are easy; if it's just a question of scanning images, they're incredibly easy. Even if it's actual text, layout is far easier than HTML because someone else takes care of it. Note that I never said anyone took care of it well, the important thing is that it's done, to whatever standard. Having said that, though, the main point is to hit scan, get a scanned bunch of pages already in a file, and upload them.

FeralRobots

@simon
I wish I understood this myself - I've been responsible for accessibility on a large website for about 11 years & getting folks to get past this weird valorization of PDF content has been a continual challenge.
@osi

Glyph

@simon I think there's a sort of … contact high of implied prestige that comes from print layouts like this? like, academia uses PDFs mostly due to legacy publication rules based on paper. corporate comms mimicked that style for a long time in order to grant relevance to their "white papers". then white papers gained their own idioms, borrowing from trade press magazines.

all just a general vibe I'm getting, not even really sure how I'd *begin* to go about substantiating this

m_eiman

@simon @osi it’s a lot easier to archive a file than a website, but why they’re more official I dunno… it’s easier to see it as a single thing that’s been blessed as ”official” maybe, rather than a piece of an ever-changing web

Jeff Triplett

@simon @osi I don't see anything in their doc that benefits from being a PDF. That said, you can version a PDF and call it a complete work so maybe that's what they are going for.

I thought it was weird when the PSF did the same thing earlier in the year.

Most foundations do this with financials too and docs one might share via email.

Simon Willison

@webology @osi I don't mind the content being made available as a PDF if it's ALSO available as HTML - I want linkable HTML as the default, with a PDF for people who want to download/forward/print-out etc

James Grimmelmann

@simon@simonwillison.net I think it's that webpages vary and break in ways that PDFs don't. They include images that may not display properly, they depend on libraries that can change or break, they render differently in different browsers and on different devices, etc.

A (basic) PDF is a promise of long-term consistency; every reader will see almost exactly the same content with almost exactly the same layout. It doesn't require ongoing maintenance to protect it from contextual changes, like a webpage does.

Summer Dawn and Company

@simon I think it's because a PDF is protected and cannot be tampered with. That, and they probably figure most people have software to read PDF. Not everyone has Word.

Ed Freyfogle

@simon oh Simon, don’t you know pdfs are unfakeable?? I mean why else would a pdf of my Thames water bill in pdf format be acceptable as “proof of address” in the UK?

Drew Breunig

@simon PDF gets more trust. Format it like a scientific paper and it's truth! Page layout cosplay infuriates me.

Open Source Initiative :osi:

@simon it's hard to make everyone happy. It's easy to fix though: the source is latex. @kfogel can you help?

Pelle Wessman

@simon @osi Corporate / enterprise love a PDF accompanied with a webinar that you get when you give away your corporate contact info – pretty much the opposite of what non-corporate wants