Can modern screen readers read academic papers that are published as two column PDFs? Do they know how to separate out the two columns?

24 August at 16:20 | Open on fedi.simonwillison.net

14 comments

Raphael Fetzer :kirby:

@simon You can specify the reading order in a PDF document so the screen reader can follow it and doesn’t need to guess.

24 August at 16:26 | Open on mastodon.social

Simon Willison

@pheraph that’s reassuring! Do you know if published papers tend to do that? Any way for me to tell if this one works properly? https://storage.googleapis.com/gweb-research2023-media/pubtools/1004848.pdf

24 August at 16:27 | Open on fedi.simonwillison.net

Raphael Fetzer :kirby:

@simon I am no expert in that field, maybe @yatil has an answer. In my experience PDF files often have tons of accessibility issues. You can check your specific document here: https://pave-pdf.org/pave/index.html It should highlight if the reading order isn’t specified.

24 August at 16:32 | Open on mastodon.social

jleedev

@simon @pheraph That PDF isn't tagged, so the only way to read it is in the page content order, which fortunately is sensible enough to present one column and then the other, but things like footnotes and figures run inline.
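As a rough illustration of what "tagged" means at the file level: a tagged PDF normally carries a `/StructTreeRoot` entry and a `/MarkInfo` entry in its document catalog. The sketch below simply scans the raw bytes for those two names. The function names are hypothetical, and this is a heuristic rather than a real parser: if the catalog sits inside a compressed object stream the names will not be visible, so a negative result is inconclusive.

```python
def tagging_hints(pdf_bytes: bytes) -> dict[str, bool]:
    """Report which accessibility-related names appear in the raw PDF bytes.

    Heuristic only: names hidden inside compressed object streams are missed.
    """
    return {
        "StructTreeRoot": b"/StructTreeRoot" in pdf_bytes,
        "MarkInfo": b"/MarkInfo" in pdf_bytes,
    }


def looks_tagged(pdf_bytes: bytes) -> bool:
    """True if both names that a tagged PDF normally declares are present."""
    hints = tagging_hints(pdf_bytes)
    return hints["StructTreeRoot"] and hints["MarkInfo"]


# Usage (the path is a placeholder):
#     with open("paper.pdf", "rb") as f:
#         print(looks_tagged(f.read()))
```

A checker such as the one linked earlier in the thread, or a full PDF library, would give a more reliable answer than this byte scan.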

24 August at 16:59 | Open on mastodon.sdf.org

James Scholes

@simon This specific PDF is not tagged for accessibility, and is literally unreadable with NVDA plus Acrobat Reader on Windows. For instance, here's an excerpt of what I'm hearing:

> SQLhasbeenextremelysuccessfulasthedefactostandardlanguageforworkingwithdata.Virtuallyallmainstreamdatabase-like systemsuseSQLastheirprimaryquerylanguage.ButSQLisan oldlanguagewithsignificantdesignproblems,makingitdifficultto learn,difficulttouse,

24 August at 17:20 | Open on dragonscave.space

Karl Pettersson

@simon Have not tested, but it should be no problem if the PDF is properly tagged (otherwise it might be problematic).

24 August at 16:28 | Open on mastodon.nu

Bruce Elrick

@simon
I would think you should be able to tell by watching the order in which the selection highlighting expands as you drag the cursor.

Whenever the selection jumps around in non-reading order I suspect a screen reader would also jump around.
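To make the "jumping around" concrete, here is a minimal sketch of how the same text fragments come out in different orders depending on how they are sorted. The `Fragment` type, the coordinates, and the half-page column split are all assumptions made for illustration; this is not how any particular viewer or screen reader decides reading order.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: float  # left edge, in points
    y: float  # distance from the top of the page, in points
    text: str


def naive_order(fragments: list[Fragment]) -> list[str]:
    """Top-to-bottom, then left-to-right: interleaves the two columns."""
    return [f.text for f in sorted(fragments, key=lambda f: (f.y, f.x))]


def column_order(fragments: list[Fragment], page_width: float) -> list[str]:
    """Left column first, then right column, each read top to bottom."""
    middle = page_width / 2
    return [f.text for f in sorted(fragments, key=lambda f: (f.x >= middle, f.y))]


# Hypothetical two-column page, 612 points wide (US Letter).
page = [
    Fragment(50, 100, "L1"),
    Fragment(320, 100, "R1"),
    Fragment(50, 120, "L2"),
    Fragment(320, 120, "R2"),
]

print(naive_order(page))        # ['L1', 'R1', 'L2', 'R2']
print(column_order(page, 612))  # ['L1', 'L2', 'R1', 'R2']
```

Real pages also have full-width titles, figures, and footnotes, which is why an explicit reading order in the file is more dependable than geometric guessing.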

24 August at 16:43 | Open on cosocial.ca

Björn Töpel

@simon That feature is what I miss the most on reMarkable! I've never seen a reader supporting that. 😢

24 August at 17:41 | Open on mastodon.social

Mikołaj Hołysz

@simon Short answer, just don't; preferably provide both an HTML alternative and the LaTeX source.

PDF is essentially a vector graphics format. The ultimate end goal of PDF is making a document that prints and displays in exactly the same way for everybody; everything else is secondary. In HTML, the "recommended way to do things" is to essentially say "put an h1 here" and let the browser deal with it, possibly with some help from your style sheet along the way. In PDF, you essentially say "hey, here's some text, put it 2.7 inches from the left margin, 16 point, use font so and so". If you were so inclined, you could even re-order the characters in your font and use completely nonsensical codepoints, and things would still pretty much work visually.

LaTeX definitely uses shenanigans like that. Polish diacritics, for example, aren't expressed as a single character. Instead, the English letter is used, along with some extra markup that tells the renderer where to draw the acute accents on the page. Those acute accents aren't actually part of the character from an a11y perspective though; they're just random squiggles that the renderer happens to be told to draw. Some say that modern JS frameworks are crazy; I say that PDF is far, far crazier than that.
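A small sketch of why a separately drawn accent matters once the text is extracted. It assumes the extractor emits either a combining accent after the letter or a standalone spacing accent before it; what actually comes out depends on the font encoding and the extraction tool. Unicode normalization can merge the first form into a single character but leaves the second alone.

```python
import unicodedata


def recompose(text: str) -> str:
    """Merge base letters with following combining marks where Unicode allows (NFC)."""
    return unicodedata.normalize("NFC", text)


precomposed = "\u0144"   # 'ń' as one code point
combining = "n\u0301"    # 'n' followed by COMBINING ACUTE ACCENT
spacing = "\u00b4n"      # standalone ACUTE ACCENT drawn before 'n' (assumed extractor output)

print(recompose(combining) == precomposed)  # True
print(recompose(spacing) == precomposed)    # False: still two unrelated characters
```

In the second case a screen reader or a search box sees an accent symbol and a plain "n", not the Polish letter.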

Speaking of the two-column stuff in particular, I've seen it work and I've also seen it not work. This probably depends on where the text goes in the document, what it is rendered with, and probably on what software you're using and what their a11y implementation is like.

Yes, there's a way to mark PDFs up for accessibility properly, but very few people do it, LaTeX makes it far harder, there are a lot of other problems (think math), and support among reading programs is... spotty at best.
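The HTML versus PDF contrast described above ("put it 2.7 inches from the left margin, 16 point") can be sketched roughly as follows. The helper `text_at` and the font resource name `F1` are hypothetical; the output is a simplified page content stream fragment using the standard text operators (`BT`, `Tf`, `Td`, `Tj`, `ET`), assuming default page coordinates of 72 points per inch and text without parentheses that would need escaping.

```python
POINTS_PER_INCH = 72  # default PDF user space unit


def text_at(text: str, left_inches: float, baseline_points: float,
            font_size: int, font_name: str = "F1") -> str:
    """Build a simplified content stream fragment that paints text at a fixed spot."""
    x = left_inches * POINTS_PER_INCH
    return f"BT /{font_name} {font_size} Tf {x:.1f} {baseline_points} Td ({text}) Tj ET"


html_version = "<h1>Introduction</h1>"               # says what the text is
pdf_version = text_at("Introduction", 2.7, 700, 16)  # says where the ink goes

print(html_version)
print(pdf_version)  # BT /F1 16 Tf 194.4 700 Td (Introduction) Tj ET
```

Nothing in the PDF fragment says that the text is a heading; that information only exists if the file is additionally tagged.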

24 August at 18:15 | Open on dragonscave.space

Adam Marcus

@simon I believe @jbigham has a few things to say about this

24 August at 19:05 | Open on hachyderm.io

Simon Willison

As an experiment I downloaded the two column PDF of this new paper from Google research "SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL" https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/

... and uploaded it to Google AI Studio and told Gemini Pro 1.5 "Convert this document to neatly styled semantic HTML" - and the results were pretty good! https://static.simonwillison.net/static/2024/Pipe-Syntax-In-SQL.html

24 August at 22:13 | Open on fedi.simonwillison.net

Matt Campbell

@simon I'd be really worried about both hallucination and prompt injection when using an LLM for document conversion, as an accessibility tool for blind or other disabled users. But the tools I've tried on this paper do worse than what you got out of Gemini.

24 August at 22:28 | Open on toot.cafe

Simon Willison

@matt yeah, me too. The responsible way to do this would be to use Gemini Pro to create the first draft, then spend significant time and effort checking and verifying it, iterating on the prompts, porting across the figures etc

24 August at 23:05 | Open on fedi.simonwillison.net

Chris Keene

@simon considering that PDFs are still hard to work with (e.g. select/copy/paste) after many years, I do wonder if this is an area where AI can help (based on what a user can see on screen, what do you think they want to copy) rather than a traditional approach

25 August at 5:36 | Open on mastodon.social