Can modern screen readers read academic papers that are published as two column PDFs? Do they know how to separate out the two columns?
@pheraph that’s reassuring! Do you know if published papers tend to do that? Any way for me to tell if this one works properly? https://storage.googleapis.com/gweb-research2023-media/pubtools/1004848.pdf

@simon I am no expert in that field, maybe @yatil has an answer. In my experience PDF files often have tons of accessibility issues. You can check your specific document here: https://pave-pdf.org/pave/index.html It should highlight if the reading order isn’t specified.

@simon This specific PDF is not tagged for accessibility, and is literally unreadable with NVDA plus Acrobat Reader on Windows. For instance, here's an excerpt of what I'm hearing:

> SQLhasbeenextremelysuccessfulasthedefactostandardlanguageforworkingwithdata.Virtuallyallmainstreamdatabase-like systemsuseSQLastheirprimaryquerylanguage.ButSQLisan oldlanguagewithsignificantdesignproblems,makingitdifficultto learn,difficulttouse,

@simon Have not tested, but it should be no problem if the PDF is properly tagged (otherwise it might be problematic).

@simon Whenever the selection jumps around in non-reading order I suspect a screen reader would also jump around.

@simon That feature is what I miss the most on reMarkable! I've never seen a reader supporting that. 😢

As an experiment I downloaded the two column PDF of this new paper from Google research "SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL" https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/ ... and uploaded it to Google AI Studio and told Gemini Pro 1.5 "Convert this document to neatly styled semantic HTML" - and the results were pretty good! https://static.simonwillison.net/static/2024/Pipe-Syntax-In-SQL.html

@simon I'd be really worried about both hallucination and prompt injection when using an LLM for document conversion, as an accessibility tool for blind or other disabled users. But the tools I've tried on this paper do worse than what you got out of Gemini.

@matt yeah, me too.
The responsible way to do this would be to use Gemini Pro to create the first draft, then spend significant time and effort checking and verifying it, iterating on the prompts, porting across the figures, etc.

@simon Considering that PDFs are still hard to work with (e.g. select/copy/paste) after many years, I do wonder if this is an area where AI can help (based on what a user can see on screen, what do you think they want to copy?) rather than a traditional approach.
@simon You can specify the reading order in a PDF document so the screen reader can follow it and doesn’t need to guess.
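
To illustrate why guessing goes wrong on an untagged two-column page, here is a minimal sketch in Python. It is not how any particular screen reader or PDF viewer works: the `Box` model, the coordinates, and the "split at the page midline" rule are all assumptions made up for the example. It only contrasts ordering text purely by position with a column-aware guess.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A block of text plus the top-left corner of its bounding box (hypothetical model)."""
    text: str
    x: float  # distance from the left edge of the page
    y: float  # distance from the top of the page


def naive_order(boxes):
    """Guess reading order from position alone: top to bottom, then left to right."""
    return [b.text for b in sorted(boxes, key=lambda b: (b.y, b.x))]


def column_order(boxes, page_width):
    """Guess reading order assuming two columns split at the page midline."""
    mid = page_width / 2
    left = sorted((b for b in boxes if b.x < mid), key=lambda b: b.y)
    right = sorted((b for b in boxes if b.x >= mid), key=lambda b: b.y)
    return [b.text for b in left + right]


# Illustrative page: two columns with two lines each (made-up coordinates).
page = [
    Box("left-1", x=50, y=100),
    Box("right-1", x=350, y=100),
    Box("left-2", x=50, y=120),
    Box("right-2", x=350, y=120),
]

print(naive_order(page))        # ['left-1', 'right-1', 'left-2', 'right-2']
print(column_order(page, 600))  # ['left-1', 'left-2', 'right-1', 'right-2']
```

The position-only guess interleaves the two columns line by line. The column-aware guess works here, but would likely break on full-width titles, figures, or footnotes, which is why a properly tagged PDF that states its reading order explicitly is the more reliable route.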