Simon Willison

Google Research published the paper as a two-column PDF with no HTML equivalent, so as an experiment I uploaded that PDF to Google AI Studio and told gemini-1.5-pro-exp-0801 "Convert this document to neatly styled semantic HTML" - it did pretty well! static.simonwillison.net/stati

8 comments
Simon Willison

... possibly too well, six hours after I published that it was already the third search result on Google for the title of the paper!

Since I hadn't reviewed the conversion for correctness, I've now added a meta tag asking search engines to de-index it.

Google search results for "SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL"

First result is the official Google abstract page. Second is a LinkedIn post by Dave Herrald. Third is my HTML conversion of the paper, shown as "6 hours ago"
aos_

@simon I’ve recently started digging into Elixir, and Ecto, the library it uses for database abstraction, has this as a feature. It’s really nice: hexdocs.pm/ecto/Ecto.Query.htm

Leon Bambrick

@simon interesting!

I noticed that (on iOS Safari) I couldn’t use the “dark night” plugin to turn it into dark mode. (I thought I could use reader mode to darken it.)

Any idea if this is something you’ve explicitly blocked via a meta tag?

(My main reason for wanting to view the original as html was to aid legibility, particularly by having it in darkmode. My eyes are flaring up with photophobia and it’s literally agonising to read black on white at the moment.)

Leon Bambrick

@simon

“REFERENCES

(A long list of references, which I won't reproduce here to save space.)”

I know it’s mean to laugh at the poor AI that is doing the best it can… but there’s always space in html. 🙂

(Thank you for this conversion btw.)

Adriano

@simon Saw this pass by, and the combo
"It did pretty well!
[...]
Since I hadn't checked the conversion for correctness..."

is kinda the thing us LLM detractors like to latch onto.

Simon Willison

@adriano sure, that’s why I specifically called it out - they’re not wrong about that kind of thing

Simon Willison

@adriano but if they choose not to consider how much time this saves on the work of producing a verified, 100% correct HTML version of that PDF, that’s on them at this point

Neil Kandalgaonkar

@simon Huh, I had wondered about exactly this as a service, but hadn’t made the connection with AI

Panning and scanning on mobile is terrible. And I am just a casual scientific paper reader
