Google Research published the paper as a two-column PDF with no HTML equivalent, so as an experiment I uploaded that PDF to Google AI Studio and told gemini-1.5-pro-exp-0801 "Convert this document to neatly styled semantic HTML" - it did pretty well! https://static.simonwillison.net/static/2024/Pipe-Syntax-In-SQL.html

24 August at 23:09 | Open on fedi.simonwillison.net

8 comments

Simon Willison

... possibly too well: six hours after I published it, it was already the third search result on Google for the title of the paper!

Since I hadn't reviewed the conversion for correctness, I've now added a meta tag to de-index it from the search engine.
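
The post doesn't say which meta tag was added. A common way to ask search engines to drop a page is a robots meta tag with a `noindex` directive, so the sketch below assumes that form. It is a minimal Python check, using only the standard library, for whether a page carries such a tag. The names `RobotsMetaParser`, `is_noindexed`, and `SAMPLE_PAGE` are hypothetical, and the sample markup is illustrative rather than the actual head of the converted paper.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def is_noindexed(html: str) -> bool:
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Assumed example markup, not copied from the real page.
SAMPLE_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex">
<title>Example</title>
</head>
<body></body>
</html>"""

if __name__ == "__main__":
    print(is_noindexed(SAMPLE_PAGE))
```

A tag like this only takes effect once the crawler fetches the page again, so a result can linger in search listings for a while after the tag is added.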

Google search results for "SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL"

First result is the official Google abstract page. Second is a LinkedIn post by Dave Herrald. Third is my HTML conversion of the paper, shown as "6 hours ago"

24 August at 23:11 | Open on fedi.simonwillison.net

aos_

@simon I’ve recently started digging into Elixir and the library used for database abstraction called Ecto has this as a feature. It’s really nice: https://hexdocs.pm/ecto/Ecto.Query.html#module-macro-api

24 August at 23:22 | Open on mastodon.social

Leon Bambrick

@simon interesting!

I noticed that (on iOS Safari) I couldn’t use the “dark night” plugin to turn it into dark mode. (Thought I could use reader mode to darken it).

Any idea if this is something you’ve explicitly blocked via a meta tag?

(My main reason for wanting to view the original as html was to aid legibility, particularly by having it in darkmode. My eyes are flaring up with photophobia and it’s literally agonising to read black on white at the moment.)

25 August at 0:19 | Open on mastodon.cloud

Leon Bambrick

@simon

“REFERENCES

(A long list of references, which I won't reproduce here to save space.)”

I know it’s mean to laugh at the poor AI that is doing the best it can… but there’s always space in html. 🙂

(Thank you for this conversion btw.)

25 August at 0:36 | Open on mastodon.cloud

Adriano

@simon Saw this pass by, and the combo
"It did pretty well!
[...]
Since I hadn't checked the conversion for correctness..."

is kinda the thing us LLM detractors like to latch on to.

26 August at 15:49 | Open on lile.cl

Simon Willison

@adriano sure, that’s why I specifically called it out - they’re not wrong about that kind of thing

26 August at 16:54 | Open on fedi.simonwillison.net

Simon Willison

@adriano but if they chose not to consider how much time this would save in terms of doing the work to produce a verified, 100% correct HTML version of that PDF, that’s on them at this point

26 August at 16:56 | Open on fedi.simonwillison.net

Neil Kandalgaonkar

@simon Huh, I had wondered about exactly this as a service, but hadn’t made the connection with AI

Panning and scanning on mobile is terrible. And I am just a casual scientific paper reader

25 August at 0:19 | Open on xoxo.zone