Simon Willison

My notes on Google Research's new paper describing "pipe syntax", their alternative syntax for SQL queries which they've been rolling out internally since February simonwillison.net/2024/Aug/24/

For this SQL query:

SELECT component_id, COUNT(*)
FROM ticketing_system_table
WHERE
  assignee_user.email = 'username@email.com'
  AND status IN ('NEW', 'ASSIGNED', 'ACCEPTED')
GROUP BY component_id
ORDER BY component_id DESC;

The Pipe query alternative would look like this:

FROM ticketing_system_table
|> WHERE
    assignee_user.email = 'username@email.com'
    AND status IN ('NEW', 'ASSIGNED', 'ACCEPTED')
|> AGGREGATE COUNT(*)
   GROUP AND ORDER BY component_id DESC;
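
A rough way to read the pipe version is as a series of steps applied in the order they are written: filter, then aggregate, then sort. The Python sketch below mimics that flow on a handful of made-up rows. The sample data and the flattened `email` field are assumptions for illustration only, and this is not how any SQL engine implements pipe syntax.

```python
from collections import Counter

# Hypothetical sample rows; assignee_user.email is flattened to "email" here.
tickets = [
    {"component_id": 1, "email": "username@email.com", "status": "NEW"},
    {"component_id": 2, "email": "username@email.com", "status": "ASSIGNED"},
    {"component_id": 2, "email": "username@email.com", "status": "ACCEPTED"},
    {"component_id": 3, "email": "username@email.com", "status": "CLOSED"},
    {"component_id": 1, "email": "other@email.com", "status": "NEW"},
]


def pipe_query(rows):
    # FROM ticketing_system_table |> WHERE ...
    filtered = (
        r for r in rows
        if r["email"] == "username@email.com"
        and r["status"] in ("NEW", "ASSIGNED", "ACCEPTED")
    )
    # |> AGGREGATE COUNT(*) GROUP BY component_id
    counts = Counter(r["component_id"] for r in filtered)
    # ... AND ORDER BY component_id DESC
    return sorted(counts.items(), reverse=True)


result = pipe_query(tickets)
print(result)
```

Each step only consumes the output of the step before it, which is the property the pipe operator makes visible in the query text.
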
Brian Reiter

@simon this looks a lot like Microsoft’s LINQ declarative query syntax.

lostprototype

@simon - This will be some hilarious prank on DBAs who have spent the last 2 decades irrationally railing against ORMs - Entity Framework in particular.

Under the hood, it's probably quite similar to LINQ and then an engine to translate the AST to SQL!

Bastian Venthur

@simon Came for the SQL, stayed for the rant about PDFs with two columns 😅

Simon Willison

Today’s dumb way to entertain myself with LLMs:

> Write a COBOL program that my dog would enjoy. Include instructions for compiling and running it on macOS.

TIL how to compile and run a COBOL program using GnuCOBOL!

brew install gnu-cobol
cobc -x DOG-GAME.cob -o dog_game
./dog_game

Sadly it didn’t work on the first go, Claude 3.5 Sonnet missed that COBOL requires tabs, not spaces

Transcript: gist.github.com/simonw/64026b4

Simon Willison

Can modern screen readers read academic papers that are published as two column PDFs? Do they know how to separate out the two columns?

Mikołaj Hołysz

@simon Short answer: just don't. Preferably provide both an HTML alternative and the LaTeX source.

PDF is essentially a vector graphics format; the ultimate end goal of PDF is making a document that prints and displays in exactly the same way for everybody, and everything else is secondary. In HTML, the "recommended way to do things" is to essentially say "put an h1 here" and let the browser deal with it, possibly with some help from your style sheet along the way. In PDF, you essentially say "hey, here's some text, put it 2.7 inches from the left margin, 16 point, use font so and so". If you were so inclined, you could even re-order the characters in your font and use completely nonsensical codepoints, and things would still pretty much work visually.

LaTeX definitely uses shenanigans like that. Polish diacritics, for example, aren't expressed as a single character. Instead, the English letter is used, along with some extra markup that tells the renderer where to draw the acute accents on the page. Those acute accents aren't actually part of the character from an a11y perspective though; they're just random squiggles that the renderer happens to be told to draw. Some say that modern JS frameworks are crazy; I say that PDF is far, far crazier than that.

Speaking of the two-column stuff in particular, I've seen it work and I've also seen it not work. This probably depends on where the text goes in the document, what it is rendered with, and what software you're using and what their a11y implementation is like.

Yes, there's a way to mark PDFs up for accessibility properly, but very few people do it, LaTeX makes it far harder, there are a lot of other problems (think math), and support among reading programs is... spotty at best.
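
The point about diacritics has a loose analogue at the Unicode level, sketched below in Python: the same visible letter can be one code point or a base letter plus a separate combining accent. This is only an analogy under that assumption. It does not reproduce how LaTeX or a PDF renderer actually places glyphs, where the accent may not be encoded as text at all.

```python
import unicodedata

precomposed = "\u0144"   # "ń" as a single code point
decomposed = "n\u0301"   # "n" followed by COMBINING ACUTE ACCENT

# The two strings look the same on screen but are different sequences.
same_codepoints = precomposed == decomposed

# NFC normalization recombines the base letter and the accent.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed

print(len(precomposed), len(decomposed), same_codepoints, nfc_equal)
```

Software that reads the text back has to cope with both forms, and it has nothing to work with if the accent was only ever drawn as a shape.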

Adam Marcus

@simon I believe @jbigham has a few things to say about this

Simon Willison

As an experiment I downloaded the two column PDF of this new paper from Google Research "SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL" research.google/pubs/sql-has-p

... and uploaded it to Google AI Studio and told Gemini Pro 1.5 "Convert this document to neatly styled semantic HTML" - and the results were pretty good! static.simonwillison.net/stati

Simon Willison

Lots of people are asking why Anthropic and OpenAI don't support OAuth, so you can bounce them through those providers to get a token that uses their API budget for your app

My guess: they're worried malicious app developers would use it to trick people and obtain valid API keys

Simon Willison

Imagine a version of my dumb little "write a haiku about a photo you take" page which used OAuth, harvested API keys and then racked up hundreds of dollars in bills for everyone who tried it out, running illicit election interference campaigns or whatever

tools.simonwillison.net/haiku

Simon Willison

An interesting thing about CORS is how poorly understood it is and how difficult it is to find a really clear explanation

I’m not sure I could write a clear explanation myself

The best I’ve seen is jakearchibald.com/2021/cors/

Luis Lavena

@simon indeed, it is cryptic. The best reference I found was this one: jub0bs.com/posts/2023-02-08-fe

But the one you shared looks nice!

happyborg

@simon I hope CORS is half as frustrating for those trying to break web security as those of us just trying to do legitimate stuff 🤦‍♂️

Simon Willison

This story about why some companies are reconsidering their Microsoft Copilot 365 rollouts is amusing - in this case the problem is that the AI chatbot is /too/ effective, in that if you haven’t correctly configured permissions on documents like the employee salary spreadsheet, anyone in your org who asks about it will get the right answer! simonwillison.net/2024/Aug/23/

Steven

@simon Fun story. I worked for a company that used Confluence for wikis and such. The VP of engineering would write all the private team meetings in here. Yet I was not part of the allowed members to see such content.

However, I just told Confluence to e-mail me daily updates on when this VP posted.

So I would get a daily e-mail that showed the contents of these meetings delivered conveniently to my inbox.

Not sure if this would still work as it has been many years… 😀

Simon Willison

Any screen reader users able to help identify the best pattern for ensuring this proposed fediverse symbol gets read out loud correctly in different software?
typo.social/@FediverseSymbol/1

Pratik Patel

@simon I'll file a report with Apple to see what's going on with Voiceover/other speech components. I'll also do some testing with different synthesizers. It could be that Apple will need to add this symbol to their emojis.

Tristan

@simon Tested in latest NVDA on Windows and this symbol is read as "Asterism," regardless of speech synthesizer. Using latest version of JAWS on Windows, it is not read at all. There is no pause, it's just skipped. Same for Windows Narrator. iOS 18 beta reads it as asterism, likewise for macOS beta. The only way to fix this is to beg and plead with screen reader maintainers to add a default pronunciation for ⁂. Users can add a replacement themselves, but this is a somewhat technical process.
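
For reference, "Asterism" is the official Unicode character name of ⁂ (U+2042), which a short Python check confirms. Whether a given screen reader derives its spoken label from that name or from its own symbol dictionary is an assumption not tested here.

```python
import unicodedata

symbol = "\u2042"  # the proposed fediverse symbol, ⁂

# Look up the character's official Unicode name and code point.
name = unicodedata.name(symbol)
codepoint = f"U+{ord(symbol):04X}"

print(codepoint, name)
```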

Simon Willison

Sent out the latest edition of my newsletter - everything I’ve posted on my blog in the past couple of weeks, which it turns out adds up to a lot of stuff! simonw.substack.com/p/claudes-

Simon Willison

I like this idea, but I worry about accessibility

I just tried using the Mobile Safari “read this page” feature and it skipped right over the ⁂ symbol as if it wasn’t even there
typo.social/@FediverseSymbol/1

James Scholes

@simon My screen reader (NVDA) announces it, but it's four syllables so might get a bit tedious. No idea about other SRs or how it manifests in braille.

Simon Willison

@FediverseSymbol maybe this is an Apple bug? At least one screen reader, NVDA, pronounces the symbol correctly as “Asterism” dragonscave.space/@jscholes/11

Fediverse Symbol ⁂

@simon Ah, that’s annoying, would have expected “asterism” or some description. Depending on the usage, the symbol might be decorative and that’s maybe okay. But we need to think about cases where it conveys meaning.

Simon Willison

Awesome new Anthropic API feature: you can now enable CORS support with a (currently undocumented) anthropic-dangerous-direct-browser-access: true request header - which means you can call their API directly from browser JavaScript now!

My notes here: simonwillison.net/2024/Aug/23/

Simon Willison

I used it to upgrade my fun little Haiku app, which uses your webcam to take a photo and then writes a Haiku about it using the Claude Haiku model

tools.simonwillison.net/haiku

A photo of my dog Cleo resting in a yellow bed - a haiku reads:

Resting peaceful dog,
Curled up on soft, vibrant bed,
Contentment abounds.

The URL tools.simonwillison.net is shown in the URL bar at the bottom of the page, with the camera icon red - buttons for taking a photo and swapping the camera are visible
Simon Willison

Armin’s take on what happens if Astral turn out to be a bad steward of uv is interesting: “[…] having seen the code and what uv is doing, even in the worst possible future this is a very forkable and maintainable thing. I believe that even in case Astral shuts down or were to do something incredibly dodgy licensing wise, the community would be better off than before uv existed.”
hachyderm.io/@mitsuhiko/112999

Andrea Grandi 🦕

@simon what if uv is the nose and nobody is looking at the Moon yet?

I'm pretty sure their next product will be a PyPI alternative. Then they will add proprietary stuff. This will fragment the market and devs/companies will have to release Python packages to two places instead of one. Then it will become paid only etc... (one can only imagine the possibilities).

I'm not saying this will 100% happen, but it's definitely possible.

uv alone is not an issue.

Simon Willison

Wow there is a LOT of stuff in the new release of uv - lockfiles, a pipx alternative, Poetry-style project management, even the ability to manage and download standalone Python versions directly

It's taking me a while to dig through all of this astral.sh/blog/uv-unified-pyth

Paul Everitt

@simon So much to unpack. As a minor-minor-nice-to-have (but really wish Python would settle on something here)...scaffold/template support: github.com/astral-sh/uv/issues

Simon Willison

New prompt injection data exfiltration attack today, this time against Slack and Slack AI

It's a bit of a subtle one, but the net effect is that if you can get your malicious tokens into a Slack you can get their AI bot to trick users into exfiltrating private data by clicking on links

My notes here: simonwillison.net/2024/Aug/20/

Original report by PromptArmor here: promptarmor.substack.com/p/dat

Simon Willison

i quit my job just over 5 years ago to explain computer things (jvns.ca/blog/2019/09/13/a-year). I had no idea if I would like being my own boss but ultimately it's been really cool and I'm happy to have this weird job writing zines about computers.

("I’m not planning to hire employees or anything” turned out to not be an accurate prediction, now I work with 2 part-time employees who I don't know how I would manage without)

Ron Jeffries

@b0rk
Watching from afar, I admire you for trying it, and admire you more for the success you are finding. Well done!

Julia Evans

someone asked me recently how long it took me to get used to the rhythm of working for myself and I said “uh, maybe 3 years?”. I thought working for myself would be hard to adjust to and it was, but I'm happy I did it anyway

Simon Willison

My notes on trying out whisperfile, the new cross-platform executable packaging for the Whisper speech-to-text model, released as part of the latest update to llamafile
simonwillison.net/2024/Aug/19/

Simon Willison

Fixed a bug in my @covidsewage bot caused by a change to the underlying website - details on the bug fix here: github.com/simonw/covidsewage-

It now uses this nested shot-scraper monstrosity which delights me greatly:

shot-scraper -o /tmp/covid.png $(
shot-scraper javascript \
$URL \
'document.querySelector("iframe").src' \
-b firefox \
--user-agent 'Mozilla/5.0 (Macintosh...' \
--raw
) --wait 5000 -b firefox --retina

fedi.simonwillison.net/@covids

Simon Willison

Everyone who builds web applications should read the Reckoning series by @slightlyoff infrequently.org/series/reckon

My own notes here, but you should work through the entire thing: simonwillison.net/2024/Aug/18/

Seriously, take a look at the case-study in which the California food stamps signup site takes 29.5s to become interactive on a slow rural mobile connection, and tell me we don't urgently need to do better! infrequently.org/2024/08/objec

Bob 🇺🇲♒🐧🪖

@simon @slightlyoff

Killing all the tracking code to Google and all corporates makes our services on MPAQ very fast. We are even working on self-hosting a hit counter 😜 Outsourcing is a major draw on performance.

Bill Zaumen

@simon @slightlyoff It is not just javascript. In spite of a high-bandwidth connection, I've been getting very bad response loading web pages recently. After diagnosing the problem, I found that there was 76% to 91% packet loss on DNS requests to the servers my ISP configures! I'm about to file a service request, but in the meantime, I changed DNS servers to publicly available ones.

What made it worse was the number of DNS requests now needed to load a web page.

razze

@simon @slightlyoff I don't think saying Javascript is the cause of this is honest or helpful.

Simon Willison

I built a fun new Datasette plugin: datasette-checkbox, which looks for columns on a table called is_* or has_* or should_* and upgrades them to interactive checkboxes, provided the current user has update-row permission for that table simonwillison.net/2024/Aug/16/
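
The column-matching rule described above could be sketched roughly like this in Python. The function name and sample columns are hypothetical and this is not the plugin's actual code. It only covers the name prefixes and ignores the update-row permission check.

```python
# Prefixes taken from the description: is_*, has_*, should_*
PREFIXES = ("is_", "has_", "should_")


def checkbox_columns(column_names):
    """Return the columns whose names suggest a boolean flag."""
    return [name for name in column_names if name.startswith(PREFIXES)]


columns = ["id", "title", "is_published", "has_image", "should_email", "history"]
matches = checkbox_columns(columns)
print(matches)
```

Matching on the underscore keeps names like `history` or `island` out of the result.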

Casey Gollan

@simon I love that you share your transcripts. So cool to see the behind the scenes. 🙌 Eventually I wonder if we’ll land on a standardized/automated way to attach prompt transcripts to commits as a kind of provenance artifact.

Simon Willison

Interesting notes from Paul Gauthier on how asking an LLM to return code wrapped in a JSON object can result in a quality reduction compared to asking for that code in a less complex format such as fenced code Markdown blocks aider.chat/2024/08/14/code-in-

(Cross-posted from my blog: simonwillison.net/2024/Aug/16/)

Coding skill by model and code wrapping strategy - four models, each showing their pass rate % average of five runs. Claude 3.5 Sonnet gets 60.5% with Markdown, 54.1% with JSON. DeepSeek-Coder V2 0724 gets 60.6% with Markdown, 51.1% with JSON. GPT-4o-2024-05-13 gets 60.0% with Markdown, 59.6% with JSON. GPT-4o-2024-08-06 gets 60.8% with Markdown, 57.6% with JSON, and 56.9% with JSON (strict). Markdown consistently performs better than JSON across all models.
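
To make the two wrapping strategies concrete, here is a small Python sketch that formats the same snippet both ways. The sample function and the `code` key are made up for illustration and are not the benchmark's actual prompt format or schema. The visible difference is that the JSON form has to escape every newline and double quote.

```python
import json

# A tiny piece of code a model might be asked to return (hypothetical).
code = 'def greet(name):\n    print(f"Hello, {name}!")\n'

# Markdown strategy: the code appears verbatim between fences.
fence = "`" * 3
as_markdown = f"{fence}python\n{code}{fence}"

# JSON strategy: the code becomes a string value with escaped characters.
as_json = json.dumps({"code": code})

print(as_markdown)
print(as_json)
```
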
Martin Owens :inkscape:

@simon

Format encoding robs the network of capacity I said. Better to write encoder/decoders I said. Just because it can write #svg don't mean it should I said. 😉

Good to have data though.

This might be interesting for you @diacritica
