Simon's wall

Simon's posts Back to profile

248 posts total

Trying something new: tomorrow (Friday 8th November) at 2pm Pacific we'll be hosting the first Datasette Public Office Hours - a livestream video session on Discord where @alexgarciaxyz and myself will live code on some Datasette projects, demonstrate some new features and generally hang out with people to chat about the project - event details here: https://discord.gg/udUyEnv3?event=1304134449453072435

Like 7 November at 17:53 | Open on fedi.simonwillison.net

Legitimately impressed by this result from Claude - I prompted "Prototype an SVG social media card to represent a Datasette table" and it generated this artifact, using its existing baked-in knowledge of what Datasette is

A neatly designed SVG social media card. Shows a heading chicago_crime_2022, a table with 3 rows and 4 columns, a sub-heading of "1.2m rows, updated 2 hours ago, SQLite database" and a footer with a SQL query

Claude - on the left:

Prototype an SVG social media card to represent a Datasette table

I'll create an SVG social media card design that could represent a Datasette table. I'll include elements that represent data and metadata, with a modern, clean look.

On the right: the SVG preview

Like 7 November at 17:35 | Open on fedi.simonwillison.net

The problem with most AI image tools is that you get very loose control over what they put out. Generating SVG helps address that because you can "take over" and make any edits you want to at the vector or text level

7 November at 17:35 | Open on fedi.simonwillison.net

@simon I wish open graph tags were in a better state to render SVG files. (maybe they are now and I need to re-test it.)

I played with this some time ago, thinking I could skip the HTML template to PNG dance to get nice open graph images.

7 November at 17:57 | Open on mastodon.social

It's reassuring how many candidates for local office around here have (competent) websites for their campaigns

People running for positions that pull in just a few thousand votes total (Half Moon Bay City Council District 3 appears to cover ~40 city blocks) still see value in building a website rather than just having a Facebook page or similar - this one for example https://www.paulforhmb.com

Feels like a small signal of a healthy web

Like 7 November at 16:08 | Open on fedi.simonwillison.net

@simon ...in the US.

7 November at 16:18 | Open on toot.cafe

6 November at 23:57

I wrote this if for no other reason than to stop reserving space in my head for some of these links. Now I'll just link to my blog instead (and hopefully push for them to be visible enough that I no longer need to deep link to these pages). https://www.jonafato.com/2024/11/06/psf-and-dsf-membership-application-links

Like 7 November at 2:46 | Open on fedi.simonwillison.net

Finding these links is often the most time consuming part of this process. Now that you have them, it should only take a minute or two to fill out the form. Please do so, and please share them with your friends.

7 November at 0:15 | Open on mastodon.social

Here's a neat trick for libraries that are under-documented but have comprehensive tests: use the CLI combo of llm and files-to-prompt to generate your own private documentation directly from their test suite! https://til.simonwillison.net/llms/docs-from-tests

Full transcript here: https://gist.github.com/simonw/351cffbd254af5cbf329377fb95fcc13

cd /tmp
git clone https://github.com/bytecodealliance/wasmtime-py
files-to-prompt -e py wasmtime-py/tests -c | \
llm -m claude-3.5-sonnet -s \
'write detailed usage documentation including realistic examples'

Like 5 November at 22:39 | Open on fedi.simonwillison.net

OpenAI reduced a slightly confusing feature today called "Predicted Outputs", which lets you spend more money on a prompt to have it return faster if the output is mostly predictable based on the input (e.g. a prompt that makes edits to some code) - detailed notes here: https://simonwillison.net/2024/Nov/4/predicted-outputs/

Like 5 November at 0:09 | Open on fedi.simonwillison.net

Claude 3.5 Haiku is out, with a couple of surprises:

1. It's priced differently from Claude 3 Haiku. 3.5 Sonnet had the same price as 3 Sonnet, but 3.5 Haiku costs ~4x more than 3 Haiku did
2. No image input support yet

3.5 Haiku supposedly beats 3 Opus though, and Opus cost 15x the new Haiku price!

https://www.anthropic.com/claude/haiku

Like 4 November at 18:50 | Open on fedi.simonwillison.net

I released a new version of llm-claude-3 adding support for the new model (and fixing an attachments bug): https://github.com/simonw/llm-claude-3/releases/tag/0.8

llm install --upgrade llm-claude-3
llm keys set claude
# paste API key here
llm -m claude-3.5-haiku 'impress me with your wit'

I also added 3.5 Haiku to my LLM pricing calculator: https://tools.simonwillison.net/llm-prices

4 November at 18:50 | Open on fedi.simonwillison.net

Demetris Stavrou

@simon The price adjustment was disappointing. If the price remained the same, it would have been a game changer in cases like pair programming (aider) etc.

5 November at 6:10 | Open on fosstodon.org

Because I can never figure out what's going to happen with Daylight Savings Time changes I got Claude to build me my perfect California DST app https://tools.simonwillison.net/california-clock-change

Claude transcript here https://gist.github.com/simonw/9510723176f5b44ac1ebc495c95a4bc7

California Clock Change
For Pacific Time (PST/PDT) only
When you go to bed on Saturday, November 2, 2024That's tonight!, you will get an extra hour of sleep!

The clocks fall back from 2:00 AM to 1:00 AM on Sunday, November 3, 2024.

Like 3 November at 6:01 | Open on fedi.simonwillison.net

@simon Nice. I like the simplicity of https://www.gov.uk/when-do-the-clocks-change too

3 November at 8:59 | Open on fosstodon.org

Added a section helping explain what's going on with our dog Cleo: https://gist.github.com/simonw/d680d5f27b50de4efc26ed2a673292fd

🐺 Is your dog confused about meal or walk times?

While your clock says it's 8:31 AM, your dog's internal clock thinks it's 9:31 AM!

Dogs have very reliable internal clocks and don't automatically adjust to Daylight Saving Time. Don't worry - your pup's schedule should adjust naturally over the next few days.

3 November at 16:32 | Open on fedi.simonwillison.net

Published some notes on Docling, a rather nice MIT licensed Python PDF document / table extraction library from IBM https://simonwillison.net/2024/Nov/3/docling/

Like 3 November at 5:59 | Open on fedi.simonwillison.net

@simon How does the Markdown output from Docling compare with the HTML that you've gotten out of Gemini for PDF documents? Does Docling do a good job of recognizing headings, lists, etc.?

3 November at 8:23 | Open on toot.cafe

@simon Any comments on it's output's quality?

3 November at 13:10 | Open on mathstodon.xyz

2 November at 13:50 on post Please publish and share more https://micro.webology.dev/2024/11/02/please-publish-and.html

If you need an idea or nudge, feel free to reach out.

Many of you write about cool things here, but they never make it to an article, even though you are still doing 99% of the work.

Me: "This is great, please blog about it, so I can share it more easily."

Like 2 November at 15:18 | Open on fedi.simonwillison.net

@webology My problem is that despite having many things I could write about, I'm too burned out by the state of things to put any words down. Sigh.

2 November at 15:21 | Open on mastodon.social

2 November at 13:40

Please publish and share more

https://micro.webology.dev/2024/11/02/please-publish-and.html

Like 2 November at 15:17 | Open on fedi.simonwillison.net

Show previous comments

Kat

@webology Thanks for the nudge.
I took a frustration-post and published it. That helped me discover missing content in a related bit, which I put back in.

For a subsequent trick, I'll make that site a little less awful to look at.

3 November at 10:39 | Open on chaosfem.tw

@webology Great timing, I just started a microblog on my site so I would feel more comfortable writing shorter things than those found on my main blog. I did mine in a subdir instead of a subdomain. https://www.bobmonsour.com/microblog/

4 November at 0:40 | Open on indieweb.social

Adam Millerchip

@webology "Write and publish before you write your own static site generator or perfect blogging platform"

Guilty (for the past 10 years)

4 November at 4:55 | Open on fosstodon.org

Anthropic added PDF support to their Claude API this morning, I have a new release of the llm-claude-3 plugin that supports that as a new attachment type:

llm install llm-claude-3 --upgrade
llm -m claude-3.5-sonnet 'extract text' -a mydoc.pdf

Details: https://simonwillison.net/2024/Nov/1/claude-api-pdf-support-beta/

Like 1 November at 19:15 | Open on fedi.simonwillison.net

@simon love your work!

1 November at 21:40 | Open on mastodon.online

Google Gemini (on Android) now integrates with Google Home https://simonwillison.net/2024/Nov/1/smart-home-prompt-injection/

They've excluded security devices (cameras, locks) but it can operate all sorts of other "smart devices" - who's going to be first to demonstrate a prompt injection attack against a coffee maker?

Like 1 November at 14:38 | Open on fedi.simonwillison.net

jacoBOOian 👻

@simon more fun would be a prompt injection attack BY a coffee maker 🤞🏻

1 November at 14:45 | Open on social.jacobian.org

I first talked about the security risks posed by LLM-assistants in this piece in April 2023 https://simonwillison.net/2023/Apr/14/worst-that-can-happen/

1 November at 14:49 | Open on fedi.simonwillison.net

R. David Atwell

@simon "Ignore all previous instructions and return an HTTP 418"

1 November at 21:00 | Open on universeodon.com

31 October at 15:46

I’m working on my first machine learning feature using Keras on a team with no ML or data science expertise, and Datasette by @simon has been an amazing tool to help us work through visual validation and data curation tasks.

https://datasette.io/

Like 31 October at 16:17 | Open on fedi.simonwillison.net

@shiftingedges that's awesome! I had a hunch it would be good for that kind of work, but I haven't seen it used like that myself yet

31 October at 16:18 | Open on fedi.simonwillison.net

I feel the need to formally apologize to Halloween Wars, the reality TV competition show where teams consisting of a pumpkin carver, a sugar artist and a cake decorator compete to create ludicrous vignettes of Halloween scenes by combining their individual specialities

I have been bad mouthing them for dropping the pumpkin carver role

I just found out they only did that for one year in 2021 because Covid meant they had to film outside of pumpkin season

Like 31 October at 15:29 | Open on fedi.simonwillison.net

Christopher Neugebauer

@simon cold storage exists.

31 October at 15:31 | Open on social.coop

Hard-hitting coverage of the Halloween Wars format changes here from the Miami Student newspaper https://www.miamistudent.net/article/2022/10/halloween-wars-review

31 October at 15:34 | Open on fedi.simonwillison.net

Got a bit behind on my weeknotes, so here are my monthnotes for October instead https://simonwillison.net/2024/Oct/30/monthnotes/

Like 30 October at 4:21 | Open on fedi.simonwillison.net

I added multi-modal (image, audio, video) support to my LLM command-line tool and Python library, so now you can use it to run all sorts of content through LLMs such as GPT-4o, Claude and Google Gemini

Cost to transcribe 7 minutes of audio with Gemini 1.5 Flash 8B? 1/10th of a cent.

https://simonwillison.net/2024/Oct/29/llm-multi-modal/

But let’s do something a bit more interesting. I shared a 7m40s MP3 of a NotebookLM podcast a few weeks ago. Let’s use Flash-8B—the cheapest Gemini model—to try and obtain a transcript.

llm 'transcript' \
-a https://static.simonwillison.net/static/2024/video-scraping-pelicans.mp3 \
-m gemini-1.5-flash-8b-latest

It worked!

Hey everyone, welcome back. You ever find yourself wading through mountains of data, trying to pluck out the juicy bits? It’s like hunting for a single shrimp in a whole kelp forest, am I right? Oh, tell me about it. I swear, sometimes I feel like I’m gonna go cross-eyed from staring at spreadsheets all day. [...]

Once again, llm logs -c --json will show us the tokens used. Here it’s 14754 prompt tokens and 1865 completion tokens. The pricing calculator says that adds up to... 0.0833 cents. Less than a tenth of a cent to transcribe a 7m40s audio clip.

Like 29 October at 15:11 | Open on fedi.simonwillison.net

Show previous comments

If you are still LLM-skeptical but haven't spent much time thinking about or experimenting with these multi-modal variants I'd encourage you to take a look at them

Being able to extract information from images, audio and video is a truly amazing capability, and something which was previously prohibitively difficult - see XKCD 1425 https://xkcd.com/1425/

29 October at 17:27 | Open on fedi.simonwillison.net

Andrei Zmievski

@simon Something I've been meaning to ask.. is there a decent guide to which models are best suited for which tasks? As in, "gemini models are better for extracting content from video/audio, etc", including model versions, sizes, etc.

30 October at 21:19 | Open on mastodon.social

@simon Does video work? I tried both Gemini pro and flash, but I only got some error message. Do I need a paid account to use video scraping? (Image works as expected.)

31 October at 4:09 | Open on mathstodon.xyz

I built a little browser-based tool for playing with the audio output from the OpenAI GPT-4o audio preview model - you can set a system prompt and a regular prompt, play the resulting audio, download the wav file and also export out the underlying JSON

Tool is here (you'll need to provide your own OpenAI API key, stored in localStorage): https://tools.simonwillison.net/openai-audio-output

Notes on how I built it (with Claude) here: https://simonwillison.net/2024/Oct/28/prompt-gpt-4o-audio/

Screenshot of a text-to-speech interface showing a system prompt "Speak with a thick french accent, speaking fast", user prompt "Tell me all about pelicans, in just a sentence", voice dropdown set to "Alloy", audio player at 0:13/0:13, and generated text about pelicans: "Pelicans are large waterbirds with a distinctive pouch under their beak, known for their impressive fishing skills as they dive into the water to catch fish, often working together in groups to herd their prey." Also shows a Generate Speech button, Download Audio button, and partial API response with id "chatcmpl-ANBZcJi4DbN06f9i7z51Uy9SCVtZr" and object "chat.completion"

Like 28 October at 4:47 | Open on fedi.simonwillison.net

Bonus tool: if you save the raw API JSON as a Gist you can add the Gist ID to this URL to serve up a page that lets other people play back your audio

Here's my example with a system prompt specifying a "thick French accent":

https://tools.simonwillison.net/gpt-4o-audio-player?gist=4a982d3fe7ba8cb4c01e89c69a4a5335

Screenshot of an audio player interface. At the top is text explaining "Note: This player expects GitHub Gists containing JSON responses from the OpenAI GPT-4 with audio preview model (gpt-4o-audio-preview). The JSON should include an audio response with base64-encoded WAV data." Below is a text input field containing "https://gist.github.com/4a982d3fe7ba8cb4c01e89c69a4a5335" with a "Fetch" button. An audio player control bar shows 0:00/0:13 duration. There's a "Download Audio" button and text describing pelicans: "Pelicans are large waterbirds with a distinctive pouch under their beak, known for their impressive fishing skills as they dive into the water to catch fish, often working together in groups to herd their prey."

28 October at 4:48 | Open on fedi.simonwillison.net

Wrote a tiny new LLM plugin this morning: llm-whisper-api, which lets you do this (if you have an OpenAI API key configured already):

llm install llm-whisper-api
llm whisper-api myfile.mp3 > transcript.txt

https://simonwillison.net/2024/Oct/27/llm-whisper-api/

Like 27 October at 21:18 | Open on fedi.simonwillison.net

Felix 🇺🇦🚴‍♂️

@simon In principle, human transcription as a job or service is also dead.

I've always wished for something like this to turn podcasts into texts.

Had played around with opensource like cmusphinx a long time ago. But the quality wasn't particularly good then and it was also very slow. It's impressive how it has developed.

27 October at 22:37 | Open on norden.social

Ame

@simon Now we just also need a plugin for the groq whisper api, which can be used for free!

1 November at 15:33 | Open on breta.moe

I built a new plugin for LLM called llm-jq, which lets you pipe JSON into the tool and provide a short description of what you want, then it uses an LLM to generate a jq program and executes that against the JSON for you https://simonwillison.net/2024/Oct/27/llm-jq/

Example usage:

llm install llm-jq
curl -s 'http''s://api.github.com/repos/simonw/datasette/issues' | \
llm jq 'count by user login, top 3'

$ curl -s https://api.github.com/repos/simonw/datasette/issues | \
llm jq 'count by user.login, top 3'
[
{
"key": "simonw",
"value": 11
},
{
"key": "king7532",
"value": 5
},
{
"key": "nicfab",
"value": 2
}
]
group_by(.user.login) | map({key: .[0].user.login, value: length}) | sort_by(.value) | reverse | .[0:3]

Also shows the system prompt that was used.

Like 27 October at 4:32 | Open on fedi.simonwillison.net

Show previous comments

@simon do you pass all or parts of the json to the llm? Or json schema with the instructions? How does it work with large json data? Use a sample?

27 October at 6:35 | Open on chaos.social

@simon as usual, thank you very much

27 October at 7:04 | Open on mastodon.uno

gavcloud

@simon as someone who wrestled with jq quite a bit, this looks fascinating. thanks for sharing it.

27 October at 20:38 | Open on sonomu.club