248 posts total
Simon Willison

I keep finding new ways to entertain myself with ChatGPT advanced voice mode...

"I need you to pretend to be a California brown pelican with a very thick Russian accent, but you talk to me exclusively in Spanish"

Simon Willison

I spun up a new LLM benchmark: how well can they handle this prompt?

Generate an SVG of a pelican riding a bicycle

I find the results so far utterly delightful: simonwillison.net/2024/Oct/25/

Here's Claude 3.5 Sonnet (2024-06-20) and Claude 3.5 Sonnet (2024-10-22):

Two images that are recognizable as pelicans on bicycles

Gemini 1.5 Flash 001 and Gemini 1.5 Flash 002:

Two images that are NOT.
Jeremy Kun

@simon I guess "draw a unicorn in tikz" has already been tainted in the training data?

alphaomega

@simon
Just and only my 2 cents: some people do formula 1 racing as a hobby. Or wakeboarding. Or use LLMs.
It's not forbidden. It's fascinating. It's fun.
<-> It's energy consuming

Simon Willison

How hard is it to process untrusted SVG data to strip out any potentially harmful tags or attributes (like stuff that might execute JavaScript)?

I feel like this is well trodden ground for HTML these days, are there robust solutions for the SVG version of this problem?
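One common approach to the question above is an allowlist: parse the SVG, keep only elements and attributes known to be harmless, and drop everything else. Below is a minimal sketch of that idea in Python, not a vetted sanitizer. The `ALLOWED_TAGS` and `ALLOWED_ATTRS` sets and the `sanitize_svg` name are made up for illustration, attribute values are not inspected, and Python's documentation warns that the standard library XML parsers are not hardened against maliciously constructed input, so anything real would want a hardened parser and a much more careful review.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)

# Hypothetical allowlists, for illustration only.
ALLOWED_TAGS = {"svg", "g", "path", "circle", "ellipse", "rect", "line",
                "polyline", "polygon", "text", "title"}
ALLOWED_ATTRS = {"viewBox", "width", "height", "d", "fill", "stroke",
                 "stroke-width", "cx", "cy", "r", "rx", "ry", "x", "y",
                 "x1", "y1", "x2", "y2", "points", "transform"}


def _tag_allowed(tag) -> bool:
    """Accept allowlisted names that are un-namespaced or in the SVG namespace."""
    if not isinstance(tag, str):
        return False
    if tag.startswith("{"):
        namespace, _, name = tag[1:].partition("}")
        return namespace == SVG_NS and name in ALLOWED_TAGS
    return tag in ALLOWED_TAGS


def _clean(element: ET.Element) -> None:
    # Drop every attribute that is namespaced or not allowlisted
    # (this removes onload, onclick, href, xlink:href, style, ...).
    for name in list(element.attrib):
        if name not in ALLOWED_ATTRS:
            del element.attrib[name]
    # Drop disallowed children together with their whole subtree.
    for child in list(element):
        if _tag_allowed(child.tag):
            _clean(child)
        else:
            element.remove(child)


def sanitize_svg(source: str) -> str:
    """Return the SVG with non-allowlisted elements and attributes removed."""
    root = ET.fromstring(source)
    if not _tag_allowed(root.tag) or not root.tag.endswith("svg"):
        raise ValueError("root element is not <svg>")
    _clean(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    dirty = ('<svg xmlns="http://www.w3.org/2000/svg" onload="alert(1)" '
             'viewBox="0 0 10 10"><script>alert(2)</script>'
             '<circle cx="5" cy="5" r="4" onclick="x()"/></svg>')
    print(sanitize_svg(dirty))
```

Even with filtering like this, serving the result with a restrictive Content-Security-Policy or displaying it through an `<img>` tag is a sensible second layer.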

João S. O. Bueno

@simon based on the experience of people who tried to create a Python sandbox over the decades, I'd say it is pretty much impossible. (save for a browser separated as another page box: i.e. a "Frame")

martin sereinig

@simon CSP is probably a good second layer of security, no matter what you end up doing.

Pelle Wessman

@simon I wonder if you could do a similar approach as eg Figma used for their third party plugins: Make it all happen in a WASM script that’s sandboxed

So do that and render to a canvas?

Simon Willison

Constantly persisting everything in a web form to localStorage in case of browser or tab accidents is such a cheap and effective trick!
mastodon.social/@timbornholdt/

Simon Willison

I keep meaning to knock up a little bit of JavaScript for the Django Admin that does this automatically for every add form

Sam Wilson

@simon We recently added that sort of feature to MediaWiki: mediawiki.org/wiki/Help:Edit_R (indexedDB rather than localStorage though, because it can store more).

Vít Skalický :fedora:

@simon don't browsers do some level of form persistence if you use plain HTML forms?

Simon Willison

Anthropic's claude.ai/ grew a new feature today: an equivalent of OpenAI's ChatGPT Code Interpreter mode, where the chatbot can write and then execute code in order to help answer questions (e.g. to run calculations that are beyond a next-token-predicting LLM)

OpenAI use server-side Python for this, but Anthropic instead chose to use client-side JavaScript running in a Web Worker.

Here are my notes so far on the new feature: simonwillison.net/2024/Oct/24/

Claude screenshot. I've uploaded a uv.lock file and prompted "Write a parser for this file format and show me a visualization of what's in it" Claude: I'll help create a parser and visualization for this lockfile format. It appears to be similar to a TOML-based lock file used in Python package management. Let me analyze the structure and create a visualization. Visible code: const fileContent = await window.fs.readFile('uv.lock', { encoding: 'utf8' }); function parseLockFile(content) ... On the right, an SVG visualization showing packages in a circle with lines between them, and an anyio package description
Michael Hunger

@simon interesting choice.
Delegating the sandbox to the browser and compute to the user. what would be attack vectors here?

Prem Kumar Aparanji 👶🤖🐘

@simon would be cool to see Llama 3.2 1B or similar doing it right inside the browser 😄

Stephan Druskat

@simon Interesting. Do you know (or does the code show) why the vis nodes for the dependencies are of different size?

Simon Willison

I finally have a procedure in place I like for hacking on Python CLI apps using a development environment managed by uv - full notes here:
til.simonwillison.net/python/u

Luca Baggi

@simon Cool post! I am not sure the `tool.uv.dev-dependencies` is the same as, or interoperable with, PEP 735 though.

Marius Gedminas

@simon Extra tip: you can do `uv tool install -e .`, and now you can run `demo-app` directly, without the `uv run` prefix, in any directory.

Quinn Comendant

@simon Then, for scripts you run frequently, do you just add to your .bashrc `alias demo="uv run python -m demo_app"`?

My few Python CLI apps each have a .venv directory, but it’s tiresome to have to juggle activations: `source ~/bin/demo/.env/activate && demo && deactivate`. 🫠

Simon Willison

I know this should be obvious, but it still surprises me how much more fun my blog feels now that I've started habitually using images in my posts

Here's two screenshots and a GIF demo from today's posts simonwillison.net/2024/Oct/23/

v̾i̾t̾r̾i̾o̾l̾i̾x̾

@simon it does make the posts more engaging. our little rat brains like stimulation

Simon Willison

I built a Bash script for running prompts with images or PDFs against the Google Gemini models - a prototype of how multi-modal support for my LLM CLI tool is going to work

It's so much fun to play with, especially since Gemini somehow costs less than 1/10th of a cent per image

til.simonwillison.net/llms/pro

Simon Willison

Came up with a creative way to post quotes from video content on my blog - since my quotes support images I can run MacWhisper to extract a text transcript of part of the video, then drop a screenshot in the middle to illustrate the quote

simonwillison.net/2024/Oct/23/

Screenshot of my blog:

According to a document that I viewed, Anthropic is telling investors that it is expecting a billion dollars in revenue this year.

Then a screenshot of the video appearance. The alt text on that is: A CNBC Money Movers broadcast screenshot showing financial data. A news anchor in a green blazer appears on the left with the San Francisco Bay Bridge visible behind her. The screen displays ANTHROPIC EST. 2024 REV DOCUMENT SEEN BY CNBC: with a breakdown showing Third-party API: 60-75% of sales, Direct sales API: 10-25%, Chatbot subs: 15%, Professional services: 2%. The lower third chyron reads ANTHROPIC REV EXPECTED TO SURGE

Text continues: Third-party API is expected to make up the majority of sales, 60% to 75% of the total. That refers to the interfaces that allow external developers or third parties like Amazon's AWS to build and scale their own AI applications using Anthropic's models.

That is by far its biggest business, with direct API sales a distant second projected to bring in 10% to 25% of revenue. Chatbots, that is its subscription revenue from Claude, the chatbot, that's expected to make up 15% of sales in 2024 at $150 million.

— Deirdre Bosa, CNBC Money Movers
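The figures in that quote can be cross-checked with a little arithmetic: if chatbot subscriptions are 15% of sales at $150 million, the implied total is about $1 billion, which matches the "billion dollars in revenue" line. Here is a back-of-envelope sketch in Python. It uses only the percentages and the dollar figure quoted above; the function names are made up for illustration, and the derived dollar ranges are rough estimates rather than reported numbers.

```python
# Figures taken from the quoted transcript and screenshot alt text.
CHATBOT_REVENUE_USD = 150_000_000  # "15% of sales in 2024 at $150 million"
CHATBOT_SHARE_PCT = 15

SHARE_RANGES_PCT = {
    "Third-party API": (60, 75),
    "Direct sales API": (10, 25),
}


def implied_total(revenue_usd: int, share_pct: int) -> int:
    """Total revenue implied by one segment's revenue and its share of sales."""
    return revenue_usd * 100 // share_pct


def implied_range(total_usd: int, low_pct: int, high_pct: int) -> tuple[int, int]:
    """Dollar range implied by a percentage range of the total."""
    return total_usd * low_pct // 100, total_usd * high_pct // 100


total = implied_total(CHATBOT_REVENUE_USD, CHATBOT_SHARE_PCT)
print(f"Implied total: ${total:,}")  # Implied total: $1,000,000,000

for segment, (low, high) in SHARE_RANGES_PCT.items():
    low_usd, high_usd = implied_range(total, low, high)
    print(f"{segment}: ${low_usd:,} to ${high_usd:,}")
```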
Reilly Wood

@simon Do you understand the difference between third party and direct sales in that quote? Third party sounds like regular API key usage, I’m unclear on what counts as direct sales

Simon Willison

The Internet Archive being down helps expose quite how much I rely on their Wayback Machine - I've headed over there at least four times in the past week and been disappointed at not being able to use it

Simon Willison

Today people are saying that it's interesting that Anthropic's Claude 3.5 Opus model is no longer mentioned on their models page - but without the archive I can't see for myself if it used to be listed there or not docs.anthropic.com/en/docs/abo

Simon Willison

Anthropic released a fascinating new capability today called "Computer Use" - a mode of their Claude 3.5 Sonnet model where it can do things like accept screenshots of a remotely operated computer and send back commands to click on specific coordinates, enter text etc

My notes on what I've figured out so far: simonwillison.net/2024/Oct/22/

jmjm

@simon is this the much vaunted vaporware, the Large Action Model?

Duncan Lock

@simon
This feels like it's a step on the way to automating many millions of office admin jobs - which are often "copy and paste stuff from one computer system to another, sometimes editing it". Sobering thinking of how many people are potentially affected by this stuff.

Mikołaj Hołysz

@simon this has huge accessibility implications, hopefully I'll have some time to test this out from a screen reader perspective over the coming weekend.

Jed Schmidt

@simon that is amazing! reminds me of when @fgnass golfed a sudoku solver into a tweet... he didn't have enough bytes to return the result so he stopped the recursive iteration by throwing: youtu.be/JsAetmgJRss?t=1467

Simon Willison

You can solve sudokus in python packaging.

Not python code, python packages:
```
[project]
name = "sudoku"
version = "1.0.0"
dependencies = [
"sudoku_3_1 == 2",
"sudoku_5_7 == 6",
"sudoku_0_7 == 5"
...
]
```
and the installed packages are the solution.

github.com/konstin/sudoku-in-p
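To make the encoding concrete: each given clue in the puzzle becomes one pinned dependency, and the resolver has to pick versions for the remaining cells. The sketch below only generates the `dependencies` lines from a grid. The function name is made up, and reading the two numbers in `sudoku_3_1` as zero-based row and column is an assumption on my part, not something the snippet above confirms.

```python
def clues_to_dependencies(grid: list[list[int]]) -> list[str]:
    """Turn the filled-in cells of a 9x9 grid (0 = empty) into pinned requirements.

    Assumes the package name is sudoku_<row>_<col> with zero-based indices
    and that the pinned version is the digit in that cell.
    """
    dependencies = []
    for row, cells in enumerate(grid):
        for col, value in enumerate(cells):
            if value:
                dependencies.append(f"sudoku_{row}_{col} == {value}")
    return dependencies


if __name__ == "__main__":
    # An otherwise empty grid holding the three clues from the snippet above.
    grid = [[0] * 9 for _ in range(9)]
    grid[3][1] = 2
    grid[5][7] = 6
    grid[0][7] = 5
    for line in clues_to_dependencies(grid):
        print(f'"{line}",')
```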

Lucas Werkmeister

@konstin konsti stop doing cursed python things challenge (impossible)

Frost Ming

@konstin This is not a new idea
github.com/frostming/sodoku-so
In this repo, I referenced the original article.

Simon Willison

I exported my Claude data to poke around with it and found out I've become a HEAVY user of Claude Artifacts - the feature that lets Claude build a full interactive HTML+JavaScript tool for you based on your prompts. I built 14 (somewhat) useful things with it in just the past week!

Here's a post describing them all simonwillison.net/2024/Oct/21/

Simon Willison

Full list of artifacts from the past week


    URL to Markdown with Jina Reader
    SQLite in WASM demo
    Extract URLs
    Clipboard viewer
    Pyodide REPL
    Photo Camera Settings Simulator
    LLM pricing calculator
    YAML to JSON converter
    OpenAI Audio
    QR Code Decoder
    Image converter and downloader
    HTML Entity Escaper
    text-wrap-balance-nav
    ARES Phonetic Alphabet Converter
Jeff Triplett

@simon I haven't played as much with Canvas but like I told you at DCUS, Claude Projects can't be unseen once you use it.

I saw your app last night, but I'm going to carve out some time to try it out. I have 1000s of chats and 100s of projects driven from it. I know it's not a true sync but I suspect I'll get a ton of good out of it.

Simon Willison

A year ago Mastodon 4.2 expanded the HTML filter to allow a whole bunch of additional tags... anyone know of any good examples of accounts that are using that bounty of additional formatting options? docs.joinmastodon.org/spec/act

Since Mastodon v4.2, the following elements and attributes are supported:

    <p>
    <span> ( class )
    <br>
    <a> ( href , rel , class )
    <del>
    <pre>
    <code>
    <em>
    <strong>
    <b>
    <i>
    <u>
    <ul>
    <ol> ( start , reversed )
    <li> ( value )
    <blockquote>
    headings will be converted to <strong> and then wrapped in <p>
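As a rough illustration of what an allowlist like this means in practice, here is a small Python sketch built on the standard library's `html.parser`. It is not Mastodon's actual sanitizer. The `AllowlistFilter` class and `filter_html` function are made-up names, dropping the contents of `<script>` and `<style>` is my own choice, and so is the restriction of `href` to `http://` and `https://` URLs.

```python
from html import escape
from html.parser import HTMLParser

# Element -> permitted attributes, copied from the list above.
ALLOWED = {
    "p": set(), "span": {"class"}, "br": set(), "a": {"href", "rel", "class"},
    "del": set(), "pre": set(), "code": set(), "em": set(), "strong": set(),
    "b": set(), "i": set(), "u": set(), "ul": set(),
    "ol": {"start", "reversed"}, "li": {"value"}, "blockquote": set(),
}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
VOID = {"br"}                       # elements with no closing tag
DROP_CONTENT = {"script", "style"}  # assumption: discard their text too


class AllowlistFilter(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skipping = False

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skipping = True
        elif tag in HEADINGS:
            self.out.append("<p><strong>")
        elif tag in ALLOWED:
            kept = []
            for name, value in attrs:
                value = value or ""
                if name not in ALLOWED[tag]:
                    continue
                # Assumption: only plain web links survive.
                if name == "href" and not value.startswith(("http://", "https://")):
                    continue
                kept.append(f' {name}="{escape(value, quote=True)}"')
            self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skipping = False
        elif tag in HEADINGS:
            self.out.append("</strong></p>")
        elif tag in ALLOWED and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))


def filter_html(source: str) -> str:
    """Return source with only the allowlisted elements and attributes kept."""
    parser = AllowlistFilter()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(filter_html('<h2>Title</h2><p onclick="x()">Hi <u>there</u></p>'))
```

Text inside other disallowed elements is kept and only the tags themselves are dropped, which is a common choice for this kind of filter but again an assumption here.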
Simon Willison

I really like Drew's framework here dividing current AI use-cases into Gods (human replacement, which I think of as still mostly science fiction), Interns (assistants you delegate closely-reviewed tasks to, which is most of how I use LLMs today) and Cogs (smaller tools that can more reliably serve a single purpose, like Whisper for transcription) - more of my own notes on this here: simonwillison.net/2024/Oct/20/
note.computer/@dbreunig/113330

Matt Campbell

@simon For the "gods" category, also check out @forrestbrazeal's excellent song "AGI (Artificial God Incarnate)": youtube.com/watch?v=1ZhhO7MGkn

Jeff Triplett

@simon I have been struggling with terminology, so this is useful. That said, I'm not a fan of "interns" used like this. The context that you used it in felt more appropriate than a whole class of AI terminology that literally means to replace a useful class of workers and learning.

I have personally struggled with the term Agents for lack of a framework or way to use them outside of running a Python script.

Simon Willison

The cold open from Abbott Elementary Season 4 Episode 2 (Ringworm) - the one with the PTA meeting - is already iconic

Simon Willison

This is really neat from Terence Eden: you can use text-wrap: balance for icons as well as text shkspr.mobi/blog/2024/10/you-c

Inspired by this I put together an interactive prototype using it for text-based navigation items, seems to work well for that too!

Try the prototype here: tools.simonwillison.net/text-w

More notes: simonwillison.net/2024/Oct/20/

mastodon.social/@Edent/1133395

George Lund

@simon the result on Firefox mobile is a slightly distressing amount of horizontal scroll for some reason, which rather spoils the effect!

Simon Willison

I finally managed to get the Llama 3.2 and Phi 3.5 vision models to run on my M2 Mac laptop, using the mistral.rs Rust library and its CLI Tool and Python bindings simonwillison.net/2024/Oct/19/

Simon Willison

Here's what I got from Llama 3.2 11B for this photo I took at the Pioneer Memorial Museum in Salt Lake City niche-museums.com/111

"describe this image including any text"

Photos of the outside of the museum. The signs are visible, but none of them say that it is in Salt Lake City in Utah.
The image shows the exterior of the Daughters of Utah Pioneers (DUP) Museum, which is also known as the Pioneer Memorial Museum. The building is located in Salt Lake City, Utah, USA.

Visible in the image are:

The DUP Museum Building: The main structure of the museum is a prominent feature in the image. It is a two-story building with a rectangular shape and a flat roof. The exterior walls are made of brick or stone, giving it a classic and historic appearance.

Signage: Above the entrance, there is a sign that reads "Daughters of Utah Pioneers Museum" in bold letters. Below this sign, another sign indicates that the building is also known as the "Pioneer Memorial Museum."

Entrance: The main entrance to the museum is marked by a doorway with a decorative frame and a pair of large wooden doors.

Landscaping: The surrounding area around the museum appears well-maintained with neatly trimmed grass and some trees nearby.

Statue or Monument: In front of the museum, there is a small statue or monument that appears to be dedicated to pioneers or early settlers.

The Daughters of Utah Pioneers (DUP) Museum showcases historical artifacts and exhibits related to Utah's pioneer history, focusing on the experiences and contributions of early settlers in the region.
Florian Idelberger

@simon thanks for this! I had some issues replicating it, though: on an M3 Max it always crashes. (It's also annoying that it crashes or errors if it cannot find an image. There is a PR to fix that, but it's not merged yet.) Even on the M3 Max, the in-situ quantization is done on one core, so it takes a while... have you experienced any or all of these?
