248 posts total
Simon Willison

I still don't like Accept headers

I just found out if you hit pypi.org/simple/pydantic/ with "Accept: application/vnd.pypi.simple.v1+json" you get back super-useful JSON about that package... but I can't link to a demo because I can't include the Accept header in a link!

Here's a Gist instead: gist.github.com/simonw/8cf8a85
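
A minimal Python sketch of the same request, using only the standard library. The URL and Accept value come from the post above; the function names are made up for illustration, and the shape of the returned JSON is not assumed here.

```python
import json
import urllib.request

# Media type quoted in the post; it asks PyPI for JSON instead of HTML.
PYPI_SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"


def build_simple_request(package: str) -> urllib.request.Request:
    """Build a request for a package's simple index page, asking for JSON."""
    url = f"https://pypi.org/simple/{package}/"
    return urllib.request.Request(url, headers={"Accept": PYPI_SIMPLE_JSON})


def fetch_simple_json(package: str) -> dict:
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(build_simple_request(package)) as response:
        return json.load(response)


# Example (requires network access):
#   data = fetch_simple_json("pydantic")
#   print(sorted(data.keys()))
```

This also illustrates the complaint: the same URL returns a different representation depending on a header, and a plain link cannot carry that header.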

Stefan Eissing

@simon HTTP content negotiation sucks. It always gets in the way.

Many thanks to the github people who showed how to do good url design.

danbri

@simon yeah the RDF / Linked Data community somehow got itself addicted to content negotiation as a favoured way to publish data views. Horrible!

Simon Willison

Posted some notes on the new PyPI digital attestations feature released today, providing digital signatures that help demonstrate that the package you are downloading from PyPI was built from a specific version of the underlying code on GitHub simonwillison.net/2024/Nov/14/

Hugo 雨果

@simon I understand what this does, but I don’t understand the value of it. It provides validation that the build happened on MS’s server and that they used a specific checkout. But if builds are not reproducible (e.g. they use unchecksummed external resources), this guarantees nothing. If builds are properly reproducible, what value does the attestation add?

Simon Willison

Shout out to @simon ’s shot-scraper for grabbing web browser screenshots from the command line.

I redid dozens of screenshots for the upcoming book update, and shot-scraper made it way easier than my previous approach with Firefox’s screenshot tool.

shot-scraper.datasette.io/en/s

Simon Willison

On the one hand, I'm very sympathetic to the argument that "AI" is an over-hyped buzzword that is rapidly losing all meaning, if it ever had any (beyond being a combination of science-fiction and an academic discipline from the 1950s)

On the other hand though, I'm building a feature where LLMs help a user build a SQL query using an English-language question and I need to decide what label to put on that button, and it's hard to come up with anything that's as clear as "Use AI to write this query"

Petr Viktorin

@simon “Generate” (or “autogenerate”)?
To the end user it doesn't matter that the heuristics use many floats rather than few ints/pointers. Traditional autocorrect/autocomplete (and other tools that deal with human language) are also often wrong. If the engine actually works, no one should care what's inside.

postweber

@simon Grafana has graphical query builders for some languages. You could have an SQL/Plain English toggle.

Simon Willison

Thanks to the combo of Ollama and the llm-ollama plugin you can now run Meta's Llama 3.2 Vision image model (7.9GB) on a Mac and use it to run prompts against images simonwillison.net/2024/Nov/13/

If you have Ollama installed you can fetch the 11B model (7.9 GB) like this:

ollama pull llama3.2-vision

Or the larger 90B model (55GB) like this:

ollama pull llama3.2-vision:90b

I was delighted to learn that Sukhbinder Singh had already contributed support for LLM attachments to Sergey Alexandrov's llm-ollama plugin, which means the following works once you've pulled the models:

llm install --upgrade llm-ollama
llm -m llama3.2-vision:latest 'describe' \
  -a https://static.simonwillison.net/static/2024/pelican.jpg
A photograph of a California Brown Pelican in a harbor
This image features a brown pelican standing on rocks, facing the camera and positioned to the left of center. The bird's long beak is a light brown color with a darker tip, while its white neck is adorned with gray feathers that continue down to its body. Its legs are also gray.

In the background, out-of-focus boats and water are visible, providing context for the pelican's environment.
Jan

@simon Curious how you’re running Ollama - is it just on your laptop, or do you have some beefy server running it?

Jeff Triplett

@simon the 90B (55GB) might confuse people.

You do need ~88GB of RAM, not counting your context window, just to run the 90B model size. So 128 GB of RAM, or else you are going to get 1 token per 30 to 45 seconds or more of output while everything swaps around.

That small model is going to run very, very well on any M-series Mac with enough RAM.

Simon Willison

Wrote up some notes on the new Qwen2.5-Coder-32B model, which is the first model I've run on my own Mac (64GB M2) that appears to be highly competent at writing code
simonwillison.net/2024/Nov/12/

Stefano Pacifico 🧬 🇺🇦

@simon besides offline use and additional privacy, did you notice any other advantage to running locally?

Drew Breunig

@simon Did you notice a speed difference between mlx and ollama?

balloob

@simon qwen is amazing. It’s the best performing local model in the Home Assistant AI benchmarks. github.com/allenporter/home-as

Simon Willison

The more experience I gain as a software developer the less tolerance I have for the idea that something doesn't need documenting if you can go and read the source code instead

(That's despite getting much, much better at reading source code to answer my own questions as I gain experience)

Marty Fouts

@simon The thing that makes me sad about this discussion is that it has gone on since the late 1950s; there have not been any new arguments in 30 years and the only good explanation was Knuth’s Literate Programming book but nobody ever really understood it.

SpaceLifeForm

@simon

If you do not have at least as many comments in your source code as actual compilable source code, you are making a mistake.

Simon Willison

I should clarify: when I talk about documentation here I'm not talking about code comment style docs - I'm talking about "this is how to use this library / API" docs

If your code is clearly written and nothing else ever needs to call it then I don't particularly mind if there's no additional documentation - but if I'm expected to call your library from my own code I'm very much not keen on being told I have to read all of that code myself just to use it!

Simon Willison

Like a lot of people I'm really concerned about what the incoming regime is going to do, so here's one small way I'm trying to help: jacobian.org/2024/nov/11/digit

Simon Willison

Jamelle Bouie's TikTok account is one of my favorite sources of political commentary right now - he's a columnist for the New York Times who has basically perfected the very different art of TikTok

Here's his latest, about tariffs and domestic supply chains

tiktok.com/@jamellebouie/video

Simon Willison

There are many “what should we do next” thinkpieces, but this one is mine.

If you want an abstract summary, the idea is “we need to run a year-round parallel campaign apparatus that just introduces people to progressive ideas by making their lives better in whatever ways we can”.

That is a staggeringly huge project and if it does even happen, I can only be a tiny part of it, so I will need your help. Contact info is at the end of the blog post.

blog.glyph.im/2024/11/its-time

64 Islands Airship Co-op

@glyph i remember saying this in 2001. it’s not wrong!

Glyph

my other idea that I have _zero_ chance of actually making happen is that Crooked Media or someone like them needs to poach Matt Levine from Bloomberg and stand up a full-fledged competitor to CNBC where they cover economic and market issues from a progressive standpoint. If everyone gets all their financial and economic news from conservatives of course the population is going to keep trusting conservatives on the economy

M. Treasurer commandasaurus 🦖

@glyph I'm down. Building bridges across my experiences, communities, and identities is absolutely the plan.

Simon Willison

Got distracted digging around in the belly of the MDN browser compatibility tables, and found out their API is served with access-control-allow-origin: *... so now I've built my own little browser support timeline viewer tool! tools.simonwillison.net/mdn-ti

More details here: simonwillison.net/2024/Nov/11/
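
For context, `access-control-allow-origin: *` tells browsers that JavaScript running on any site may read the response, which is what makes a purely client-side tool like this possible. Below is a rough Python sketch of checking a set of response headers for that value. The function name is hypothetical, and it takes a plain mapping of headers instead of calling any particular API.

```python
from typing import Mapping


def allows_any_origin(headers: Mapping[str, str]) -> bool:
    """Return True if the headers grant cross-origin read access to every site."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("access-control-allow-origin") == "*"
```

A header that names one specific origin, or a response with no such header at all, returns False.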

Magyk

@simon
It seems that the back/forward navigation doesn't allow exiting from the page using the back button
(tested on Firefox Android)

Stuart Knightley

@simon you might also be interested in bcd-watch.igalia.com to get updates when new APIs get browser support (based on the same MDN data you’re using!)

Rachel Andrew

@simon You might be interested in what we're building over at webstatus.dev/ (which does have an API, though it needs docs). You can already do some pretty interesting queries.

Simon Willison

I want to enable comments on my blog again, but (I'm quite possibly overthinking things here) I'm worrying about whether I need a privacy policy, how I should think about things like GDPR, and whether users should be able to delete their comments.

Never thought about this stuff for a second back in the 2000s!

Jeff Triplett

@simon If you go the federated route, I like how these daily prompts work. See kmcd.dev/posts/daily-prompts/ for details and then click on the /prompts section to see them in action. (There is pretty low engagement.)

That said, I saw your post about using GitHub Auth and that's what I default to these days. The stakes are higher for not being a jerk, plus you have GH's moderation rules/team should you need to report someone.

Hope

@simon Although I like to call it a statement, because it's something I believe in, not just something I found a form to generate for me.

Simon Willison

Saw dolphins in Half Moon Bay on Thursday!

(I'm pretty sure this is a dolphin and not a porpoise, I think porpoises are smaller)

Ryan Hiebert

@simon TIL that they aren't the same thing.

Simon Willison

Kicking off our first ever Discord chat + video + live demos Datasette Public Office Hours in 10 minutes time, details here:
simonwillison.net/2024/Nov/7/d

Simon Willison

Here are detailed notes from our public office hours, showing how @alexgarciaxyz and I imported San Mateo County election results into Datasette, cleaned them up and then used them to build geospatial visualizations in an @observablehq notebook simonwillison.net/2024/Nov/9/v

Simon Willison

Wrote up a few notes on trying out ChainForge, a Yahoo-Pipes-style "visual programming" tool for evaluating prompts against different LLMs simonwillison.net/2024/Nov/8/c

Simon Willison

I am finding myself turning to gpt-4o-mini a whole lot more since they added prompt caching last month - where you get an automated 50% discount if you send the same tokens twice or more

It is fantastic for use-cases like answering questions about a medium sized codebase
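
To put rough numbers on that discount, here is a small sketch of the arithmetic. The 50% figure comes from the post; the function name and the per-million-token price are placeholders for illustration, not quoted rates.

```python
def input_cost(input_tokens: int, cached_tokens: int,
               price_per_million: float, cached_discount: float = 0.5) -> float:
    """Estimate input cost when some of the tokens are billed at a cached rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    # Cached tokens are billed at (1 - discount) of the normal price.
    billable = uncached_tokens + cached_tokens * (1 - cached_discount)
    return billable * price_per_million / 1_000_000


# Hypothetical scenario: a 100,000-token codebase sent ahead of each question,
# at a placeholder price of 1.00 per million input tokens.
first_call = input_cost(100_000, 0, 1.00)         # nothing cached yet
later_call = input_cost(100_000, 100_000, 1.00)   # whole prompt cached
```

Under these assumptions every repeat question about the same codebase costs half as much on the input side as the first one.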

Sviatoslav Abakumov

@simon Curious, how do you feed the whole codebase into the context?

Simon Willison

Hearing about the death of June Spencer at the age of 105 - who was in the Archers from 1950 to 2022 - made me curious as to the world record for longest time playing the same character

Depends on how you count: June started in 1950 but took some breaks for family, while Patricia Greene's Jill Archer started in 1957 and has a 67 year uninterrupted run

en.m.wikipedia.org/wiki/List_o

Table showing longest-serving soap opera actors: 

Patricia Greene as Jill Archer (The Archers, 67 years), June Spencer as Peggy Woolley/Rita Flynn (The Archers, 66 years), William Roache as Ken Barlow (Coronation Street, 63 years), Ludmiła Łączyńska as Wisia Matysiakowa (Matysiakowie, 62 years), and Lesley Saweard as Christine Barford (The Archers, 60 years).
Paul Bowsher

@simon Jill is starting to sound all of those 67 years :(

Simon Willison

A thing I have learned about voting in US elections in California is that you should do vote by mail, but be sure to return the ballot before 1st November

If you instead drop off your pre-filled ballot on 5th of November it doesn't get counted for several more days (still waiting here)

Simon Willison

This is particularly frustrating when one of your hyper-local elections only attracts a few thousand votes total and two of the candidates are within 100 votes of each other!

Marcello Bastéa-Forte

@simon I did that and it got counted early this morning

Simon Willison

I interviewed Rajiv Sinclair about his team's new project, VERDAD - an outstanding piece of data journalism that tracks 48 US talk radio stations (many in Spanish), archives their audio, transcribes it and uses Gemini 1.5 to help identify potential snippets of misinformation - then presents the results in a UI for human review

simonwillison.net/2024/Nov/7/p

Simon Willison

I'm hoping to turn this into a series of YouTube interviews with people building cool data projects where we nerd out about what they've built and how they built it, so I'm optimistically thinking of this as episode one! youtube.com/watch?v=t_S-loWDGE

Jay Nakrani

@simon That is a superb use of LLMs. I've seen that a lot of text-classification tasks (that previously required expensive model training) can now be done rather cheaply using LLMs + engineered prompts. Cost and development velocity have improved quite a bit with this new LLM-as-rater approach compared to previous approaches of custom-model-training.

The next bottleneck is human evals, but I guess we can't completely remove them until LLMs stop making mistakes.
