Simon's wall

Simon's posts Back to profile

120 posts total

New TIL: Using sqlite-vec with embeddings in sqlite-utils and Datasette https://til.simonwillison.net/sqlite/sqlite-vec

Like 11 August at 23:42 | Open on fedi.simonwillison.net

10 August at 17:20 on post Boring technical question which neither the documentation nor ChatGPT can answer. I want to automate...

OK gang! This is ready for you to test.

python tweet2html.py --css 1234

That will take a Twitter ID and return HTML and CSS for you to embed in your website with no calling back to Twitter.

Features:
🗣 Avatars inlined as Base64 WebP
📸 All attached photos inlined
🎥 Video poster inline, <video> to original source
🔗 Hyperlinks don't use t.co
#️⃣ Hashtags and @ mentions linked
🕰 Semantic time
♥ and 🗨 counts

Try it out at https://github.com/edent/Tweet2Embed

Feedback and pull requests very welcome!

OK gang! This is ready for you to test.

python tweet2html.py --css 1234

That will take a Twitter ID and return HTML and CSS for you to embed in your website with no calling back to Twitter.

Features:
🗣 Avatars inlined as Base64 WebP
📸 All attached photos inlined
🎥 Video poster inline, <video> to original source
🔗 Hyperlinks don't use t.co
#️⃣ Hashtags and @ mentions linked
🕰 Semantic time
♥ and 🗨 counts

Expand text...

Like 10 August at 17:47 | Open on fedi.simonwillison.net

This 404 Media piece definitively answers the question about where all of the weird Jesus shrimp AI generated image slop on Facebook comes from, and it’s fascinating: https://www.404media.co/where-facebooks-ai-slop-comes-from/

A few of my own notes here: https://simonwillison.net/2024/Aug/10/where-facebooks-ai-slop-comes-from/

Like 10 August at 6:52 | Open on fedi.simonwillison.net

22

@simon I believe you invented the term “slop” right 😁 my question is, if “slop” is “unwanted AI-generated content” why is this called “slop” when it’s obviously getting a ton of engagement? Is the claim that all the likes are bots? If it’s actually hundreds of thousands of people liking AI-generated images then isn’t it … not unwanted?

10 August at 16:11 | Open on sfba.social

I spent some time reading the newly released GPT-4o System Card - it's a fascinating document, with all kinds of interesting new-to-me details in there. I've posted my highlights here: https://simonwillison.net/2024/Aug/8/gpt-4o-system-card/

I particularly enjoyed this bit about "scheming"

Like 9 August at 0:10 | Open on fedi.simonwillison.net

Lafncow :blobcatcoffee:

@simon "...it is unlikely that GPT-4o is capable of catastrophic scheming."

So either it's bad at it or really good at it.

9 August at 0:49 | Open on mastodon.social

Mans R

@simon Do they actually believe that stuff?

9 August at 7:31 | Open on society.oftrolls.com

Google announced a major price drop for their Gemini 1.5 Flash model today - it's now the cheapest of the mainstream cheap-and-fast models, and it can also handle PDF files, audio and video as well as images and text.

More notes here: https://simonwillison.net/2024/Aug/8/gemini-15-flash-price-drop/

Like 8 August at 22:46 | Open on fedi.simonwillison.net

@simon
With all these price drops I'm waiting until they start paying me to use their models. 😜

8 August at 22:57 | Open on fosstodon.org

I got fed up of Claude's lack of a feature to export and share a full conversation, so I built a new tool for doing that using an Observable notebook - details here: https://simonwillison.net/2024/Aug/8/convert-claude-json-to-markdown/

Here's an example shared transcript, in which I start by asking for advice on breeding spiders to catch flies and then keep on escalating until Claude is frantically advising me not to attract any bears to deal with the mountain lions that have surrounded my house: https://gist.github.com/simonw/95abdfa3cdf755dbe6feb5ec4e3029f4

I got fed up of Claude's lack of a feature to export and share a full conversation, so I built a new tool for doing that using an Observable notebook - details here: https://simonwillison.net/2024/Aug/8/convert-claude-json-to-markdown/

Here's an example shared transcript, in which I start by asking for advice on breeding spiders to catch flies and then keep on escalating until Claude is frantically advising me not to attract any bears to deal with the mountain lions that have surrounded my house:

Expand text...

Like 8 August at 20:47 | Open on fedi.simonwillison.net

Show previous comments

"Dancer" Graham Knapp

@simon thank you this is hilarious 😂

8 August at 21:03 | Open on hachyderm.io

Spencer

@simon This conversation is giving me strong "there was an old lady who swallowed a fly" vibes

8 August at 21:09 | Open on mastodon.social

Tim

@simon that was hilarious, thanks!

8 August at 21:25 | Open on mastodon.xyz

I've released a new reusable Django app - django-http-debug - which makes it easy to quickly setup a debugging HTTP endpoint that returns a canned response and logs full details of any incoming requests, great for the initial stages of implementing things like OAuth or incoming webhooks.

Most of the code was written for me by Claude 3.5 Sonnet - full details here: https://simonwillison.net/2024/Aug/8/django-http-debug/

Like 8 August at 15:32 | Open on fedi.simonwillison.net

As part of working on this I figured out (with more help from Claude) a good pattern for writing automated tests for a reusable Django app like this that can live in the same repository and spin up a minimal Django project, just enough for the tests to run. I wrote that up in detail as this TIL: https://til.simonwillison.net/django/pytest-django

8 August at 15:35 | Open on fedi.simonwillison.net

Glyph

@simon does this level of LLM “authorship” give you concerns about its provenance?

8 August at 17:17 | Open on mastodon.social

antrix

@simon do you use any IDE integrations to work with LLMs?

9 August at 3:42 | Open on mastodon.social

Your regular reminder to never build a LLM-based chat interface with access to privileged information that can render Markdown images targetting external domains, if you don't want a prompt injection attack to be able to instantly exfiltrate that private data

Today's example is Google AI Studio: https://simonwillison.net/2024/Aug/7/google-ai-studio-data-exfiltration-demo/

It joins ChatGPT, Google Bard, writer.com, Amazon Q, Google NotebookLM and GitHub Copilot Chat in my collection of products that have made this mistake: https://simonwillison.net/tags/markdown-exfiltration/

Your regular reminder to never build a LLM-based chat interface with access to privileged information that can render Markdown images targetting external domains, if you don't want a prompt injection attack to be able to instantly exfiltrate that private data

Today's example is Google AI Studio: https://simonwillison.net/2024/Aug/7/google-ai-studio-data-exfiltration-demo/

Expand text...

Like 7 August at 17:09 | Open on fedi.simonwillison.net

@simon I guess I don't understand how this is an attack. The malicious prompt came from the attacker, but so did everything else. So the attacker already has access to the "exfiltrated" data, right?

Or is there some missing context here?

11 August at 16:01 | Open on kosmos.social

Steven Zekowski

7 August at 1:16 on post OK, turns out Tim Walz is a former geography teacher and a GIS and data visualization nerd...

https://minnesotareformer.com/2024/08/06/former-geography-teacher-tim-walz-is-really-into-maps/

> Walz also says maps help implement the nitty-gritty details of otherwise abstract policy. “You have to have a plan” for “how we’re doing power and economic justice and environmental justice,” he said. “The tools for that plan are GIS.” Those tools “transfer a vision of a fair society into one that actually has results.”
#HarrisWalz2024

Like 7 August at 1:26 | Open on fedi.simonwillison.net

OK, turns out Tim Walz is a former geography teacher and a GIS and data visualization nerd
https://mastodon.social/@kjhealy/112915624921601732

Like 6 August at 23:25 | Open on fedi.simonwillison.net

@simon So, Walz knows Where, and Harris knows Venn.

6 August at 23:30 | Open on m.phase.org

@simon @kjhealy Do we know his opinions on the Mercator projection?

7 August at 0:26 | Open on hachyderm.io

Steven Zekowski

https://minnesotareformer.com/2024/08/06/former-geography-teacher-tim-walz-is-really-into-maps/

> Walz also says maps help implement the nitty-gritty details of otherwise abstract policy. “You have to have a plan” for “how we’re doing power and economic justice and environmental justice,” he said. “The tools for that plan are GIS.” Those tools “transfer a vision of a fair society into one that actually has results.”
#HarrisWalz2024

7 August at 1:16 | Open on freeradical.zone

OpenAI announced "Structured Outputs in the API" today, which is a lot more exciting than it sounds and also came with a huge price drop (50% cheaper) for a new GPT-4o model - which makes it their new cheapest model for image inputs, since GPT-4o-mini is priced the same for image inputs as the previous, more expensive GPT-4o

My detailed notes here: https://simonwillison.net/2024/Aug/6/openai-structured-outputs/

Like 6 August at 18:35 | Open on fedi.simonwillison.net

@simon I did quite a bit of testing tonight I was surprised that structured outputs also worked with GPT-4o-mini. I didn’t crunch the numbers but I was already using it to parse text to json and I noticed an immediate improvement. I assumed we had to use the new model for pydantic support but it just worked.

8 August at 5:13 | Open on mastodon.social

Weeknotes: a staging environment, a Datasette alpha and a bunch of new LLMs https://simonwillison.net/2024/Aug/6/staging/

Like 6 August at 15:43 | Open on fedi.simonwillison.net

Fascinating report from 404 Media's Samantha Cole on a trove of leaked NVIDIA Slack messages and emails about how they're scraping millions of YouTube videos to train their own new foundation video generation model: https://www.404media.co/nvidia-ai-scraping-foundational-model-cosmos-project/

Posted a few of my own notes here: https://simonwillison.net/2024/Aug/5/nvidia-scraping-videos/

It's not surprising to learn that they're doing this - that's practically the industry standard right now - but is still really interesting to see internal details of what they're collecting and why

Fascinating report from 404 Media's Samantha Cole on a trove of leaked NVIDIA Slack messages and emails about how they're scraping millions of YouTube videos to train their own new foundation video generation model: https://www.404media.co/nvidia-ai-scraping-foundational-model-cosmos-project/

Posted a few of my own notes here: https://simonwillison.net/2024/Aug/5/nvidia-scraping-videos/

Expand text...

Like 5 August at 17:29 | Open on fedi.simonwillison.net

phillmv

every now and then i feel like im taking crazy pills because i remember when aaron swartz killed himself because he was going to go to jail forever because he scraped JSTOR,

and eleven years later your manager tells you “sshhhh it’s fine just scrape all of it don’t worry the CEO said it’s fine”

5 August at 17:43 | Open on hachyderm.io

@simon
Do you think Nvidia might become more chaos focused in future?

I genuinely worry about big AI companies becoming chaos agents.

Also do you think they are paying YouTube for that kind of access, or paying folks like youtube-downloader?

14 August at 17:42 | Open on phpc.social

It turns out Google Chrome ships a default, hidden extension that allows code on `*.google.com` access to private APIs, including your current CPU usage

You can test it out by pasting the following into your Chrome DevTools console on any Google page:

chrome.runtime.sendMessage(
"nkeimhogjdpnpccoofpliimaahmaaome",
{ method: "cpu.getInfo" },
(response) => {
console.log(JSON.stringify(response, null, 2));
},
);

More notes here: https://simonwillison.net/2024/Jul/9/hangout_servicesthunkjs/

It turns out Google Chrome ships a default, hidden extension that allows code on `*.google.com` access to private APIs, including your current CPU usage

You can test it out by pasting the following into your Chrome DevTools console on any Google page:

chrome.runtime.sendMessage(
"nkeimhogjdpnpccoofpliimaahmaaome",
{ method: "cpu.getInfo" },
(response) => {
console.log(JSON.stringify(response, null, 2));
},
);

Expand text...

Like 9 July at 17:54 | Open on fedi.simonwillison.net

Show previous comments

mirabilos

@simon @Shamar did you really have to point out that you used a theft machine (“AI”) to write that short JS snippet?

Ah. From your profile, you’re a promoter of these theft machines. Byebye…

11 July at 23:55 | Open on toot.mirbsd.org

Tann

@simon friendly reminder that you need root access to fully remove Google from many android phones and tablets and that root access generally voids your warranty. That said, most warranties don't last longer than a couple years so if you've had your phone for 2 or more years then you likely have little to lose by ripping your *.google.com applications out and replacing them with much more secure applications.

If you don't want to do that, the paid version of #netguard can at least lock down your phone's network traffic app by app and web address by web address.

@simon friendly reminder that you need root access to fully remove Google from many android phones and tablets and that root access generally voids your warranty. That said, most warranties don't last longer than a couple years so if you've had your phone for 2 or more years then you likely have little to lose by ripping your *.google.com applications out and replacing them with much more secure applications.

Expand text...

13 July at 23:34 | Open on techhub.social

Mitex Leo

@simon @vivaldi

14 July at 2:46 | Open on social.mitexleo.one

Several of the major social media platforms - Instagram, TikTok, LinkedIn, Twitter - have effectively declared war on linking to things and I absolutely hate it

"Link in my bio" / "Link in thread" / "Link in first comment"... or increasingly no link at all, just an unsourced screenshot of a page

Like 12 May at 14:07 | Open on fedi.simonwillison.net

Show previous comments

Kat the Leopardess

@simon Thats why I cant listen to people make the opposite talking point about decentralized or alternative social media. People wanna stick to FB/Meta and the big platforms bc they need the exposure and consider things like the Fedisphere "too complicated"

....and yet you obviously are going to have to get more complicated in how you continue to get that exposure on places like Meta/Ig...by having to do workarounds for the link blocking.

Like, is that complacency really that much better and easier?

@simon Thats why I cant listen to people make the opposite talking point about decentralized or alternative social media. People wanna stick to FB/Meta and the big platforms bc they need the exposure and consider things like the Fedisphere "too complicated"

....and yet you obviously are going to have to get more complicated in how you continue to get that exposure on places like Meta/Ig...by having to do workarounds for the link blocking.

Expand text...

12 May at 16:13 | Open on meow.social

@simon Email too. The more links in your email the more likely they are to put it in a spam folder.

12 May at 16:42 | Open on boing.world

tallship

A couple of things there Simon. How do you know this? Are you still using the deprecated, privacy mining, monolithic silos to which you refer, or are you just taking this on the word of good, 3rd party information sources?

If it's the former, why?

If it's the latter, then yippie kai yay! The demise of these platforms is underway and in full swing - just as Steve Ballmer once called Linux "Cancer", the fact that these silos aren't simply ignoring links to particular resources, especially those in the #Fediverse, and have taken up with the practice of actively blocking them, is a good thing; and you, as a #Fedizen, should be proud.

Yet again, if it is the former, and you really insist on validating and monetizing those privacy mining silos via your subjugation as inventoried chattel there, consider pinning something akin to the following to the top of your profile (make sure to read the alt-text for the image):

A couple of things there Simon. How do you know this? Are you still using the deprecated, privacy mining, monolithic silos to which you refer, or are you just taking this on the word of good, 3rd party information sources?

If it's the former, why?

If it's the latter, then yippie kai yay! The demise of these platforms is underway and in full swing - just as Steve Ballmer once called Linux "Cancer", the fact that these silos aren't simply ignoring links to particular resources, especially those in the

Expand text...

12 May at 20:15 | Open on public.mitra.social

Here's a brilliant neologism: "slop", for text generated entirely by LLMs and published, unwanted, on the Internet

> Watching in real time as "slop" becomes a term of art. the way that "spam" became the term for unwanted emails, "slop" is going in the dictionary as the term for unwanted AI generated content

Source: https://twitter.com/deepfates/status/1787472784106639418

Like 8 May at 0:16 | Open on fedi.simonwillison.net

Show previous comments

@simon @troublewithwords like “spam,” “slop” has that p-sound so you can really spit the word out when you’re mad.

8 May at 3:40 | Open on hackers.town

Adlangx

@simon trash fire.

8 May at 3:59 | Open on mastodon.social

⛈️ Information ⛈️

@simon I wish Fates would make the jump to Mastodon.

8 May at 5:20 | Open on mastodon.social

I built a new tool: https://tools.simonwillison.net/ocr - it runs OCR against images and PDFs entirely in your browser (no file upload needed) using Tesseract.js and PDF.js

I wrote more about the tool and how I built it (with copious amounts of Claude 3 Opus and a little bit of ChatGPT) here: https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/

Like 30 March at 18:04 | Open on fedi.simonwillison.net

Show previous comments

gabi

@simon Insanely cool! It works fine in Android Chrome (no luck with Firefox though).

30 March at 18:46 | Open on gabi.is

@simon Damn, that's awesome! Queued for ResearchBuzz.

30 March at 18:47 | Open on researchbuzz.masto.host

Alexander Winkler

@simon not sure if this is of interest to @jbaiter , cf. https://openbiblio.social/@jbaiter/110815957206638047

30 March at 18:48 | Open on openbiblio.social

I wrote about the AI trust crisis: when companies like Dropbox and OpenAI say "we won't train models or your private data", it's increasingly clear that a lot of people simply don't believe them.
https://simonwillison.net/2023/Dec/14/ai-trust-crisis/

Like 14 Dec 2023 at 16:17 | Open on fedi.simonwillison.net

New release of shot-scraper, my CLI tool for taking screenshots of web pages (and scraping them with JavaScript) https://github.com/simonw/shot-scraper/releases/tag/1.3

Like 1 Nov 2023 at 22:22 | Open on fedi.simonwillison.net

I wrote some notes about DALL-E 3, including reverse engineering some aspects of how it works. It's a fascinating insight into the prompt engineering that happens inside of OpenAI

**Now add a walrus: Prompt engineering in DALL-E 3**
https://simonwillison.net/2023/Oct/26/add-a-walrus/

Like 26 Oct 2023 at 21:18 | Open on fedi.simonwillison.net

Published a short TIL about the very simple 2x2 CSS grid layout I used to display the images in that post https://til.simonwillison.net/css/simple-two-column-grid

27 Oct 2023 at 5:01 | Open on fedi.simonwillison.net