New TIL: Using sqlite-vec with embeddings in sqlite-utils and Datasette https://til.simonwillison.net/sqlite/sqlite-vec
This 404 Media piece definitively answers the question about where all of the weird Jesus shrimp AI generated image slop on Facebook comes from, and it’s fascinating: https://www.404media.co/where-facebooks-ai-slop-comes-from/ A few of my own notes here: https://simonwillison.net/2024/Aug/10/where-facebooks-ai-slop-comes-from/
@simon I believe you invented the term “slop” right 😁 my question is, if “slop” is “unwanted AI-generated content” why is this called “slop” when it’s obviously getting a ton of engagement? Is the claim that all the likes are bots? If it’s actually hundreds of thousands of people liking AI-generated images then isn’t it … not unwanted?
I spent some time reading the newly released GPT-4o System Card - it's a fascinating document, with all kinds of interesting new-to-me details in there. I've posted my highlights here: https://simonwillison.net/2024/Aug/8/gpt-4o-system-card/ I particularly enjoyed this bit about "scheming"
@simon "...it is unlikely that GPT-4o is capable of catastrophic scheming." So either it's bad at it or really good at it.
Google announced a major price drop for their Gemini 1.5 Flash model today - it's now the cheapest of the mainstream cheap-and-fast models, and it can also handle PDF files, audio and video as well as images and text. More notes here: https://simonwillison.net/2024/Aug/8/gemini-15-flash-price-drop/
@simon
I've released a new reusable Django app - django-http-debug - which makes it easy to quickly set up a debugging HTTP endpoint that returns a canned response and logs full details of any incoming requests, great for the initial stages of implementing things like OAuth or incoming webhooks. Most of the code was written for me by Claude 3.5 Sonnet - full details here: https://simonwillison.net/2024/Aug/8/django-http-debug/
As part of working on this I figured out (with more help from Claude) a good pattern for writing automated tests for a reusable Django app like this that can live in the same repository and spin up a minimal Django project, just enough for the tests to run. I wrote that up in detail as this TIL: https://til.simonwillison.net/django/pytest-django
@simon I guess I don't understand how this is an attack. The malicious prompt came from the attacker, but so did everything else. So the attacker already has access to the "exfiltrated" data, right? Or is there some missing context here?
OK, turns out Tim Walz is a former geography teacher and a GIS and data visualization nerd https://minnesotareformer.com/2024/08/06/former-geography-teacher-tim-walz-is-really-into-maps/
> Walz also says maps help implement the nitty-gritty details of otherwise abstract policy. “You have to have a plan” for “how we’re doing power and economic justice and environmental justice,” he said. “The tools for that plan are GIS.” Those tools “transfer a vision of a fair society into one that actually has results.”
OpenAI announced "Structured Outputs in the API" today, which is a lot more exciting than it sounds and also came with a huge price drop (50% cheaper) for a new GPT-4o model - which makes it their new cheapest model for image inputs, since GPT-4o-mini is priced the same for image inputs as the previous, more expensive GPT-4o. My detailed notes here: https://simonwillison.net/2024/Aug/6/openai-structured-outputs/
@simon I did quite a bit of testing tonight. I was surprised that structured outputs also worked with GPT-4o-mini. I didn’t crunch the numbers but I was already using it to parse text to JSON and I noticed an immediate improvement. I assumed we had to use the new model for pydantic support but it just worked.
Weeknotes: a staging environment, a Datasette alpha and a bunch of new LLMs https://simonwillison.net/2024/Aug/6/staging/
every now and then i feel like im taking crazy pills because i remember when aaron swartz killed himself because he was going to go to jail forever because he scraped JSTOR, and eleven years later your manager tells you “sshhhh it’s fine just scrape all of it don’t worry the CEO said it’s fine”
@simon I genuinely worry about big AI companies becoming chaos agents. Also do you think they are paying YouTube for that kind of access, or paying folks like youtube-downloader?
Several of the major social media platforms - Instagram, TikTok, LinkedIn, Twitter - have effectively declared war on linking to things and I absolutely hate it. "Link in my bio" / "Link in thread" / "Link in first comment"... or increasingly no link at all, just an unsourced screenshot of a page.
@simon Email too. The more links in your email the more likely they are to put it in a spam folder.
Here's a brilliant neologism: "slop", for text generated entirely by LLMs and published, unwanted, on the Internet
> Watching in real time as "slop" becomes a term of art. the way that "spam" became the term for unwanted emails, "slop" is going in the dictionary as the term for unwanted AI generated content
Source: https://twitter.com/deepfates/status/1787472784106639418
@simon @troublewithwords like “spam,” “slop” has that p-sound so you can really spit the word out when you’re mad.
I built a new tool: https://tools.simonwillison.net/ocr - it runs OCR against images and PDFs entirely in your browser (no file upload needed) using Tesseract.js and PDF.js. I wrote more about the tool and how I built it (with copious amounts of Claude 3 Opus and a little bit of ChatGPT) here: https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/
@simon not sure if this is of interest to @jbaiter, cf. https://openbiblio.social/@jbaiter/110815957206638047
I wrote about the AI trust crisis: when companies like Dropbox and OpenAI say "we won't train models on your private data", it's increasingly clear that a lot of people simply don't believe them.
New release of shot-scraper, my CLI tool for taking screenshots of web pages (and scraping them with JavaScript) https://github.com/simonw/shot-scraper/releases/tag/1.3
I wrote some notes about DALL-E 3, including reverse engineering some aspects of how it works. It's a fascinating insight into the prompt engineering that happens inside of OpenAI. **Now add a walrus: Prompt engineering in DALL-E 3**
Published a short TIL about the very simple 2x2 CSS grid layout I used to display the images in that post https://til.simonwillison.net/css/simple-two-column-grid