I keep finding new ways to entertain myself with ChatGPT advanced voice mode...
"I need you to pretend to be a California brown pelican with a very thick Russian accent, but you talk to me exclusively in Spanish"
I spun up a new LLM benchmark: how well can they handle this prompt?

"Generate an SVG of a pelican riding a bicycle"

I find the results so far utterly delightful: https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
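For anyone curious to reproduce this, a minimal sketch of running the benchmark prompt against one model might look like the following; it assumes the OpenAI Python SDK with an OPENAI_API_KEY in the environment, and the model name is just an illustrative choice:

```python
# Minimal sketch: run the pelican benchmark prompt against a single model.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY; the model name is just
# an example - any chat-capable model can be swapped in.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Generate an SVG of a pelican riding a bicycle",
    }],
)

# Models often wrap the SVG in a Markdown code fence, so strip that first
text = response.choices[0].message.content
if "```" in text:
    text = text.split("```")[1].removeprefix("svg").strip()
with open("pelican.svg", "w") as f:
    f.write(text)
```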
@simon saw this and thought of you https://bsky.app/profile/socalleslie.bsky.social/post/3l7ewtd4koe2x

@simon How hard is it to process untrusted SVG data to strip out any potentially harmful tags or attributes (like stuff that might execute JavaScript)? I feel like this is well-trodden ground for HTML these days; are there robust solutions for the SVG version of this problem?
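One common answer is an allowlist: parse the SVG and keep only tags and attributes known to be safe, dropping <script>, <foreignObject>, event-handler attributes and href values outright. Here is a rough Python sketch of the idea; the allowlists are illustrative and nowhere near complete, so a maintained sanitizer library is the safer bet for real input:

```python
# Rough sketch of allowlist-based SVG sanitization. The allowlists are
# illustrative and incomplete; use a maintained sanitizer for real input.
import xml.etree.ElementTree as ET

ALLOWED_TAGS = {"svg", "g", "path", "rect", "circle", "ellipse", "line",
                "polyline", "polygon", "text", "title", "desc"}
ALLOWED_ATTRS = {"d", "x", "y", "x1", "y1", "x2", "y2", "cx", "cy", "r",
                 "rx", "ry", "width", "height", "viewBox", "points",
                 "fill", "stroke", "stroke-width", "transform"}

def local(name: str) -> str:
    """Strip an XML namespace prefix like {http://www.w3.org/2000/svg}."""
    return name.rsplit("}", 1)[-1]

def sanitize(svg_source: str) -> str:
    root = ET.fromstring(svg_source)

    def clean(el: ET.Element) -> None:
        # Drop event handlers (onclick=...), href/xlink:href (javascript:
        # URLs), style, and anything else not explicitly allowed.
        for attr in list(el.attrib):
            if local(attr) not in ALLOWED_ATTRS:
                del el.attrib[attr]
        # Remove disallowed children outright: <script>, <foreignObject>...
        for child in list(el):
            if local(child.tag) not in ALLOWED_TAGS:
                el.remove(child)
            else:
                clean(child)

    clean(root)
    return ET.tostring(root, encoding="unicode")

# Example: both the onload handler and the <script> element are stripped
print(sanitize('<svg onload="alert(1)"><script>x()</script><circle r="4"/></svg>'))
```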
@simon Based on the experience of people who tried to create a Python sandbox over the decades, I'd say it is pretty much impossible (save for having the browser separate it into another page box, i.e. an iframe).

@simon CSP is probably a good second layer of security, no matter what you end up doing.

@simon I wonder if you could take a similar approach to the one Figma used for their third-party plugins: make it all happen in a sandboxed WASM script that renders to a canvas?

Constantly persisting everything in a web form to localStorage in case of browser or tab accidents is such a cheap and effective trick! I keep meaning to knock up a little bit of JavaScript for the Django Admin that does this automatically for every add form

@simon We recently added that sort of feature to MediaWiki: https://mediawiki.org/wiki/Help:Edit_Recovery (IndexedDB rather than localStorage though, because it can store more).

@simon Don't browsers do some level of form persistence if you use plain HTML forms?

Anthropic's https://claude.ai/ grew a new feature today: an equivalent of OpenAI's ChatGPT Code Interpreter mode, where the chatbot can write and then execute code in order to help answer questions (e.g. to run calculations that are beyond a next-token-predicting LLM).

OpenAI use server-side Python for this, but Anthropic instead chose to use client-side JavaScript running in a Web Worker. Here are my notes so far on the new feature: https://simonwillison.net/2024/Oct/24/claude-analysis-tool/
@simon interesting choice.

@simon would be cool to see Llama 3.2 1B or similar doing it right inside the browser 😄

@simon Interesting. Do you know (or does the code show) why the vis nodes for the dependencies are of different sizes?

I finally have a procedure in place I like for hacking on Python CLI apps using a development environment managed by uv - full notes here:
@simon Cool post! I am not sure the `tool.uv.dev-dependencies` is the same as, or interoperable with, PEP 735 though.

@simon Extra tip: you can do `uv tool install -e .`, and now you can run demo-app directly, without the `uv run` prefix, in any directory.

@simon Then, for scripts you run frequently, do you just add `alias demo="uv run python -m demo_app"` to your .bashrc? My few Python CLI apps each have a .venv directly, but it's tiresome to have to juggle activations: `source ~/bin/demo/.venv/bin/activate && demo && deactivate` 🫠

I know this should be obvious, but it still surprises me how much more fun my blog feels now that I've started habitually using images in my posts. Here are two screenshots and a GIF demo from today's posts: https://simonwillison.net/2024/Oct/23/

@simon it does make the posts more engaging. Our little rat brains like stimulation

I built a Bash script for running prompts with images or PDFs against the Google Gemini models - a prototype of how multi-modal support for my LLM CLI tool is going to work. It's so much fun to play with, especially since Gemini somehow costs less than 1/10th of a cent per image

Came up with a creative way to post quotes from video content on my blog - since my quotes support images I can run MacWhisper to extract a text transcript of part of the video, then drop a screenshot in the middle to illustrate the quote

@simon Do you understand the difference between third party and direct sales in that quote? Third party sounds like regular API key usage; I'm unclear on what counts as direct sales

The Internet Archive being down helps expose quite how much I rely on their Wayback Machine - I've headed over there at least four times in the past week and been disappointed at not being able to use it.

Today people are saying that it's interesting that Anthropic's Claude 3.5 Opus model is no longer mentioned on their models page - but without the archive I can't see for myself if it used to be listed there or not https://docs.anthropic.com/en/docs/about-claude/models

Anthropic released a fascinating new capability today called "Computer Use" - a mode of their Claude 3.5 Sonnet model where it can do things like accept screenshots of a remotely operated computer and send back commands to click on specific coordinates, enter text etc.

My notes on what I've figured out so far: https://simonwillison.net/2024/Oct/22/computer-use/
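To make the shape of that loop concrete, here is a hedged sketch of a single iteration using the Anthropic Python SDK; the model name, tool type and beta flag below match the values Anthropic published at launch, but treat the details as assumptions and check their docs:

```python
# One iteration of the Computer Use loop (Anthropic Python SDK).
# Tool type / beta flag are the launch values; verify against current docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the system settings"}],
)

# The model responds with tool_use blocks such as {"action": "screenshot"}
# or {"action": "left_click", "coordinate": [640, 400]}. Your harness
# executes each action, then sends the result (e.g. a screenshot) back as
# a tool_result block and repeats until the model stops asking.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```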
@simon this has huge accessibility implications, hopefully I'll have some time to test this out from a screen reader perspective over the coming weekend.

This is an absurdly clever hack. I broke it down to figure out how it works here: https://simonwillison.net/2024/Oct/21/sudoku-in-python-packaging/

@simon that is amazing! reminds me of when @fgnass golfed a sudoku solver into a tweet... he didn't have enough bytes to return the result so he stopped the recursive iteration by throwing: https://youtu.be/JsAetmgJRss?t=1467

You can solve sudokus in Python packaging. Not Python code, Python packages:

@konstin This is not a new idea

I exported my Claude data to poke around with it and found out I've become a HEAVY user of Claude Artifacts - the feature that lets Claude build a full interactive HTML+JavaScript tool for you based on your prompts. I built 14 (somewhat) useful things with it in just the past week! Here's a post describing them all: https://simonwillison.net/2024/Oct/21/claude-artifacts/

@simon I haven't played as much with Canvas but like I told you at DCUS, Claude Projects can't be unseen once you use it. I saw your app last night, but I'm going to carve out some time to try it out. I have 1000s of chats and 100s of projects driven from it. I know it's not a true sync but I suspect I'll get a ton of good out of it.

A year ago Mastodon 4.2 expanded the HTML filter to allow a whole bunch of additional tags... anyone know of any good examples of accounts that are using that bounty of additional formatting options? https://docs.joinmastodon.org/spec/activitypub/#sanitization
@simon Running 4.2 here, but seeing raw tags in the replies 😪 <i>someday this might be in italics</i>

@simon tantek uses it (via bridgy) to do in-post footnotes https://fed.brid.gy/r/https://tantek.com/2024/285/t1/io-domain-suggested-steps

I really like Drew's framework here dividing current AI use-cases into Gods (human replacement, which I think of as still mostly science fiction), Interns (assistants you delegate closely-reviewed tasks to, which is most of how I use LLMs today) and Cogs (smaller tools that can more reliably serve a single purpose, like Whisper for transcription) - more of my own notes on this here: https://simonwillison.net/2024/Oct/20/gods-interns-and-cogs/

@simon For the "gods" category, also check out @forrestbrazeal's excellent song "AGI (Artificial God Incarnate)": https://www.youtube.com/watch?v=1ZhhO7MGknQ

@simon I have been struggling with terminology, so this is useful. That said, I'm not a fan of "interns" used like this. The context you used it in felt more appropriate than a whole class of AI terminology that literally means to replace a useful class of workers and their learning. I have personally struggled with the term Agents, for lack of a framework or way to use them outside of running a Python script.

The cold open from Abbott Elementary Season 4 Episode 2 (Ringworm) - the one with the PTA meeting - is already iconic

@simon the result on Firefox mobile is a slightly distressing amount of horizontal scroll for some reason, which rather spoils the effect!

I finally managed to get the Llama 3.2 and Phi 3.5 vision models to run on my M2 Mac laptop, using the mistral.rs Rust library and its CLI tool and Python bindings: https://simonwillison.net/2024/Oct/19/mistralrs/

Here's what I got from Llama 3.2 11B for this photo I took at the Pioneer Memorial Museum in Salt Lake City https://www.niche-museums.com/111 "describe this image including any text"

@simon thanks for this! I had some issues replicating it, though: on an M3 Max it always crashes. (It's also annoying that it crashes or errors if it can't find an image; there is a PR to fix that, but it's not merged yet.) And even on the M3 Max, since the in-situ quantization is done on one core, it takes a while... have you experienced one or all of these?
@simon I think it could be a game changer for learning a language.
@simon ChatGPT response: "You have very peculiar needs." 🤪
Here's a short audio clip https://static.simonwillison.net/static/2024/russian-pelican-in-spanish.m4a