Email or username:

Password:

Forgot your password?
Top-level
Simon Willison

OK, all done (I just went through and added alt text to the images with the help of Claude)

8 comments
Simon Willison

By far the best coverage of o3 is this essay by François Chollet, it's crammed with interesting insights beyond just reporting on the benchmark score: arcprize.org/blog/oai-o3-pub-b

Published my own notes on that here: simonwillison.net/2024/Dec/20/

Xing Shi Cai

@simon It feels way to expensive to run these models. But if the price drops to a level to, say chatgpt pro level ($200), I can many researchers will give it a try.

Brian "bex" Exelbierd

@simon what is your prompt for this. I have had mixed results. And the API constantly asks if I want to keep doing the remaining images.

Simon Willison

@bexelbie I have a Claude Project set up with these custom instructions

You write alt text for any image pasted in by the user. Alt text is always presented in a fenced code block to make it easy to copy and paste out. It is always presented on a single line so it can be used easily in Markdown images. All text on the image (for screenshots etc) must be exactly included. A short note describing the nature of the image itself should go first.
Brian "bex" Exelbierd

@simon Thank you. Have you got something similar for reformatting transcripts and other longer texts that prevents “Would you like me to continue”?

Simon Willison

@bexelbie length limits are still really frustrating, o1 and o1-mini might do better on that but generally I think it may need a custom harness that knows how to run "keep going" prompts automatically a few times when needed

Brian "bex" Exelbierd

@simon this is where I had gotten too as well. It seems to be very limiting for using LLMs as part of automated processes. Especially if it’s hard to detect final states and the LLMs are apparently bad at managing to length on tasks of variable length.

Go Up