OK, all done (I just went through and added alt text to the images with the help of Claude)
Top-level
OK, all done (I just went through and added alt text to the images with the help of Claude) 8 comments
@simon It feels way to expensive to run these models. But if the price drops to a level to, say chatgpt pro level ($200), I can many researchers will give it a try. @simon what is your prompt for this. I have had mixed results. And the API constantly asks if I want to keep doing the remaining images. @simon Thank you. Have you got something similar for reformatting transcripts and other longer texts that prevents “Would you like me to continue”? @bexelbie length limits are still really frustrating, o1 and o1-mini might do better on that but generally I think it may need a custom harness that knows how to run "keep going" prompts automatically a few times when needed @simon this is where I had gotten too as well. It seems to be very limiting for using LLMs as part of automated processes. Especially if it’s hard to detect final states and the LLMs are apparently bad at managing to length on tasks of variable length. |
By far the best coverage of o3 is this essay by François Chollet, it's crammed with interesting insights beyond just reporting on the benchmark score: https://arcprize.org/blog/oai-o3-pub-breakthrough
Published my own notes on that here: https://simonwillison.net/2024/Dec/20/openai-o3-breakthrough/