Here's another example of multi-modal vision LLM usage: I collected the prices for the different preset models by dumping screenshots of their pricing pages directly into the Claude conversation
Full transcript here: https://gist.github.com/simonw/6b684b5f7d75fb82034fc963cc487530
@simon Leaks show that the ChatGPT Mac and/or web app are going to get screen sharing soon via the Realtime API. Seems like this is the next frontier: dumping the whole personal computing experience into models.