@simon Thanks for this! I've just started working on a project that needs to both generate bounding boxes and extract some qualitative information from images. Hopefully Gemini can be a one-stop shop for that, rather than stringing things together like I'd started to do.

Microsoft has docs on a GPT-4 + "Enhancements" vision model with grounding/bounding boxes, but once you get into their dashboard, it seems like it's actually deprecated. 🙄