@simon yep, which is particularly not helpful for users...

Top-level

Leaping Woman

@simon yep, which is particularly not helpful for users of screen readers.

19 October at 17:12 | Open on spore.social

5 comments

Simon Willison

@leapingwoman I've talked to screen reader users who still get enormous value out of the vision LLMs. They're generally reliable for things like text and high-level overviews; where they get weird is more detailed descriptions.

Plus the best hosted models (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) are a whole lot less likely to hallucinate than the ones I can run on my laptop!

19 October at 17:20 | Open on fedi.simonwillison.net

Simon Willison

@leapingwoman I use Claude 3.5 Sonnet to help me write alt text on almost a daily basis, but I never use exactly what it spat out - I always further edit it myself for clarity and to make sure it's as useful as possible

19 October at 17:21 | Open on fedi.simonwillison.net

Leaping Woman

@simon That's the way to do it. Both with image descriptions and with automatic speech-to-text, editing the machine version is key.

19 October at 19:25 | Open on spore.social

Joseph Szymborski :qcca:

@simon @leapingwoman Yah, it looks like Claude 3.5 Sonnet is right on the money with this one:

The image shows a large, neoclassical-style building with white stone walls and columns. The building is identified as the "PIONEER MEMORIAL MUSEUM" by text above its entrance. In front of the building stands a statue, though details of the statue are not clear from this distance.
The foreground of the image shows a sign that reads:
"HEADQUARTERS
INTERNATIONAL SOCIETY
DAUGHTERS OF UTAH PIONEERS"
The building is surrounded by trees, some of which are beginning to bud or leaf out, suggesting it's spring. The sky appears overcast with some clouds visible.
There are sidewalks leading up to the building, and a street is visible to the right side of the image. The overall setting appears to be in an urban or suburban area, likely in Utah given the reference to "Utah Pioneers" on the sign.

19 October at 18:28 | Open on cosocial.ca

Sigismund Ninja

@simon @leapingwoman also, the Llama 3.2 model is quantized so that it uses 4-bit weights (instead of the original 16-bit). And the model is fine-tuned for materials science.

https://huggingface.co/lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k

19 October at 19:40 | Open on mastodon.nu
