Here's a recipe for running the Qwen2-VL vision LLM... | Simon Willison


Here's a recipe for running the Qwen2-VL vision LLMs on Apple Silicon using Python and the mlx-vlm library, via a uv shell one-liner.

Full details on my blog: https://simonwillison.net/2024/Sep/29/mlx-vlm/ - and here's the full output from that example prompt https://gist.github.com/simonw/9e02d425cacb902260ec1307e0671e17

I used uv to run it against this image with this shell one-liner:

uv run --with mlx-vlm \
python -m mlx_vlm.generate \
--model Qwen/Qwen2-VL-2B-Instruct \
--max-tokens 1000 \
--temp 0.0 \
--image https://static.simonwillison.net/static/2024/django-roadmap.png \
--prompt "Describe image in detail, include all text"
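
If you want to script this rather than paste it into a terminal, here's a minimal sketch that assembles the same argument list in Python and hands it to subprocess. The helper names build_command and run_prompt are hypothetical, not part of mlx-vlm or uv, and it assumes uv is already on your PATH. It only uses the flags shown in the one-liner above.

```python
import shutil
import subprocess


def build_command(model, image, prompt, max_tokens=1000, temp=0.0):
    """Assemble the argv for the uv one-liner (hypothetical helper)."""
    return [
        "uv", "run", "--with", "mlx-vlm",
        "python", "-m", "mlx_vlm.generate",
        "--model", model,
        "--max-tokens", str(max_tokens),
        "--temp", str(temp),
        "--image", image,
        "--prompt", prompt,
    ]


def run_prompt(model, image, prompt):
    """Run the command and return whatever it printed to stdout."""
    if shutil.which("uv") is None:
        raise RuntimeError("uv was not found on PATH")
    result = subprocess.run(
        build_command(model, image, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Example call (triggers the model download on first use):
# print(run_prompt(
#     "Qwen/Qwen2-VL-2B-Instruct",
#     "https://static.simonwillison.net/static/2024/django-roadmap.png",
#     "Describe image in detail, include all text",
# ))
```

Passing the arguments as a list avoids shell quoting problems with the prompt text.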

This first downloaded 4.1GB to my ~/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct folder and then output this result, which starts:
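
To check how much disk space a downloaded model is using, something like this rough sketch should work. The function name cache_size_bytes is my own, and it assumes the Hugging Face cache folder stores the real files once and exposes them elsewhere through symlinks, so symlinks are skipped to avoid counting anything twice.

```python
import os


def cache_size_bytes(folder):
    """Sum the sizes of regular files under folder, skipping symlinks."""
    total = 0
    # os.walk does not follow directory symlinks by default
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


# Example call, using the folder mentioned above:
# folder = os.path.expanduser(
#     "~/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct"
# )
# print(f"{cache_size_bytes(folder) / 1e9:.1f} GB")
```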

The image is a horizontal timeline chart that represents the release dates of various software versions. The timeline is divided into years from 2023 to 2029, with each year represented by a vertical line. The chart includes a legend at the bottom, which distinguishes between different types of software versions. [...]

29 September at 21:52
