Some notes on running the new SmolVLM vision model (~4.2GB) on my MacBook Pro M2 at 75 tokens/second to describe a photo, using a shell one-liner powered by uv and mlx-vlm: https://simonwillison.net/2024/Nov/28/smolvlm/
If you have an Apple Silicon Mac with >24GB of RAM and >5GB of available disk space, and you have uv installed, try running this command to see the new model in action (replace IMG_4414.JPG at the end with a path to your own image):