Here's my end-of-year review of things we learned about LLMs in 2024 - we learned a LOT of things https://simonwillison.net/2024/Dec/31/llms-in-2024/
Table of contents:
Searching Google for "encanto 2" right now provides an entirely made-up response derived from an imagined description of the film on a fan fiction wiki
https://simonwillison.net/2024/Dec/29/encanto-2/
Turns out we weren't done for major LLM releases in 2024 after all... Alibaba's Qwen just released QvQ, a "visual reasoning model" - the same chain-of-thought trick as OpenAI's o1 but applied strictly to running a prompt against an image
I've been trying it out and it's a lot of fun to poke around with: https://simonwillison.net/2024/Dec/24/qvq/
Here's what it said when I asked it to count those pelicans
Here are all of my experiments, with full transcripts https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc
I got QvQ running on my (M2 64GB) laptop!
uv run --with 'numpy<2.0' --with mlx-vlm python \
-m mlx_vlm.generate \
--model mlx-community/QVQ-72B-Preview-4bit \
--max-tokens 10000 \
--temp 0.0 \
--prompt "describe this" \
--image pelicans-on-bicycles-veo2.jpg
The other major Chinese AI lab, DeepSeek, just dropped their own last-minute entry into the 2024 model race: DeepSeek v3 is a HUGE model (685B parameters) which showed up, mostly undocumented, on Hugging Face this morning. My notes so far: https://simonwillison.net/2024/Dec/25/deepseek-v3/
The DeepSeek v3 paper came out this morning, added a few notes about that here https://simonwillison.net/2024/Dec/26/deepseek-v3/
Rewatching Severance in preparation for season 2 starting on January the 19th - it rewatches *really well*. I'm running a rewatch series on MetaFilter FanFare here, with a new episode every three days - just posted S1E2 https://fanfare.metafilter.com/show/severance
@evan I use GitHub Copilot but I get a ton of work done directly in Claude (with Artifacts) and ChatGPT (with Code Interpreter), pasting code back and forth
@evan love Web Components - I often prompt it to create those directly, but I still want them all in a single file so I can easily copy and paste the whole lot out of the LLM at once
I figured out a prompting pattern for getting Claude to produce fully self-contained Python scripts that execute with "uv run" using PEP 723 inline script dependencies - and now I can one-shot useful Python utilities with it https://simonwillison.net/2024/Dec/19/one-shot-python-tools/
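To illustrate the shape of script that pattern produces, here's a minimal sketch of a PEP 723 inline-dependency script that executes directly with "uv run" - a made-up example, not one of my actual one-shot tools:

# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "httpx",
# ]
# ///
# Hypothetical example: fetch a URL and report its HTTP status code.
# uv reads the comment block above and installs httpx in an ephemeral
# environment before running the script.
import sys

import httpx

url = sys.argv[1]
print(url, httpx.get(url).status_code)

Save that as check_url.py and run "uv run check_url.py https://example.com/" - no virtualenv setup needed.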
Here are my custom instructions which I'm using as part of a Claude Project, but I expect they'll work the same way with other LLMs too
I have a similar set of custom instructions I use with Claude Artifacts to get it to produce mobile-friendly single page HTML apps that run without a build step
I can now run a GPT-4 class model on my laptop
(The exact same laptop that could just about run a GPT-3 class model 20 months ago)
The new Llama 3.3 70B is a striking example of the huge efficiency gains we've seen in the last two years
https://simonwillison.net/2024/Dec/9/llama-33-70b/
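If you want to try that yourself, here's one way to do it - a minimal sketch assuming you have the Ollama app and the "ollama" Python package installed, plus enough free RAM (the default 4-bit quantization of the 70B model weighs in at around 40GB). This isn't necessarily how I ran it:

# Hypothetical example using the ollama Python client.
# Run "ollama pull llama3.3" first to download the model weights.
import ollama

response = ollama.chat(
    model="llama3.3",
    messages=[{"role": "user", "content": "Write a haiku about pelicans"}],
)
print(response["message"]["content"])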
This consistent trend of models getting smaller, faster and more capable on the same hardware is one of the reasons I'm not particularly concerned by the ongoing discourse about models hitting a plateau
https://simonwillison.net/2024/Dec/9/llama-33-70b/#is-performance-about-to-plateau-
I wrote a thing about "Storing times for human events" - how, if you're building an events website used by actual human beings, the standard advice of "convert times to UTC and just store that" isn't actually the best approach
https://simonwillison.net/2024/Nov/27/storing-times-for-human-events/
@demiurg @lmk I'm 100% on "store timestamps of when stuff happened as UTC" - the one edge-case here is for events that haven't happened yet where human beings think about them in terms of local time
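Here's a minimal sketch of the pattern I recommend for those future events, using Python's zoneinfo (the event details are invented): store the local wall-clock time plus the IANA timezone the organizer specified, then derive UTC when you need it - that way a later change to the timezone rules can't silently shift the event.

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Store what the organizer actually said: local time plus IANA timezone name.
event = {
    "local_time": "2025-06-01T19:00:00",   # hypothetical example event
    "timezone": "America/Chicago",
}

# Derive UTC at query/display time rather than baking it in at write time.
local = datetime.fromisoformat(event["local_time"]).replace(
    tzinfo=ZoneInfo(event["timezone"])
)
print(local.astimezone(timezone.utc).isoformat())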
This is excellent https://www.tiktok.com/@alexandrawideeyes/video/7443189708328340766
If you have an Apple Silicon Mac with >24GB of RAM and >5GB of available disk space and uv installed, try running this command to see the new model in action (replace IMG_4414.JPG at the end with a path to your own image)
uv run \
--with mlx-vlm \
--with torch \
python -m mlx_vlm.generate \
--model mlx-community/SmolVLM-Instruct-bf16 \
--max-tokens 500 \
--temp 0.5 \
--prompt "Describe this image in detail" \
--image IMG_4414.JPG