Here's my end-of-year review of things we learned about LLMs in 2024 - we learned a LOT of things https://simonwillison.net/2024/Dec/31/llms-in-2024/
Table of contents:
Searching Google for "encanto 2" right now provides an entirely made-up response derived from an imagined description of the film on a fan fiction wiki
https://simonwillison.net/2024/Dec/29/encanto-2/
Turns out we weren't done for major LLM releases in 2024 after all... Alibaba's Qwen just released QvQ, a "visual reasoning model" - the same chain-of-thought trick as OpenAI's o1 but applied strictly to running a prompt against an image
I've been trying it out and it's a lot of fun to poke around with: https://simonwillison.net/2024/Dec/24/qvq/
Here's what it said when I asked it to count those pelicans
Here are all of my experiments, with full transcripts https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc
I got QvQ running on my (M2 64GB) laptop!
uv run --with 'numpy<2.0' --with mlx-vlm python \
-m mlx_vlm.generate \
--model mlx-community/QVQ-72B-Preview-4bit \
--max-tokens 10000 \
--temp 0.0 \
--prompt "describe this" \
--image pelicans-on-bicycles-veo2.jpg
The other major Chinese AI lab, DeepSeek, just dropped their own last-minute entry into the 2024 model race: DeepSeek v3 is a HUGE model (685B parameters) which showed up, mostly undocumented, on Hugging Face this morning. My notes so far: https://simonwillison.net/2024/Dec/25/deepseek-v3/
The DeepSeek v3 paper came out this morning, added a few notes about that here https://simonwillison.net/2024/Dec/26/deepseek-v3/
Rewatching Severance in preparation for season 2 starting on January the 19th - it rewatches *really well*. I'm running a rewatch series on MetaFilter FanFare here, with a new episode every three days - just posted S1E2 https://fanfare.metafilter.com/show/severance
@evan I use GitHub Copilot but I get a ton of work done directly in Claude (with Artifacts) and ChatGPT (with Code Interpreter), pasting code back and forth
@evan love Web Components - I often prompt it to create those directly, but I still want them all in a single file so I can easily copy and paste the whole lot out of the LLM at once
I figured out a prompting pattern for getting Claude to produce fully self-contained Python scripts that execute with "uv run" using PEP 723 inline script dependencies - and now I can one-shot useful Python utilities with it https://simonwillison.net/2024/Dec/19/one-shot-python-tools/
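To illustrate the shape of script that pattern produces, here's a minimal sketch of a PEP 723 inline-dependency script that executes directly with "uv run" - a made-up example, not one of my actual one-shot tools:

# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "httpx",
# ]
# ///
# Hypothetical example: fetch a URL and report its HTTP status code.
# uv reads the comment block above and installs httpx in an ephemeral
# environment before running the script.
import sys

import httpx

url = sys.argv[1]
print(url, httpx.get(url).status_code)

Save that as check_url.py and run "uv run check_url.py https://example.com/" - no virtualenv setup needed.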
Here are my custom instructions which I'm using as part of a Claude Project, but I expect they'll work the same way with other LLMs too
I have a similar set of custom instructions I use with Claude Artifacts to get it to produce mobile-friendly single page HTML apps that run without a build step
I can now run a GPT-4 class model on my laptop
(The exact same laptop that could just about run a GPT-3 class model 20 months ago)
The new Llama 3.3 70B is a striking example of the huge efficiency gains we've seen in the last two years
https://simonwillison.net/2024/Dec/9/llama-33-70b/
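If you want to try that yourself, here's one way to do it - a minimal sketch assuming you have the Ollama app and the "ollama" Python package installed, plus enough free RAM (the default 4-bit quantization of the 70B model weighs in at around 40GB). This isn't necessarily how I ran it:

# Hypothetical example using the ollama Python client.
# Run "ollama pull llama3.3" first to download the model weights.
import ollama

response = ollama.chat(
    model="llama3.3",
    messages=[{"role": "user", "content": "Write a haiku about pelicans"}],
)
print(response["message"]["content"])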
This consistent trend of models getting smaller, faster and more capable on the same hardware is one of the reasons I'm not particularly concerned by the ongoing discourse about models hitting a plateau
https://simonwillison.net/2024/Dec/9/llama-33-70b/#is-performance-about-to-plateau-
I wrote a thing about "Storing times for human events" - how, if you're building an events website used by actual human beings, the standard advice of "convert times to UTC and just store that" isn't actually the best approach
https://simonwillison.net/2024/Nov/27/storing-times-for-human-events/
@demiurg @lmk I'm 100% on "store timestamps of when stuff happened as UTC" - the one edge-case here is for events that haven't happened yet where human beings think about them in terms of local time
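Here's a minimal sketch of the pattern I recommend for those future events, using Python's zoneinfo (the event details are invented): store the local wall-clock time plus the IANA timezone the organizer specified, then derive UTC when you need it - that way a later change to the timezone rules can't silently shift the event.

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Store what the organizer actually said: local time plus IANA timezone name.
event = {
    "local_time": "2025-06-01T19:00:00",   # hypothetical example event
    "timezone": "America/Chicago",
}

# Derive UTC at query/display time rather than baking it in at write time.
local = datetime.fromisoformat(event["local_time"]).replace(
    tzinfo=ZoneInfo(event["timezone"])
)
print(local.astimezone(timezone.utc).isoformat())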
This is excellent https://www.tiktok.com/@alexandrawideeyes/video/7443189708328340766
If you have an Apple Silicon Mac with >24GB of RAM and >5GB of available disk space and uv installed, try running this command to see the new model in action (replace IMG_4414.JPG at the end with a path to your own image)
uv run \
--with mlx-vlm \
--with torch \
python -m mlx_vlm.generate \
--model mlx-community/SmolVLM-Instruct-bf16 \
--max-tokens 500 \
--temp 0.5 \
--prompt "Describe this image in detail" \
--image IMG_4414.JPG