I'm hoping to turn this into a series of YouTube interviews with people building cool data projects where we nerd out about what they've built and how they built it, so I'm optimistically thinking of this as episode one! https://www.youtube.com/watch?v=t_S-loWDGE0
I wrote about the currently circulating meme where ChatGPT appears to provide deep insights into your personality if you ask it "From all of our interactions what is one thing that you can tell me about myself that I may not know about myself" - when actually all it's doing is spinning up a pseudo-horoscope for you based on short notes it added to its "memory"
(I shared this on Twitter and it's interesting how some people there are very resistant to the idea that the deeply personal insights ChatGPT gave them about themselves might be bogus junk)
I need a small favor: I’m planning to livestream a hyper-local candidate forum for our small town’s local election next week, but I just found out you can’t livestream from a mobile device on YouTube until your channel has 50 subscribers!
If 49 more people could subscribe to https://www.youtube.com/@CoastsideCivic (feel free to unsubscribe after Thursday next week), it would really help me out
And if you’re interested in El Granada, California local elections, come along on Wednesday https://coastsidecivic.com
It's in my S3 bucket along with all my other images, but I have to admit I do worry that some day decades into the future I may fail to pay my AWS bill and risk my S3 bucket blinking out of existence
@darius @colby @eaton @Edent I care about threads that were entirely me replying to myself, but those replies include quotes of other tweets that I want to capture as well
@darius @colby @eaton @Edent yeah that’s what’s held me back in the past too - hosting my own tweets is obviously fine, embedding full videos from quote tweets much less so
I don’t want to republish those videos though, I want to stash a personal copy in case Twitter fully implodes some day
It's not surprising to learn that they're doing this - that's practically the industry standard right now - but it's still really interesting to see internal details of what they're collecting and why
This letter from Mark Zuckerberg - "Open Source AI Is the Path Forward" - is genuinely worth reading in full, even though it doubles down on Meta's ongoing nasty habit of misusing the term "Open Source" for models that clearly don't fit the Open Source Initiative definition
It outlines Meta's position on why they believe in and invest in openly licensed models, sets out their thoughts on AI safety, and even touches on geopolitical issues
> "I see Zuck's prominent misuse of 'open source' as a small-scale act of cultural vandalism," Willison told Ars Technica. "Open source should have an agreed meaning. Abusing the term weakens that meaning which makes the term less generally useful, because if someone says 'it's open source,' that no longer tells me anything useful. I have to then dig in and figure out what they're actually talking about."
Big model release today: Meta AI's Llama 3.1 series, including Llama 3.1 405B which appears to be the first openly licensed model that genuinely competes with current top proprietary models GPT-4o and Claude 3.5 Sonnet
A neat thing about having your own blog is there's no rule saying you can't backfill the archives... yesterday I added this page for a talk I gave back in 2017, just to have it show up in the archives in the right place https://simonwillison.net/2017/Aug/16/denormalized-query-engine/
It turns out Google Chrome ships a default, hidden extension that allows code on `*.google.com` to access private APIs, including your current CPU usage
You can test it out by pasting the following into your Chrome DevTools console on any Google page:
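Here's a rough TypeScript sketch of the kind of call involved. The widely shared version of this test uses `chrome.runtime.sendMessage` to send a `cpu.getInfo` request to Chrome's built-in Hangouts services extension. The extension ID and the method name below are assumptions taken from that circulating snippet, not from any documented API. `requestCpuInfo` and `SendMessage` are hypothetical names I made up for illustration.

```typescript
// Minimal type for the callback-style messaging call used below.
// In Chrome this corresponds to chrome.runtime.sendMessage(extensionId, message, callback).
type SendMessage = (
  extensionId: string,
  message: { method: string },
  callback: (response: unknown) => void,
) => void;

// Assumption: ID of Chrome's hidden, built-in Hangouts services extension,
// as given in the widely shared version of this snippet.
const HANGOUTS_EXTENSION_ID = "nkeimhogjdpnpccoofpliimaahmaaome";

// Sends a "cpu.getInfo" message to the hidden extension and resolves with
// whatever it sends back. Only pages the extension accepts messages from
// (reportedly *.google.com) will get a reply.
function requestCpuInfo(sendMessage: SendMessage): Promise<unknown> {
  return new Promise((resolve) => {
    sendMessage(HANGOUTS_EXTENSION_ID, { method: "cpu.getInfo" }, resolve);
  });
}
```

To try it in the DevTools console, strip the type annotations so it becomes plain JavaScript. Then call `requestCpuInfo((id, msg, cb) => chrome.runtime.sendMessage(id, msg, cb)).then((r) => console.log(JSON.stringify(r, null, 2)))`. The wrapper arrow function keeps `sendMessage` bound to `chrome.runtime`.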
Worth grepping your source code for "polyfill.io" and taking urgent measures to remove that code if you're linking it into your site - the domain name apparently now intermittently serves malicious JavaScript
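A plain `grep -rn "polyfill.io" .` from your project root does the job. If you'd rather script the check, here's a minimal TypeScript sketch for Node.js. `findPolyfillReferences` and the list of skipped directories are my own hypothetical choices. It only finds literal text matches, so it won't catch a URL that gets assembled dynamically at runtime.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Directories skipped during the scan (assumption: adjust for your project).
const SKIP_DIRS = new Set([".git", "node_modules"]);

// Recursively walks `root` and returns a "file:line: text" entry for every
// line that mentions polyfill.io, similar to the output of `grep -rn`.
// Symlinks are not followed, since Dirent reports them as neither file nor directory.
function findPolyfillReferences(root: string): string[] {
  const matches: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const fullPath = path.join(root, entry.name);
    if (entry.isDirectory()) {
      if (!SKIP_DIRS.has(entry.name)) {
        matches.push(...findPolyfillReferences(fullPath));
      }
    } else if (entry.isFile()) {
      const lines = fs.readFileSync(fullPath, "utf8").split("\n");
      lines.forEach((line, i) => {
        if (line.includes("polyfill.io")) {
          matches.push(`${fullPath}:${i + 1}: ${line.trim()}`);
        }
      });
    }
  }
  return matches;
}
```

Running `findPolyfillReferences(".").forEach((m) => console.log(m))` from your project root prints one entry per matching line. Each of those is a `<script>` tag or import to remove or point at a trusted copy.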
Open source developer building tools to help journalists, archivists, librarians and others analyze, explore and publish their data. https://datasette.io and many other #projects.