If you run an operation that pays freelance authors for articles, you get a *lot* of people trying to sell you the output from their slop factory of choice. Such pitches now far outnumber the legitimate ones.
Today we got a pitch for an article about the load-balancing scheduler regression caused by the sched_ext framework in the 6.11 release. Somebody has clearly put a bit more than the usual amount of effort into picking the sort of topic that might appeal to @lwn. There is only one little problem... that regression had nothing to do with sched_ext, which was merged in 6.12. The pitch was a bunch of authoritative-sounding bullshit; the article would surely have been more of the same.
Sometimes I truly lose hope about humanity's ability to keep its head above the flood of this stuff.
Just got a note from Linode saying "the storage for the physical host that your Linode resides on is in a degraded state. Our team has determined that there is a *potential* for data loss or corruption for all services residing on it".
Oh, and also that they'll continue charging us for it anyway until we delete it.
It took a long time and over 60 articles but, at @lwn, we have finally managed to complete our reporting from the 2025 Linux Storage, Filesystem, Memory Management, and BPF Summit. If you want to know what is going on in those core parts of the kernel, this is the place to look.
We've put together an EPUB version of the whole set as well — good bedtime reading!
The Wayback Machine managed to capture a Linux Journal article about the Arch Linux distribution's plan to switch to "rye-init" before whatever human intelligence remains there figured out that "rye-init" does not actually exist.
The Linux Journal predates LWN by some years and was, for a long time, the definitive read for Linux users. The Don Marti (@dmarti) years were especially noteworthy. It is sad to see where it has ended up now.
It drove home the perils of relying on proprietary software and spurred the creation of Git - a significant event, overall.
Today I got a cheery email from somebody who claims to be the "ethics and compliance" officer for a company called Bright Data. He wanted to have a "no pressure" conversation about the whole AI scraperbot problem. Looking at their web site, this company offers an API that, and I quote, "Bypasses anti-scraping mechanisms and solves CAPTCHAs, ensuring uninterrupted access to the most protected web sites".
After careful consideration for several milliseconds, I have concluded that I really don't have anything to discuss with this person.
But at least the claimed "100M+" residential IP addresses that they use for their DDoS attacks are "ethically sourced".
So, while I think this article declares victory a bit too soon, we also need the occasional optimistic view that we may actually get through this administration.
@mcdanlj @LWN What a lot of people are suggesting (Nepenthes and such) will work great against a single abusive robot. None of it will help much when tens of thousands of hosts are grabbing a few URLs each. Most of them will never step into the honeypot, and the ones that do will not be seen again regardless.
@penguin42 They don't tell me what they are doing with the data... the distributed scraping is an easily observable fact, though. Perhaps they are firehosing the data back to the mothership for training?
@smxi @monsieuricon Suggestions for these countermeasures - and how to apply them without hosing legitimate users - would be much appreciated. I'm glad they are obvious to you; please do share!
To be clear, LWN has never "crashed" as a result of this onslaught. We'll not talk about what happened after I pushed up some code trying to address it...
Most seriously, though: I'm surprised that this situation is surprising to anybody at this point. This is a net-wide problem, it surely is not limited to free-software-oriented sites. But if the problem is starting to get wider attention, that is fine with me...
Some of these bots are clearly running on a bunch of machines on the same net. I have been able to reduce the traffic significantly by treating everything as a class-C (/24) net and doing subnet-level throttling. That, and simply blocking a couple of them.
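For the curious, the idea looks something like this minimal sketch; it is not the code actually running on LWN, and the window and limit values are made up. It assumes IPv4 traffic and a simple in-memory counter:

```python
import ipaddress
import time
from collections import defaultdict

WINDOW = 60    # seconds per counting window (hypothetical value)
LIMIT = 100    # max requests per /24 per window (hypothetical value)

hits = defaultdict(int)          # /24 network -> requests this window
window_start = time.monotonic()

def should_throttle(client_ip: str) -> bool:
    """Return True if the client's /24 subnet is over the limit."""
    global window_start
    now = time.monotonic()
    if now - window_start > WINDOW:   # start a fresh counting window
        hits.clear()
        window_start = now
    # Collapse the address to its enclosing /24 ("class-C") network,
    # so all machines on the same subnet share one counter.
    subnet = ipaddress.ip_network(f"{client_ip}/24", strict=False)
    hits[subnet] += 1
    return hits[subnet] > LIMIT
```

Anything along those lines catches a burst spread across one subnet while leaving a single well-behaved client untouched.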
But that leaves a lot of traffic with an interesting characteristic: there are millions of obvious bot hits (following a pattern through the site, for example), each coming from a different IP. An access log with 9M lines has over 1M IP addresses, and few of them appear more than about three times.
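That distribution is easy enough to measure; here is a rough sketch, assuming a combined-format access log with the client address in the first field (the file name is hypothetical):

```python
from collections import Counter

counts = Counter()
with open("access.log") as log:
    for line in log:
        ip = line.split(" ", 1)[0]   # client address is the first field
        counts[ip] += 1

print(f"{sum(counts.values())} requests from {len(counts)} distinct IPs")
hist = Counter(counts.values())      # how many IPs made N requests each
for n, ips in sorted(hist.items())[:10]:
    print(f"{ips} addresses made {n} request(s)")
```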
So these things are running on widely distributed botnets, likely on compromised computers, and they are doing their best to evade any sort of recognition or throttling. I don't think that any sort of throttling or database of known-bot IPs is going to help here... not quite sure what to do about it.