djsumdog (djsumdog@djsumdog.com)'s status on Saturday, 10-May-2025 16:00:16 JST
Why does Arch Wiki, a public open source documentation system, not want a crawler to index their site? People can scream AI all they want, but the admins are also destroying any new attempts to break into the search engine market. Do they think Google/Bing/Yandex don't already get past this, or do their servers return different results for the big search bots?
People used to be able to view the Arch wiki without Javascript. Now they can't. 😡
@shortstories@djsumdog Arguably not much more than they can spy on you anyway [especially if you have scripts enabled]. The main objection is that javascript is fucking shit and makes everything worse.
uBlock Origin makes it easy to disable JS universally and enable it selectively (see the example rules after this post), without the complexity of uMatrix. The speed difference is noticeable.
Also, the Gentoo wiki has no such Javascript "proof of work" for accessing it.
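Roughly, and assuming uBlock Origin's current "My rules" syntax, that setup is just a global no-scripting switch plus per-site exceptions; the wiki domain below is only an example:

```
no-scripting: * true
no-scripting: wiki.archlinux.org false
```

The first rule turns scripting off everywhere; the second re-enables it for a single site you trust.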
@shortstories@djsumdog Browse the web with scripts disabled, then browse it with scripts enabled. Observe, with your own eyes, the performance difference.
If I had to guess, it is not javascript itself that is the problem, but javascript libraries plus trying to keep things updated.
If you write code to do certain basic things that do not need constant updating and do not access a library, then if it works it should continue to work.
Once someone puts library software in the code, working code can malfunction when the library changes.
I have no idea what you're arguing. The problem I had was that I couldn't see a static website because it requires Javascript, NOT for functionality, not even for DDoS protection, but to solve a proof of work because they don't want to be scraped by "AI" ... an open source documentation site not wanting to be scraped. Let that sink in.
@Zergling_man@djsumdog If someone does not know how to program something, or is too lazy to do it themself, they will use someone else's library.
These libraries can change at any time.
So what might have worked might stop working when the library is changed.
These libraries allow people who do not know what they are doing to look competent and slip in bad code that will malfunction later, after they get paid to do their job.
I would suggest that these libraries are an additional serious problem.
I would suggest there are two reasons for the difference:
1. Having more code to run slows everything down, in exchange for whatever additional feature is provided by running the code.
2. The problem is not primarily from Javascript itself, other than the mistake of whoever put in the library feature. I would suggest the problem is with people who are bad at computer programming writing the code in Javascript using libraries, because they do not know how to program.
@shortstories@djsumdog It's just 1. It's all 1. If it were actually an "additional feature" it would make sense. 99% of the time it is not. Like loading a form; as if there isn't a standard way to do that already.
@tyil@djsumdog Please do not immorally attack people with proprietary software, tyil - you know better.
One way to solve that issue is to set bait with gzip bombs: https://idiallo.com/blog/zipbomb-protection ("Content-Encoding: deflate, gzip" is incorrect; it should be "Content-Encoding: gzip") - many bots will fetch such bombs and crash.
Most scraper bots seem to use Apple user agents, and just returning 403 for .*AppleWebKit.* fixes that issue for cgit (or, if you still want to allow isheep access to your website, maybe attacking apple users with more proprietary malware is what they deserve). A rough sketch of both ideas follows below.
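As a sketch only (the user-agent pattern, payload size, and port are illustrative, not taken from the linked article or from any real cgit setup), both ideas combined could look something like this:

```python
# Sketch: serve a pre-built gzip "bomb" to suspected scraper bots, plain text to everyone else.
import gzip
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# 10 MiB of zeros compresses to a few KiB on the wire; real deployments use far larger payloads.
BOMB = gzip.compress(b"\0" * (10 * 1024 * 1024), compresslevel=9)

# Illustrative pattern only; tune it to whatever actually hammers your logs.
BOT_UA = re.compile(r"AppleWebKit", re.I)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if BOT_UA.search(ua):
            self.send_response(200)
            # Must be exactly "gzip" (not "deflate, gzip"), or clients won't
            # decompress the payload the way the bait relies on.
            self.send_header("Content-Encoding", "gzip")
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(BOMB)))
            self.end_headers()
            self.wfile.write(BOMB)
        else:
            body = b"hello\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```

In practice the user-agent match and the 403/bomb response would live in the reverse proxy in front of cgit rather than in a standalone script, but the headers are the part that matters.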
@djsumdog@djsumdog.com Assuming they suffer the same issues as my cgit instance, the choice is to either be down completely because LLM scrapers constantly overload the instance, or force JS so at least some people can use the site.
I don't enjoy using Anubis; I think it is stupid to waste CPU cycles like this. I do use Anubis on services that would otherwise go down all day, because I currently have no better solution to fight back against LLMs. I don't have the money for infinite resources, and I don't have the time to constantly log potential LLM bots and block them.
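For what it's worth, the CPU cycles in question go into a hash-search challenge. A rough sketch of the general scheme (not Anubis's actual protocol; the difficulty and nonce format here are made up) looks like this:

```python
# Sketch of a SHA-256 proof-of-work challenge: the client searches for a nonce,
# the server only has to check one hash.
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> int:
    """Find a nonce so sha256(challenge + nonce) starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

if __name__ == "__main__":
    nonce = solve("example-challenge", 4)                 # the client burns CPU here
    print(nonce, verify("example-challenge", nonce, 4))   # the server check is cheap
```

The asymmetry is the whole point: a single visitor barely notices the delay, but a scraper requesting millions of pages has to pay it millions of times.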