GNU social JP
GNU social JP is a GNU social server in Japan.

Conversation

Notices

  1. djsumdog (djsumdog@djsumdog.com)'s status on Saturday, 10-May-2025 16:00:16 JST
    Why does Arch Wiki, a public open source documentation system, not want a crawler to index their site? People can scream AI all they want, but the admins are also destroying any new attempts to break into the search engine market. Do they think Google/Bing/Yandex don't already get past this, or do their servers return different results for the big search bots?

    People used to be able to view the Arch Wiki without JavaScript. Now they can't. 😡

    Attachments


    1. https://djsumdog.com/media/fa/01/68/fa01688777e729835b3ee5f82c982584d92496652479c37dfe290b97c2d07ce7.png
    • pistolero (p@fsebugoutzone.org)'s status on Saturday, 10-May-2025 16:04:44 JST
      in reply to
      @djsumdog It's probably due to bot-generated edits rather than spiders.
    • Zergling_man (zergling_man@sacred.harpy.faith)'s status on Saturday, 10-May-2025 16:05:16 JST
      in reply to shortstories
      @shortstories @djsumdog Arguably not much more than they can spy on you anyway [especially if you have scripts enabled]
      The main objection is that javascript is fucking shit and makes everything worse.
    • shortstories@merovingian.club's status on Saturday, 10-May-2025 16:05:17 JST
      in reply to

      @djsumdog

      That makes it hard to archive, and maybe impossible to archive with web.archive.org, also known as the Wayback Machine.

      Additionally, by clicking "I am human", they can spy on you.

    • djsumdog (djsumdog@djsumdog.com)'s status on Saturday, 10-May-2025 16:16:43 JST
      in reply to Zergling_man, shortstories
      uBlock Origin makes it easy to disable JS universally and enable it selectively, without the complexity of uMatrix. The speed difference is noticeable.

      Also, the Gentoo wiki has no such Javascript "proof of work" for accessing it.

      Attachments


      1. https://djsumdog.com/media/24/f7/7d/24f77d2ed1d3d08210df7cb007caf74af1a1d10a613760911d27d30151dfae86.png
    • Zergling_man (zergling_man@sacred.harpy.faith)'s status on Saturday, 10-May-2025 16:16:45 JST
      in reply to shortstories
      @shortstories @djsumdog Browse the web with scripts disabled, then browse it with scripts enabled.
      Observe, with your own eyes, the performance difference.
    • shortstories@merovingian.club's status on Saturday, 10-May-2025 16:16:46 JST
      in reply to Zergling_man

      @Zergling_man @djsumdog

      If I had to guess, it is not JavaScript itself that is the problem, but JavaScript libraries plus the effort of keeping them updated.

      If you write code that does certain basic things, does not need constant updating, and does not rely on a library, then if it works it should keep working.

      Once someone puts library code into it, working code can malfunction when the library changes.

    • djsumdog (djsumdog@djsumdog.com)'s status on Saturday, 10-May-2025 16:32:42 JST
      in reply to Zergling_man, shortstories
      I have no idea what you're arguing. The problem I had was that I couldn't see a static website because it requires JavaScript, NOT for functionality, not even for DDoS protection, but to solve a proof of work, because they don't want to be scraped by "AI"... an open source documentation site not wanting to be scraped. Let that sink in.
    • shortstories@merovingian.club's status on Saturday, 10-May-2025 16:32:43 JST
      in reply to Zergling_man

      @Zergling_man @djsumdog
      If someone does not know how to program something, or is too lazy to do it themselves, they will use someone else's library.

      These libraries can change at any time.

      So what once worked might stop working when the library is changed.

      These libraries let people who do not know what they are doing look competent and slip in bad code that will malfunction later, after they get paid to do their job.

      I would suggest that these libraries are an additional, serious problem.

    • shortstories@merovingian.club's status on Saturday, 10-May-2025 16:32:44 JST
      in reply to Zergling_man

      @Zergling_man @djsumdog

      I would suggest there are two reasons for the difference:

      1. Having more code to run slows everything down, in exchange for whatever additional feature that code provides.

      2. The problem is not primarily JavaScript itself, other than the mistake of whoever added the library feature. I would suggest the problem is people who are bad at programming writing code in JavaScript with libraries, because they do not know how to program.

    • Zergling_man (zergling_man@sacred.harpy.faith)'s status on Saturday, 10-May-2025 16:32:44 JST
      in reply to shortstories
      @shortstories @djsumdog It's just 1. It's all 1.
      If it were actually an "additional feature" it would make sense.
      99% of the time it is not. Like loading a form; as if there isn't a standard way to do that already.
    • 翠星石 (suiseiseki@freesoftwareextremist.com)'s status on Saturday, 10-May-2025 21:34:15 JST
      in reply to tyil
      @tyil @djsumdog Please do not immorally attack people with proprietary software tyil - you know better.

      One way to solve that issue is to set bait with gzip bombs: https://idiallo.com/blog/zipbomb-protection ("Content-Encoding: deflate, gzip" is incorrect; it should be "Content-Encoding: gzip"). Many bots will fetch such bombs and crash.

      Most scraper bots seem to use Apple user agents, and simply returning 403 for .*AppleWebKit.* fixes that issue for cgit (or, if you still want to allow isheep access to your website, maybe attacking Apple users with more proprietary malware is what they deserve).

      Attachments

      1. I use Zip Bombs to Protect my Server
        from @dialloibu
        The majority of the traffic on the web is from bots. For the most part, these bots are used to discover new content. These are RSS Feed readers, search engines crawling your content, or nowadays AI bo
    • tyil (tyil@fedi.tyil.nl)'s status on Saturday, 10-May-2025 21:34:18 JST
      in reply to

      @djsumdog@djsumdog.com Depending on whether they suffer the same issues as my cgit instance, the choice is between being down completely because LLM scrapers constantly overload the instance, or forcing JS so that at least some people can use the site.

      I don't enjoy using Anubis; I think it is stupid to waste CPU cycles like this. I do use Anubis on services that go down all day, because I currently have no better solution to fight back against LLMs. I don't have the money for infinite resources, and I don't have the time to constantly log potential LLM bots and block them.

    • 翠星石 (suiseiseki@freesoftwareextremist.com)'s status on Saturday, 10-May-2025 21:36:00 JST
      in reply to
      @djsumdog Yes, the typical "open source" project doesn't hesitate to attack its users with proprietary malware.
    • Reasonable Man (r000t@ligma.pro)'s status on Sunday, 11-May-2025 01:25:42 JST
      in reply to

      @djsumdog
      I'm trying to solve this problem by making public endpoints that hit an offline Wikipedia dump.

      If my service is better than live Wikipedia in terms of speed or having to parse sht, then everyone wins.

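Several posts above object to a page that makes the browser solve a JavaScript "proof of work" before showing content, and tyil mentions using Anubis for this. To show what such a challenge costs each side, here is a generic SHA-256 proof of work in Python. It is a sketch of the technique under stated assumptions, not Anubis's actual code; the challenge string, the difficulty value, and the `solve`/`verify` names are hypothetical.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side: brute-force the first nonce whose SHA-256 hex digest of
    challenge + nonce starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    raise AssertionError("unreachable: count() never ends")


def verify(challenge: str, difficulty: int, nonce: int) -> bool:
    """Server side: checking a submitted nonce costs a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # Hypothetical challenge; a real deployment would issue a fresh,
    # unpredictable challenge per visitor.
    challenge = "example-challenge"
    nonce, digest = solve(challenge, difficulty=4)
    print(nonce, digest, verify(challenge, 4, nonce))
```

Each extra leading zero digit multiplies the client's expected work by 16 (about 16^d hashes for difficulty d), while the server still verifies with one hash. That asymmetry is what is meant to deter bulk scrapers, and it is also the "wasted CPU cycles" tyil objects to.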
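翠星石's post suggests two countermeasures: serving gzip bombs as bait, with the single header "Content-Encoding: gzip", and returning 403 to user agents matching .*AppleWebKit.*. A minimal sketch of both using only Python's standard library follows. It is not the setup from the linked idiallo.com post; the `/bait` path, the `classify`/`serve` helper names, the port, and the 10 MiB payload are hypothetical. Be aware that Chrome, Edge and Safari all send AppleWebKit in their user-agent strings, so that regex turns away many ordinary browsers as well as scrapers.

```python
import gzip
import io
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Rule from the post: refuse any User-Agent containing "AppleWebKit".
BLOCKED_UA = re.compile(r".*AppleWebKit.*")
BAIT_PATH = "/bait"  # hypothetical path that only misbehaving crawlers request


def make_gzip_bomb(uncompressed_mib: int = 10) -> bytes:
    """Gzip a long run of zero bytes (zeros compress roughly 1000:1).
    Kept small here; a real bait file would be far larger."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        chunk = b"\0" * (1024 * 1024)
        for _ in range(uncompressed_mib):
            gz.write(chunk)
    return buf.getvalue()


BOMB = make_gzip_bomb()


def classify(path: str, user_agent: str) -> str:
    """Decide how to answer a GET: 'forbidden', 'bomb', or 'ok'."""
    if BLOCKED_UA.match(user_agent):
        return "forbidden"
    if path == BAIT_PATH:
        return "bomb"
    return "ok"


class BotTrapHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        verdict = classify(self.path, self.headers.get("User-Agent", ""))
        if verdict == "forbidden":
            self.send_error(403)
            return
        if verdict == "bomb":
            body, ctype = BOMB, "text/html"
        else:
            body, ctype = b"hello\n", "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        if verdict == "bomb":
            # One coding, "gzip"; not "deflate, gzip" as the post points out.
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the sketch locally (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), BotTrapHandler).serve_forever()
```

To try it, call `serve()` and fetch `http://127.0.0.1:8080/bait` with `curl --compressed -o /dev/null`, since curl only decompresses the body when `--compressed` is given. In practice this logic would more likely live in the web server or reverse proxy in front of cgit than in Python.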

GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.