GNU social JP
  • FAQ
  • Login
GNU social JPは日本のGNU socialサーバーです。
Usage/ToS/admin/test/Pleroma FE
  • Public

    • Public
    • Network
    • Groups
    • Featured
    • Popular
    • People

Embed Notice

HTML Code

Corresponding Notice

  1. Embed this notice
    Phantasm (phnt@fluffytail.org)'s status on Wednesday, 05-Mar-2025 01:11:10 JSTPhantasmPhantasm
    in reply to

    @p @ins0mniak

    Are they one of the ones that tries the "/ai.txt" or something or do they just fucking scrape?

    Nope, they ask for robots.txt and then immediately ignore it.

    18.119.253.53 - - [23/Feb/2025:02:08:20 +0000] "GET /robots.txt HTTP/2.0" 200 1833 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"

    I ended up just killing off their IPs, but because I also had to wipe the logs (media.fse ran out of space on /var) I can't check if they did.

    With Claude it's at least easy. Return 403 to the UA and you are done. Which btw still does not stop their attempts at scraping. They will continue to hit webserver even when they obviously aren't let through. From there a log monitor will do the job.

    With the Chink scrapers, it's a bit harder than automated log monitoring. They are clever in a way, where they will not send you more than approx. 3 requests from one IP, meaning that the typical monitoring tools like fail2ban or something custom won't work as all of the ones I know of don't do subnet/ASN detection, or it will be very trigger-happy.

    Thankfully they are retarded in other ways which make them stick out like a sore thumb in the logs. Currently I just look at the logs every few days unless they trigger alerts and throw the whole announced prefix into the trash. So far that has worked out great.

    In conversationabout 3 months ago from fluffytail.orgpermalink
  • Help
  • About
  • FAQ
  • TOS
  • Privacy
  • Source
  • Version
  • Contact

GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

Creative Commons Attribution 3.0 All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.