GNU social JP
GNU social JP is a GNU social server in Japan.

Conversation

Notices

  1. Paul Cantrell (inthehands@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:01:32 JST

    In honor of the dawn of the age of web crawling being for LLMs instead of search engines, I have deleted the robots.txt for https://wookieepedia.org/.

    Let the floodgates open. Crawl away, my friends, crawl away.

    In conversation Saturday, 27-Jan-2024 10:01:32 JST from hachyderm.io permalink

    • Paul Cantrell (inthehands@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:06:38 JST
      in reply to

      I feel like an important thing to mention to put the previous post in context is that the links work.

      ALL the links work.

    • Jeff Miller (orange hatband) (jmeowmeow@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:10:24 JST
      in reply to

      @inthehands wait a minute there... do I hear echoes of A Plan For Spam?

    • Paul Cantrell (inthehands@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:11:47 JST
      in reply to
      • Jeff Miller (orange hatband)

      @jmeowmeow To that, I can only say mvuoooo ruuoau raoaaauvuaua ruuauawoaaou rroaaaavoouoa voiahouoaoa noa, wuoaa mouuoaruaourv ruaouu ruiu

    • Paul Cantrell (inthehands@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:13:13 JST
      in reply to
      • Adam ♿

      @voltagex HRRAOOOOOWAOOO!

    • Adam ♿ (voltagex@aus.social)'s status on Saturday, 27-Jan-2024 10:13:14 JST
      in reply to

      @inthehands OUTSTANDING.

    • Jeff Miller (orange hatband) (jmeowmeow@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:13:15 JST
      in reply to

      @inthehands (loud laughter). Let the Wookiee win!

    • Adam ♿ (voltagex@aus.social)'s status on Saturday, 27-Jan-2024 10:14:03 JST
      in reply to

      @inthehands you might also want to match on /*

      https://wookieepedia.org/test3

    • Paul Cantrell (inthehands@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:14:03 JST
      in reply to
      • Adam ♿

      @voltagex No way, gotta keep the site organized

    • Paul Cantrell (inthehands@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:38:45 JST
      in reply to

      There’s not a way to explicitly submit one’s site to OpenAI for crawling, right?

      Like…they just do secretive mass crawls on their own schedule, I assume?

    • Cory Carson (corycarson@gnu.gl)'s status on Saturday, 27-Jan-2024 10:40:46 JST
      in reply to

      @inthehands it…. It still works….

      https://wookieepedia.org/w/Rrruauou%20rroaanaaouo

      Attachments


      1. https://assets.gnu.gl/media_attachments/files/111/825/355/376/355/738/original/1c1ea1cdd1e58399.png
      2. Rrruauou Rroaanaaouo - Wookieepedia, the hirsute encyclopedia
    • Paul Cantrell (inthehands@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:40:48 JST
      in reply to
      • Cory Carson

      @corycarson Amazing

    • Paul Cantrell (inthehands@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:48:43 JST
      in reply to
      • Cory Carson

      @corycarson It took some cajoling, but I got GPT to start speaking Wookiee:

      Attachments


      1. https://media.hachyderm.io/media_attachments/files/111/825/395/439/417/656/original/0a17adc171c97945.png
    • Cory Carson (corycarson@gnu.gl)'s status on Saturday, 27-Jan-2024 10:48:45 JST
      in reply to

      @inthehands MUST GO DEEPER

      Attachments


      1. https://assets.gnu.gl/media_attachments/files/111/825/368/063/049/122/original/45dd5553738ddaef.png
    • Paul Cantrell (inthehands@hachyderm.io)'s status on Saturday, 27-Jan-2024 10:50:12 JST
      in reply to
      • Cory Carson

      @corycarson The question is this: that is so shockingly similar to the style of “Wookiee” that Wookieepedia uses that it’s hard to believe Wookieepedia wasn’t the training source. But until a few minutes ago, the site’s robots.txt disallowed crawlers for everything except the homepage….

    • Paul Cantrell (inthehands@hachyderm.io)'s status on Monday, 29-Jan-2024 14:20:28 JST
      in reply to
      • talby

      @talby On the home page, IIRC, but yeah

    • talby (talby@techhub.social)'s status on Monday, 29-Jan-2024 14:20:29 JST
      in reply to

      @inthehands I think LLMs mostly use https://commoncrawl.org/ rather than crawling the web themselves. The Internet Archive's Wayback Machine uses Common Crawl as a source and has Wookieepedia so I think it's likely in there already.

      Attachments

      1. Common Crawl - Open Repository of Web Crawl Data
        We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.
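Paul's aside that the old robots.txt disallowed crawlers for everything except the homepage, and that deleting it opened up the whole site, can be illustrated with a small Python sketch. Everything in it is an assumption: the rules are a hypothetical reconstruction (the actual file isn't shown in the thread), and `parse_rules` and `is_allowed` are illustrative helpers, not a library API. The matching loosely follows RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end of the path, the most specific (longest) matching rule wins, and Allow wins a tie. User-agent grouping is ignored for brevity.

```python
import re
from urllib.parse import urlsplit

# Hypothetical reconstruction of an "everything except the homepage" policy.
OLD_ROBOTS_TXT = """\
User-agent: *
Allow: /$
Disallow: /
"""


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the path start."""
    anchored = pattern.endswith("$")  # trailing '$' means "end of path"
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def parse_rules(robots_txt: str) -> list[tuple[bool, str]]:
    """Collect (is_allow, pattern) pairs; User-agent groups are ignored for brevity."""
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (s.strip() for s in line.split(":", 1))
        key = key.lower()
        # Empty values (e.g. a bare "Disallow:") restrict nothing, so they are skipped.
        if key in ("allow", "disallow") and value:
            rules.append((key == "allow", value))
    return rules


def is_allowed(rules: list[tuple[bool, str]], url: str) -> bool:
    """Return True if the most specific matching rule allows the URL's path."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    best = None  # (pattern length, is_allow) of the most specific match so far
    for is_allow, pattern in rules:
        if _pattern_to_regex(pattern).match(path):
            # Pattern length approximates "most octets"; True > False, so Allow wins ties.
            candidate = (len(pattern), is_allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]  # no matching rule: allowed


if __name__ == "__main__":
    old = parse_rules(OLD_ROBOTS_TXT)
    print(is_allowed(old, "https://wookieepedia.org/"))  # True: homepage allowed
    print(is_allowed(old, "https://wookieepedia.org/w/Rrruauou%20rroaanaaouo"))  # False
    print(is_allowed(parse_rules(""), "https://wookieepedia.org/test3"))  # True: no rules
```

With an empty rule set, standing in for a deleted robots.txt, `is_allowed` returns True for every path. That mirrors RFC 9309's allowance that crawlers may access anything when robots.txt is unavailable (for example, when it returns a 404).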


GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.