GNU social JP
GNU social JP is a Japanese GNU social server.
Conversation

Notices

  1. myrmepropagandist (futurebird@sauropods.win)'s status on Tuesday, 31-Dec-2024 12:19:12 JST

    I haven't thought "I should try to build my *own* web spider, then maybe I could find things." since... Well, since 1998.

    :/

    In conversation about 5 months ago from sauropods.win

    Attachments


    1. https://cdn.masto.host/sauropodswin/media_attachments/files/113/745/075/335/212/653/original/727d95338f65eaee.png
    • Rich Felker repeated this.
    • Jess👾 (jesstheunstill@infosec.exchange)'s status on Tuesday, 31-Dec-2024 12:19:09 JST
      in reply to
      • Thomas Sturm

      Not a bad idea! My (vaguely) related idea is to fork a Fediverse app or make a browser plugin that caches and indexes only the Fediverse posts I've browsed, whether on my timeline, on the explore page, or anywhere else. Then I could search the content I've had access to, and I wouldn't feel like I was violating anyone's privacy by caching and indexing, exclusively for my own personal use, content I've already been allowed to view.

      Obviously, there'd be other problems if I started crawling and indexing content for public use, but I think using a computer to augment my own fallible memory would be acceptable, so that I can find the posts I wanted to remember two weeks later.

      @futurebird @tsturm

      In conversation about 5 months ago
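As a rough sketch of the plugin idea above, assuming a local SQLite store (all names and the schema here are illustrative, not taken from any real Fediverse app):

```python
import sqlite3

# Hypothetical sketch: a local, personal-use index of only the Fediverse
# posts the user has already been shown. Schema and names are illustrative.

def open_index(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts (post_url TEXT, author TEXT, content TEXT)"
    )
    return db

def remember(db, post_url, author, content):
    # Called whenever a post crosses the viewer's timeline; stores only
    # content the user was already allowed to view.
    db.execute("INSERT INTO posts VALUES (?, ?, ?)", (post_url, author, content))

def search(db, term):
    # Simple substring search; a real plugin might use SQLite's FTS5 instead.
    return db.execute(
        "SELECT post_url, author FROM posts WHERE content LIKE ?",
        (f"%{term}%",),
    ).fetchall()
```

Keeping the store strictly local is what makes this different from a public crawler: nothing is indexed that didn't already reach the user's own screen.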
    • myrmepropagandist (futurebird@sauropods.win)'s status on Tuesday, 31-Dec-2024 12:19:10 JST
      in reply to
      • Jess👾
      • Thomas Sturm

      @JessTheUnstill @tsturm

      Well I'm thinking of doing something a little smaller and more targeted like this:

      https://sauropods.win/@futurebird/113744151630008623

      Because making a proper full web spider is a massive project. And even my small idea could be too big.

      In conversation about 5 months ago
      Rich Felker repeated this.
    • Thomas Sturm (tsturm@famichiki.jp)'s status on Tuesday, 31-Dec-2024 12:19:11 JST
      in reply to

      @futurebird I've recently been thinking about what it would take to run my own spider... for the first time in about 25 years. The search results I'm getting lately are so bad that a DIY spider might actually improve the situation for me.

      In conversation about 5 months ago
    • Jess👾 (jesstheunstill@infosec.exchange)'s status on Tuesday, 31-Dec-2024 12:19:11 JST
      in reply to
      • Thomas Sturm

      The problem is only partly that Google has gotten so much worse. It's also that SEO, bot spam, LLM spam, and affiliate-link spam have gotten so good that it's functionally impossible to algorithmically filter them out of the results. So just running your own spider is unlikely to matter much.

      @tsturm @futurebird

      In conversation about 5 months ago
    • Rich Felker (dalias@hachyderm.io)'s status on Tuesday, 31-Dec-2024 12:21:32 JST
      in reply to
      • Jess👾
      • Thomas Sturm

      @JessTheUnstill @futurebird @tsturm This is what I wish instances would do, optionally. It could also warn you, if a post you view this way has been deleted, that further publication, even off-platform (except as evidence of abuse), may result in moderation action against your account.

      In conversation about 5 months ago
    • Dawn Ahukanna (dahukanna@mastodon.social)'s status on Tuesday, 31-Dec-2024 20:46:17 JST
      in reply to

      @futurebird

      To remove & externalise bookmark dependency from browsers, I’ve resorted to manually collecting & curating links as I find them, with personal notes+tags reminding me why they are of interest. They’re always 100% searchable & findable.

      Given the inconsiderate, effectively DDoS-like behavior of AI scraper bots, adding to that melee with more robo-indexing may not produce a usable search index - https://mastodon.social/@dahukanna/113741237599333856

      In conversation about 5 months ago

      Attachments

      1. Dawn Ahukanna (@dahukanna@mastodon.social)
        from Dawn Ahukanna
        @recursive@hachyderm.io how do they, the producers and indoctrinators of “Artificial Intelligence Large Language Models (AI-LLM)”, not “compute” that: content production rate ≠ content request rate? I’d like to have and host a website without it or the environment being “knackered” from constant demands.
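One minimal way to keep a manually curated link collection like the one described above (links plus personal notes and tags, fully searchable) is a plain JSON-lines file. This sketch is illustrative and assumes nothing about the actual setup:

```python
import json
from dataclasses import dataclass, asdict

# Illustrative sketch: bookmarks curated by hand, each with a note
# explaining why it was saved and a set of tags, stored one per line
# so the file stays greppable with ordinary tools.

@dataclass
class Bookmark:
    url: str
    note: str
    tags: list

def save(path, bm):
    # Append one bookmark per line.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(bm)) + "\n")

def find(path, term):
    # Case-insensitive substring match against notes and tags.
    term = term.lower()
    hits = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            bm = json.loads(line)
            if term in bm["note"].lower() or any(term in t.lower() for t in bm["tags"]):
                hits.append(bm["url"])
    return hits
```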
    • myrmepropagandist (futurebird@sauropods.win)'s status on Tuesday, 31-Dec-2024 20:47:33 JST
      in reply to
      • Dawn Ahukanna

      @dahukanna

      Importantly, this database would grow over time; it wouldn't be focused on "what's new" ... basically I have a high level of trust in the way people #onhere associate hashtags with links, and I think that'd be a great way to find things.

      In fact I do it manually often enough, but it's time consuming. I just want all of the links sometimes.

      In conversation about 5 months ago

    • myrmepropagandist (futurebird@sauropods.win)'s status on Tuesday, 31-Dec-2024 20:47:34 JST
      in reply to
      • Dawn Ahukanna

      @dahukanna

      I think so, yes. Basically I want a database of every single link that's been posted to *my* feed. It would also contain any hashtags used with the link, and the post ID, so I can go back and see the context.

      Next I'd strip out all of the "big sites" and focus more on the obscure.

      Then if I'm curious about, say, #fossils I would get links mentioned in that context.

      And if #fossils is used with the tag #crinoids often, I could move laterally and find more links.

      In conversation about 5 months ago
      Rich Felker repeated this.
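The link database sketched in the post above could look roughly like this, assuming SQLite; the schema, function names, and big-site filter list are all hypothetical:

```python
import sqlite3
from urllib.parse import urlparse

# Illustrative "big sites" to strip out, per the idea of focusing on the obscure.
BIG_SITES = {"youtube.com", "twitter.com", "amazon.com"}

def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS links(url TEXT PRIMARY KEY, post_id TEXT);
        CREATE TABLE IF NOT EXISTS link_tags(url TEXT, tag TEXT);
    """)
    return db

def record(db, post_id, url, tags):
    # Store every link seen on the feed, with its hashtags and the post ID
    # so the original context can be revisited. Big sites are filtered out.
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if host in BIG_SITES:
        return
    db.execute("INSERT OR REPLACE INTO links VALUES (?, ?)", (url, post_id))
    db.executemany("INSERT INTO link_tags VALUES (?, ?)",
                   [(url, t.lower()) for t in tags])

def links_for(db, tag):
    # Links mentioned in the context of one hashtag, e.g. #fossils.
    return [r[0] for r in db.execute(
        "SELECT DISTINCT url FROM link_tags WHERE tag = ?", (tag.lower(),))]

def cooccurring_tags(db, tag):
    # Tags that appear on the same links, for moving laterally
    # (e.g. from #fossils to #crinoids).
    return [r[0] for r in db.execute("""
        SELECT b.tag FROM link_tags a
        JOIN link_tags b ON a.url = b.url AND b.tag != a.tag
        WHERE a.tag = ? GROUP BY b.tag ORDER BY COUNT(*) DESC""",
        (tag.lower(),))]
```

Because the database only ever ingests one user's own feed, it grows slowly over time and never needs the crawling machinery of a full spider.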
    • Dawn Ahukanna (dahukanna@mastodon.social)'s status on Tuesday, 31-Dec-2024 20:47:35 JST
      in reply to

      @futurebird

      … extract links from within the post and links to the source post?

      In conversation about 5 months ago
    • myrmepropagandist (futurebird@sauropods.win)'s status on Tuesday, 31-Dec-2024 20:47:36 JST
      in reply to
      • Dawn Ahukanna

      @dahukanna

      I'm thinking of something much more modest:

      https://sauropods.win/@futurebird/113744151630008623

      In conversation about 5 months ago

GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.