I haven't thought "I should try to build my *own* web spider, then maybe I could find things" since... well, since 1998.
:/
Not a bad idea! My (vaguely) related idea is to fork a Fediverse app / make a browser plugin that caches and indexes only the Fediverse posts I've browsed - whether on my timeline, on the explore page, or wherever. Then I could search the content I've had access to, and I don't feel like I'd be violating anyone's privacy by caching and indexing, exclusively for my own personal use, content I've already been allowed to view.
Obviously, it'd raise other problems if I started crawling and indexing content for public use, but I think using a computer to augment my own fallible memory is acceptable, so I can find the posts I wanted to remember two weeks later.
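For concreteness, here's a minimal sketch of what the personal-index half of that could look like, assuming the capture side (a plugin forwarding viewed posts somewhere local) already exists. The table layout and function names are illustrative assumptions, not any existing Fediverse API; it just leans on SQLite's FTS5 full-text search, which ships with most SQLite builds.

```python
# A sketch of the "index only what I've seen" idea. All names here are
# hypothetical; nothing is crawled -- posts are stored only when shown to me.
import sqlite3

def open_index(path="seen_posts.db"):
    """Open (or create) a local full-text index of viewed posts."""
    db = sqlite3.connect(path)
    # FTS5 is included in most SQLite builds; if yours lacks it,
    # a plain table with LIKE queries would work, just more slowly.
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS posts USING fts5("
        "post_id, author, content, tags)"
    )
    return db

def remember(db, post_id, author, content, tags):
    """Store a post I've actually been shown."""
    db.execute("INSERT INTO posts VALUES (?, ?, ?, ?)",
               (post_id, author, content, " ".join(tags)))
    db.commit()

def recall(db, query):
    """Full-text search over only the posts I've already seen."""
    return db.execute(
        "SELECT post_id, author FROM posts WHERE posts MATCH ?", (query,)
    ).fetchall()
```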
Well, I'm thinking of doing something a little smaller and more targeted, like this:
https://sauropods.win/@futurebird/113744151630008623
Because making a proper full web spider is a massive project. And even my small idea could be too big.
@futurebird I've recently been thinking about what it would take to run my own spider... for the first time in about 25 years. The search results I'm getting lately are so bad that a DIY spider might actually improve the situation for me.
The problem is only partly that Google has gotten so much worse. It's also that SEO, botspam, LLM spam, and affiliate-link spam have gotten so good that it's functionally impossible to algorithmically filter them out of the results. So just running your own spider is unlikely to matter much.
@JessTheUnstill @futurebird @tsturm This is what I wish instances would do, optionally - also warning you, if a post you view this way was deleted, that further publication (even off-platform, except as evidence of abuse) may result in moderation action against your account.
To remove & externalise bookmark dependency from browsers, I’ve resorted to manually collecting & curating links as I find them, with personal notes+tags reminding me why they’re of interest. They’re always 100% searchable & findable.
Given the inconsiderate, effectively-DDoS behavior of AI scraper bots, adding to that melee with more robo-indexing may not produce a usable search index - https://mastodon.social/@dahukanna/113741237599333856
Importantly, this database would grow over time; it wouldn't be focused on "what's new" ... Basically, I have a high level of trust in the way people #onhere associate hashtags with links, and I think that'd be a great way to find things.
In fact I do it manually often enough, but it's time-consuming. I just want all of the links sometimes.
I think so, yes. Basically I want a database of every single link that's been posted to *my* feed. It would also contain any hashtags used with the link, plus the post ID so I can go back and see the context.
Next I'd strip out all of the "big sites" and focus more on the obscure.
Then if I'm curious about, say, #fossils, I would get links mentioned in that context.
And if #fossils is used with the tag #crinoids often, I could move laterally and find more links.
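As a concrete sketch of that database, here's one way the tag-and-link store, the "strip out the big sites" filter, and the lateral #fossils-to-#crinoids hop could fit together. The schema, the function names, and the BIG_SITES list are all my own assumptions for illustration, not any existing Fediverse API.

```python
# A sketch of the per-feed link database described above.
import sqlite3
from urllib.parse import urlparse

# Hypothetical blocklist for "strip out all of the big sites".
BIG_SITES = {"youtube.com", "twitter.com", "amazon.com"}

def is_big_site(url):
    host = urlparse(url).hostname or ""
    return any(host == s or host.endswith("." + s) for s in BIG_SITES)

def open_db(path="feed_links.db"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS links ("
               "url TEXT, tag TEXT, post_id TEXT)")
    return db

def add_post(db, post_id, urls, tags):
    """Record every (url, tag) pair from one post in my feed,
    skipping links to the big sites."""
    keep = [u for u in urls if not is_big_site(u)]
    db.executemany("INSERT INTO links VALUES (?, ?, ?)",
                   [(u, t, post_id) for u in keep for t in tags])
    db.commit()

def links_for(db, tag):
    """All links ever posted to my feed alongside a given hashtag,
    with the post ID so I can go back and see the context."""
    return db.execute("SELECT DISTINCT url, post_id FROM links "
                      "WHERE tag = ?", (tag,)).fetchall()

def related_tags(db, tag):
    """Tags that co-occur with `tag` in the same posts, so I can
    move laterally, e.g. from #fossils to #crinoids."""
    return db.execute(
        "SELECT l2.tag, COUNT(*) AS n FROM links l1 "
        "JOIN links l2 ON l1.post_id = l2.post_id "
        "WHERE l1.tag = ? AND l2.tag != ? "
        "GROUP BY l2.tag ORDER BY n DESC", (tag, tag)
    ).fetchall()
```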
… extract links from within the post and links to the source post?
I'm thinking of something much more modest: