A quick question to my fellow #PHP developers here. Does anybody have a suggestion for a spider / crawler library in PHP?
Conversation
BeAware :fediverse: (beaware@social.beaware.live), Tuesday, 11-Jun-2024 04:30:58 JST:
@SuitedUpDev As long as you don't crawl Fedi domains. People here don't like that kinda thing. 🤷♂️
Sander van Kasteel (suitedupdev@mastodon.online), Tuesday, 11-Jun-2024 04:30:59 JST:
I wanna crawl some domains under a certain TLD and keep track of how many "outgoing" links are being referenced on those domains.
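For the "count outgoing links per domain" part, a minimal sketch in plain PHP (ext-curl plus DOMDocument, PHP 8+) could look like the following; the domain list and user-agent string are placeholder assumptions, not details from this thread.

<?php
// Fetch each start domain's homepage and count <a href> links whose host
// points outside that domain ("outgoing" links).

$domains = ['example.kp', 'example.com']; // placeholder list, not the real one

function fetchHtml(string $url): ?string {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_USERAGENT      => 'research-crawler/0.1', // placeholder UA
    ]);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? null : $html;
}

$outgoing = [];

foreach ($domains as $domain) {
    $html = fetchHtml("http://{$domain}/");
    if ($html === null) {
        continue; // host unreachable, skip it
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $count = 0;
    foreach ($doc->getElementsByTagName('a') as $a) {
        $host = parse_url($a->getAttribute('href'), PHP_URL_HOST);
        // Count only absolute links whose host lies outside the crawled domain.
        if (is_string($host) && !str_ends_with($host, $domain)) {
            $count++;
        }
    }
    $outgoing[$domain] = $count;
}

print_r($outgoing);

A full crawl would also need to queue and follow internal links, resolve relative URLs against each page's base, and respect robots.txt, which is where a dedicated crawler library helps.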
BeAware :fediverse: (beaware@social.beaware.live), Tuesday, 11-Jun-2024 04:49:49 JST:
@SuitedUpDev 🤷 Just figured I'd warn you so you don't get Fedi-cancelled.
However, AFAIK, North Korea doesn't allow many, if any, sites to reach outside the country. Good luck!
Sander van Kasteel (suitedupdev@mastodon.online), Tuesday, 11-Jun-2024 04:49:50 JST:
@BeAware I'm actually planning to crawl North Korean websites.
Sander van Kasteel (suitedupdev@mastodon.online), Tuesday, 11-Jun-2024 04:55:00 JST:
@BeAware I have a short list of domains that are reachable from the "regular" internet, and I want to do some research on the data I can gather from their websites.
BeAware :fediverse: (beaware@social.beaware.live), Tuesday, 11-Jun-2024 04:55:58 JST:
@SuitedUpDev I hope you get some help! I have very talented folks in my circle, so hopefully someone will see this and reach out to give you some guidance.
Sander van Kasteel (suitedupdev@mastodon.online), Tuesday, 11-Jun-2024 05:17:56 JST:
@BeAware Thanks a lot 🙏 I already have some experience with web scraping (from work), but not at this deep a level.