@0 11/10 bait, made me reply.
As always, the proprietary software lover is incorrect.
There are many free software web scrapers and search engines that can act as replacements for proprietary spiders or search engines.
GNU wget (https://www.gnu.org/software/wget/) can be used as a spider by using its recursive download options and parsing the output.
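The same recursive-crawl-plus-parse idea can be sketched in a few lines of free software. This is a minimal offline sketch: the `PAGES` dict is a made-up stand-in for real fetching (a real spider would download each URL with urllib.request, or shell out to wget and parse its output).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects href targets from <a> tags, resolved against a base URL."""
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base, value))

# Hypothetical site stub so the sketch runs offline; replace the dict
# lookup with a real HTTP fetch to get an actual spider.
PAGES = {
    "https://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.org/a": '<a href="/b">B again</a>',
    "https://example.org/b": "no links here",
}

def crawl(start):
    """Breadth-first crawl, visiting each URL at most once."""
    seen, queue = set(), [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        parser = LinkParser(url)
        parser.feed(PAGES[url])
        queue.extend(parser.links)
    return seen

print(sorted(crawl("https://example.org/")))
```

A well-behaved version would also check robots.txt (urllib.robotparser) and rate-limit itself.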
There are also many other web scrapers; with minimal searching I've found Scrapy (`git clone https://github.com/scrapy/scrapy`), Beautiful Soup (https://www.crummy.com/software/BeautifulSoup/, source at https://code.launchpad.net/beautifulsoup) and many more.
When it comes to search engines, there are many free libraries that can serve as one - although generally one builds up a database of crawled data in a free SQL server (SQLite, MySQL, PostgreSQL and many more), writes some custom SQL that searches that data, and then exposes that through a search interface.
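A minimal sketch of that build-a-database-then-search-it approach, using the SQLite library bundled with Python; the table layout and sample rows are illustrative, not from the post:

```python
import sqlite3

# In-memory database standing in for the store of crawled data;
# a real deployment would use a file or a full SQL server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (url TEXT, body TEXT)")
db.executemany("INSERT INTO pages VALUES (?, ?)", [
    ("https://www.gnu.org/software/wget/", "GNU wget recursive downloader"),
    ("https://www.gnu.org.ua/software/gdbm/", "GNU dbm key/data database"),
])

def search(term):
    """Custom SQL that searches the stored data."""
    cur = db.execute(
        "SELECT url FROM pages WHERE body LIKE ? ORDER BY url",
        (f"%{term}%",))
    return [row[0] for row in cur]

print(search("recursive"))  # → ['https://www.gnu.org/software/wget/']
```

For anything beyond a toy, SQLite's FTS5 full-text module (or PostgreSQL's tsvector) does the ranking that a plain LIKE can't.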
If one doesn't want to bother with a SQL server and only wants a small database of key/data pairs, GNU dbm can be used: https://www.gnu.org.ua/software/gdbm/
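Python's standard dbm module wraps GNU dbm (dbm.gnu) when the library is available, falling back to another dbm implementation otherwise, so a key/data store takes only a few lines. The keys and paths here are made up for illustration:

```python
import dbm
import os
import tempfile

# dbm.open picks GNU dbm when it is available on the system.
path = os.path.join(tempfile.mkdtemp(), "index")

with dbm.open(path, "c") as store:   # "c": create the database if missing
    store[b"wget"] = b"https://www.gnu.org/software/wget/"
    store[b"gdbm"] = b"https://www.gnu.org.ua/software/gdbm/"

with dbm.open(path, "r") as store:   # reopen read-only and look a key up
    print(store[b"wget"].decode())
```

Keys and values are raw bytes; anything structured has to be serialized first (the stdlib `shelve` module layers pickle on top of dbm for exactly that).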
As always, there's some GNU software to do what you need.
The FSF's articles are indeed searchable with a free search engine: https://www.fsf.org/@@search
There are also many free metasearch engines like searxng (`git clone https://github.com/searxng/searxng` - live instance https://searx.bndkt.io/) and 4get (https://git.lolcat.ca/lolcat/4get - live instance https://4get.ca/), which can be used for web searching without running any JavaScript at all; sometimes additional features are available via free JavaScript.
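The core of a metasearch engine is forwarding one query to several backends and merging the ranked answers. A minimal sketch of the merge step - the result lists are made-up stand-ins for what searxng or 4get backends would return:

```python
def merge(*result_lists):
    """Interleave ranked result lists, deduplicating by URL."""
    seen, merged = set(), []
    for rank in range(max(map(len, result_lists))):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

# Hypothetical per-engine rankings; a real metasearch engine would get
# these by querying each engine over HTTP.
engine_a = ["https://a.example/1", "https://b.example/2"]
engine_b = ["https://b.example/2", "https://c.example/3"]

print(merge(engine_a, engine_b))
```

Round-robin interleaving is the simplest policy; real engines also weight backends and score by how many of them agree on a URL.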
The main reason there isn't a big free and gratis general search engine is the expense of running a web spider 24/7 across the current size of the internet, plus the bandwidth required to serve searches.
The spider needs lots of different IPv4 and IPv6 addresses, since it's popular to block any spider that isn't Google's, even when the spider is well behaved and respects robots.txt. A /48 of IPv6 is cheap (ISPs don't tend to scam too much for something that was assigned to them gratis), but anything larger than a /31 of IPv4 is very expensive.
As for serving bandwidth, general use is fine for even many thousands of users, but for any good search engine there are always mouthbreathers who set up bots that scrape it to hell instead of scraping the damn targets directly (out of incompetence, and because bots in a botnet don't tend to get detected easily if all they do is connect to one IP making small HTTP POST requests).