Andy Baio

Judging from all the “that’s Yahoo” replies, it seems like many people have forgotten the difference between a directory and a search engine?

A directory is an organized list of websites (e.g. Yahoo, DMOZ, ooh.directory), while a search engine usually searches the text of web pages (e.g. Google).

I want a *search engine* that crawls and searches pages from a curated list of websites (which could be a directory). Marginalia is the closest I’ve seen. search.marginalia.nu/
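A minimal sketch of that idea, assuming the page text has already been fetched: only pages whose host is on a curated list are added to a word index, and a query returns the pages containing every query word. The host names, page text, and the function names `build_index` and `search` are hypothetical illustrations, not how Marginalia or any other engine named here works.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical curated list of sites; in practice this could come from a directory.
CURATED_HOSTS = {"example.org", "blog.example.net"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, curated_hosts):
    """Map each word to the set of URLs containing it, skipping non-curated hosts."""
    index = defaultdict(set)
    for url, text in pages.items():
        if urlparse(url).hostname not in curated_hosts:
            continue  # the page is not on the curated list, so it is never indexed
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Stand-in for crawled page text (made-up data).
PAGES = {
    "https://example.org/a": "Notes on running a small web directory",
    "https://blog.example.net/b": "A search engine for a small web",
    "https://spam.example.com/c": "Best small web search engine deals",
}

INDEX = build_index(PAGES, CURATED_HOSTS)
print(search(INDEX, "small web"))
```

The directory supplies the list of hosts, and the search engine supplies the full-text index over their pages; the third page matches the query words but is never returned because its host is not on the list.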

18 comments
powersoffour

@andybaio This makes me really miss upcoming dot org

lief

@andybaio Do you want to run it locally? I seem to remember one. Could jog my memory if I try hard enough.

lief

@andybaio Ah! This is it! YaCy: yacy.net

You'll need to curate the list yourself, but the software for crawling and searching is packaged easily enough to try out. I've tried it locally before. Bit of a faff to maintain the site curation. I seem to remember taking a whole evening and maybe a sleep to get my head around the UI.

lief

@andybaio I just tried the Docker version. It defaults to a peer-to-peer index i.e. it's currently pulling in others' indexes. I don't know the validity of that index but looking at the search results page, there's a Peer-to-Peer/Privacy toggle for the results, which lets you keep your searches to an index of your own. I've totally forgotten how to set up a private index though.

Devine Lu Linvega

@andybaio Here's Lieu: lieu.cblgh.org
It crawls the websites from our webring: webring.xxiivv.com

I manually verified every single application; Lieu only crawls these web pages.

Roy Tang 🇵🇭

@neauoire hello! I was poking around on this search engine and I found that there are some webring member sites that are not indexed (mine included). May I ask what the criteria are for not being indexed?

Devine Lu Linvega

@roytang Just that the website be online I think.

Your website is 502 at the moment.

Devine Lu Linvega

@roytang if it's 502 during a crawl, it won't be indexed.
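The rule described above (a site that answers with an error during the crawl is left out of the index) can be sketched roughly as follows. This is an assumption-laden illustration, not Lieu's actual code: `crawl`, `fake_fetch`, and the URLs are hypothetical, and the fetch function is passed in so the example runs without network access.

```python
def crawl(urls, fetch):
    """Fetch each URL; keep pages that answer 200 and record why the rest were skipped."""
    indexed, skipped = {}, {}
    for url in urls:
        try:
            status, body = fetch(url)
        except OSError as exc:
            # Connection-level failure: the site is treated as offline.
            skipped[url] = str(exc)
            continue
        if status == 200:
            indexed[url] = body
        else:
            # Any other status (for example 502) keeps the page out of the index.
            skipped[url] = f"HTTP {status}"
    return indexed, skipped


def fake_fetch(url):
    """Hypothetical stand-in for a real HTTP client, returning (status, body)."""
    responses = {
        "https://example.org/": (200, "hello from a webring member"),
        "https://down.example.net/": (502, ""),
    }
    return responses[url]


indexed, skipped = crawl(
    ["https://example.org/", "https://down.example.net/"], fake_fetch
)
print(sorted(indexed))
print(skipped)
```

Recording the reason for each skipped site makes it easier to answer a question like the one above, since a missing site can be traced back to the status it returned at crawl time.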

Alexander Cobleigh

@neauoire @andybaio ye i built precisely that in 2021 :)

code's here if u want to tinker
github.com/cblgh/lieu

MDonaldson →

@andybaio

It's software, not a site, but DEVONagent can work like this. It's intended to be a research tool, and you can do extensive web searches and tell it to ignore sites or entire domains or whatever you want. I use it instead of any of the usual search engines when looking up, say, a movie title that I want to write about on the blog. I only get 'worthwhile' results now that I've trained it (which didn't take much effort or time). → devontechnologies.com/apps/dev

There's a lengthy free trial.

Greg

@andybaio the sentence works perfectly without "whitelist" and if you really want to be explicit "approved" or "allowed" are easy to make work without the racist origins.

John.e.lamb

@andybaio SEO products and engineers and users should be discouraged from doing so in the most severe manner that is not considered violence or removing rights without due process.

Mark Shane Hayden

@andybaio a *long* time ago I set up a mirror of DMOZ and started (slowly) crawling the content of the websites within it, with the intent to build a full-text search facility, but ran out of time and resources. So I would find such a thing interesting.

Tim Trautmann 📷

@andybaio a browser extension that lets one upvote or maybe even downvote a URL for inclusion in search indexing. 🤔
