nixCraft 🐧

OpenAI, Microsoft, and Google together kill the open web. Thousands of independent blogs and forums are now nowhere in search engines or pushed back to page two to support their AI and partnerships with Reddit, StackOverflow, and more. Many humans contributed to these sites hoping to build a knowledge base for humanity, but now greedy people like Sama and OpenAI are taking over everything.

58 comments
Pajo_16

@nixCraft Is there any way to find these sites? You are correct in that they disappeared but there must be a way to find them. Hopefully 🤞

Paul Sutton

@Pajo_16 @nixCraft

If people find them, perhaps post links here.

Beam me out!

@fury999io @Pajo_16 Thanks! I wanted to recommend this, but realized I didn't bookmark it and forgot the url.

DELETED

@fury999io @Pajo_16 aren't there some general open web directories somewhere too?

sigi714

@Pajo_16 @nixCraft On page two. On page 5 you can find bad SEO optimizing agencies.

marcus 😷🧂

@Pajo_16 @nixCraft I've heard people saying that they're adding "before:2023" to their searches to filter out a lot of the ai garbage. Will only help as long as the search topic is somewhat timeless, obviously.

Pajo_16

@marcusdeh @nixCraft
Folks, thanks for all the suggestions. It's appreciated.

P Stewart

@marcusdeh @Pajo_16 @nixCraft Even that seems shaky these days - I'll try putting date restrictions on searches and regularly get stuff that the results page says is a week old, but actually dates from 2011. (Or vice versa.)

I'm not sure if they're just not respecting search syntax, something's breaking on the search engines' side of things, or if people are figuring out ways to make pages appear to be a different age than they actually are.

Christian Krebel

@Pajo_16 @nixCraft I can recommend the search engine #Kagi. They have a different index, one can put a weight on specific domains and they have a project called small web to randomly find those gems.

P4

@ChristianKrebel @Pajo_16 @nixCraft isn't Kagi in on the AI bullshit? I wouldn't trust them not to screw everyone over once they get popular enough.

Christian Krebel

@p4 @Pajo_16 @nixCraft well they have integrated AI, but more on action. Most of the time you will have to trigger an AI feature yourself. Also, they have an assistant where you can choose the models you want to use (APIs are more anonymous) and the answers will have sources from their index which is the best of both worlds IMHO.

Jeremy Yap

@Pajo_16 @nixCraft

If you're looking for blogs and other personal sites I recommend bookmarking these search engines:
- search.marginalia.nu/
- ichi.do/
- clew.se/
- searchmysite.net/
- wiby.me/

Would also recommend checking out this very excellent piece as well for alternative search options: seirdy.one/posts/2021/03/10/se

Third spruce tree on the left

@jeruyyap @Pajo_16 @nixCraft Support the micro-brew search engines that are trying to make Search Engines Not Suck Again!

(new slogan #SENSA)

Amin Hollon 🇺🇸🇲🇾🇮🇳🇦🇫

@jeruyyap @Pajo_16 @nixCraft

Happy to be mentioned! ;)

Clew is very beta at the moment but I've just started my summer break so I should have some good time for dedicated work on it. :)

Paul McBride

@jeruyyap @Pajo_16 @nixCraft Kagi has a great “small web” search filter too

Eric the Cerise

@Pajo_16

Learn how to search the 'Net w/o the mainstream engines.

Some of the best-known alternatives are getting bad, too (DuckDuckGo is pushing its own AI "solution"); others are "meta-search engines", which run your query thru Google, Bing, etc. for you ... which is great for privacy but doesn't do much to improve the search quality.

Look at the various #SearXNG instances (searx.space/ ) ... those are meta-search services, but they hit a *lot* of primary engines.

Look for new engines, new projects, alternative solutions.

The #IndieWeb used to have a search engine indieweb.org/search ... looks like it might be dead now? But check it out anyway.

Check out wiby.me , an alternate search engine specifically for those old, alt, and independent web sites that the main engines are burying.

Search for search.

stract.com/ is a beta project, one-person show running on a server in the guy's basement, but it has promise.

clew.se is a *very* beta service, made by a kid who just graduated college last week (congrats @amin ! ), he is also trying to set up an algorithm that prefers alternate, home-made "real-people" websites.

There are many alternatives out there ... most are not "ready for prime time" and it's gonna be a learning curve to adjust.

Don't just use 'em. Promote 'em. Help them if you can. Many need non-tech help with documentation, translations, etc.

@nixCraft

Amin Hollon 🇺🇸🇲🇾🇮🇳🇦🇫

@ErictheCerise @Pajo_16 @nixCraft

I, uh, didn't actually graduate college. I just finished my sophomore year. XD

Amin Hollon 🇺🇸🇲🇾🇮🇳🇦🇫

@ErictheCerise @Pajo_16 @nixCraft

Hahaha, no worries, this is a continuation of a long, long trend of people overestimating my age online; you're actually closer than most. The usual guess is that I'm in my 30s. XD

Working on a Communications degree with a Professional Writing minor. :)

Sci-Fi Girl

@ErictheCerise

Maybe add search.marginalia.nu/ to the list?

Their focus is on finding small, old and obscure websites. 😎

And I'll have to look at the ones in your list that are new to me!

@Pajo_16 @amin @nixCraft

Amin Hollon 🇺🇸🇲🇾🇮🇳🇦🇫

@5ciFiGirl @ErictheCerise @Pajo_16 @nixCraft

Yeah, Marginalia 1000% deserves the spot more than me. I took a ton of inspiration from their work and they've actually been around for significantly longer than this recent wave of launches. :)

Sci-Fi Girl

@amin

Cool! Having more options is definitely better!! 😎

@ErictheCerise @Pajo_16 @nixCraft

Amin Hollon 🇺🇸🇲🇾🇮🇳🇦🇫

@5ciFiGirl @ErictheCerise @Pajo_16 @nixCraft

Yep!

It's very similar to Clew in goals (promoting personal, non-commercial websites) and even uses the same ranking function at heart (BM25F) but I did make a number of changes in methodology, for example:

- Most of my webpage discovery is centered around RSS feeds (which is both a great mature technology and means sites with RSS feeds [often personal sites] are gonna be better-treated by the crawler)
- Marginalia still indexes big sites like Wikipedia and StackExchange while I specifically blacklist them from the crawler (helps emphasize small sites and saves significant resources for the crawler; I may do some kind of integration in the future but for now I have bangs if you wanna search them)
- Marginalia does warn about javascript, ads, etc., but I don't think it affects pages' rankings, while I penalize ads and trackers
- I'm really proud of my brand new page weight indicators, which I haven't seen anything like in other search engines before. :)

All that said Clew is definitely still very beta. XD
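
A minimal sketch of the ranking idea described above, assuming plain BM25 (not the field-weighted BM25F mentioned in the post) and a made-up multiplicative penalty per ad or tracker. The function names, the penalty value, and the sample pages are all hypothetical; this is not Clew's or Marginalia's actual code.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with plain BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average pages are damped.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def apply_penalty(score, tracker_count, penalty=0.5):
    """Hypothetical rule: every ad or tracker found multiplies the score by `penalty`."""
    return score * penalty ** tracker_count


# Made-up pages: (url, text, number of ads/trackers detected).
pages = [
    ("personal.example/blog", "gopher protocol notes gopher", 0),
    ("seo.example/top10", "gopher gopher gopher best gopher deals", 3),
    ("other.example/feeds", "rss feeds and blogrolls", 0),
]

docs = [text.split() for _, text, _ in pages]
raw = bm25_scores(["gopher"], docs)
ranked = sorted(
    ((url, apply_penalty(score, trackers))
     for (url, _, trackers), score in zip(pages, raw)),
    key=lambda item: item[1],
    reverse=True,
)

for url, score in ranked:
    print(f"{score:.3f}  {url}")
```

On this toy data the keyword-stuffed page has the highest raw BM25 score, but once the penalty is applied the personal page comes out on top.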

Andrew Zonenberg

@amin @5ciFiGirl @ErictheCerise @Pajo_16 @nixCraft You down-rank pages with ads and trackers? If only this was more common...

Amin Hollon 🇺🇸🇲🇾🇮🇳🇦🇫

@azonenberg @5ciFiGirl @ErictheCerise @Pajo_16 @nixCraft

Right??? XD

But the mainstream search engines are mostly run by companies that sell advertising and tracking services, so they're not likely to do it.

I've found it really effective at fighting SEO, though; if people are trying to hack the system to get you on their site, they probably have ads or tracking. ;)

Eric the Cerise

@5ciFiGirl

No 'maybe' about it, that's an awesome one, and new for me. Check out his 'About' page ( marginalia.nu/marginalia-searc ) ... I want to have his babies.

@Pajo_16 @amin @nixCraft

Amin Hollon 🇺🇸🇲🇾🇮🇳🇦🇫

@5ciFiGirl @ErictheCerise @Pajo_16 @nixCraft

A ton of people link to his "The Small Website Discoverability Crisis" when justifying their own search engines (which is great) but I also find it kinda hilarious that they often don't seem to realize he has his own search engine too. XD

Amin Hollon 🇺🇸🇲🇾🇮🇳🇦🇫

@Pajo_16 @nixCraft

Blogrolls are a great option. :)

Mine's here if you want a good starting place: https://benjaminhollon.com/blogroll/

Then many of those sites have their own blogrolls; and so on and so on and so on.
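
Following blogrolls from site to site is a breadth-first walk over links. A minimal sketch, assuming the blogrolls have already been collected into a dictionary (the site names and the `discover` helper are hypothetical, and nothing is fetched from the network):

```python
from collections import deque


def discover(start, blogrolls, max_depth=2):
    """Walk blogrolls breadth-first, up to `max_depth` hops from `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        site, depth = queue.popleft()
        order.append(site)
        if depth == max_depth:
            continue  # Don't follow this site's blogroll any further.
        for linked in blogrolls.get(site, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return order


# Made-up data: each site maps to the sites listed on its blogroll.
blogrolls = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "d.example": ["e.example"],
}

found = discover("a.example", blogrolls)
print(found)
```

The depth limit keeps the walk from sprawling; raising `max_depth` reaches more distant sites.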

Elias

@Pajo_16

> Is there any way to find these sites?

One alternative, independent search engine is #Mojeek that has its own index, using that you may be able to find things that Google/Microsoft decided to remove from their search results: mojeek.com/
@nixCraft

funbaker #AssangeIsNotGuilty

@nixCraft so... renting a garage and building a new search engine from second-hand PCs?

dilletante

@funbaker @nixCraft

Doesn't need a search engine. Just a few web pages of sites organised by category.

funbaker #AssangeIsNotGuilty

@dilettante I remember a time when search engines were good and provided you with answers to the most obscure problems, which you'd never expect on ShitOverflow. I want that back. @nixCraft

Natasha Nox 🇺🇦🇵🇸

@funbaker @dilettante @nixCraft At this point we should just try anew, throw away the whole browser-based web shit and fork Gopher.

funbaker #AssangeIsNotGuilty

@Natanox HTTP is not that bad, we just need to throw away Javascript.

But ofc every idea is welcome.
@dilettante @nixCraft

Dio9sys

@Natanox @funbaker @dilettante @nixCraft Is this the right time for me to be That Person and mention how cool and fun Gemini protocol is?

Andreas, DJ3EI, he/him

I know only very little about Gemini.

But it wasn't my impression thus far that Gemini solves the problem of finding stuff on whatever subject matter one happens to be interested in. I.e., the search engine problem.

Is that wrong? Do I need to learn something?

@Dio9sys @Natanox @funbaker @dilettante @nixCraft

Dio9sys

@dj3ei There are gemini search engines out there now, which definitely helps (one that I particularly like is the Kennedy search engine on gemini://gemi.dev) and gemini constellations which are like web rings, and some capsules that have recommended capsules based on subject a la web directory.

It's definitely not on parity with all the things on the web, but it's very fun imo and I love how it's small, light and aggressively indie

kikebenlloch

@Natanox @funbaker @dilettante @nixCraft It's funny, I was thinking about Gopher too the other day 😆

not matt :tblverified:

@funbaker make sure it’s your parents garage and that the rent is actually just doing chores and getting good grades.

funbaker #AssangeIsNotGuilty

@borlax Oh I'm doing a lot of chores, but I'm way behind living in my parent's house.

Amiya Behera #FBPPR

@nixCraft We just need to quit the search engine.

Presearch, Brave, and many more decentralized ones to come in a few years.

iambrainstorming.github.io/cha

Amiya Behera #FBPPR

@nixCraft It's been about 6 months now since I last touched Google.

Kasion

@nixCraft
Take a look at #Facebook for 10 minutes and you'll see what a true #AI web looks like. It's a barren wasteland that no one interacts with. Let them have their fun with these sites, knowing it's just going to be bots talking to AI. #google search is what's dead, not the #openweb.

Pep

@nixCraft The smaller, human-run web needs to come back via Mastodon, old-school forums and others. Let the big companies have their AI-powered Dead Internet, primarily away from the rest of us.

Blake

@nixCraft this is honestly why I’ve lost interest in #tech, #computers and the #internet as a whole. It’s sad, I used to find it all so intriguing to learn and I used to even make videos about it. I really just don’t care anymore, it feels impossible to find anything organic these days.

FediThing 🏳️‍🌈

@nixCraft

Tech corporations are strip-mining the commons in every possible way. It's despicable 😡

What will they do when they've finished this process? What will be left? It's unsustainable in the long term.

en.wikipedia.org/wiki/Surface_

AmbularD

@nixCraft The good news is, because of this, human-curated directories, forums and even webrings have started to make a comeback. The open web existed before Google, and it can continue even if all the search engines turn to crap.
