@5ciFiGirl @ErictheCerise @Pajo_16 @nixCraft
Yep!
It's very similar to Marginalia in goals (promoting personal, non-commercial websites) and even uses the same ranking function at heart (BM25F), but I made a number of changes in methodology, for example:
- Most of my webpage discovery is centered around RSS feeds (RSS is a great, mature technology, and it means sites with RSS feeds [often personal sites] are gonna get treated better by the crawler)
- Marginalia still indexes big sites like Wikipedia and StackExchange, while I specifically blacklist them from the crawler (this helps emphasize small sites and saves significant crawler resources; I may do some kind of integration in the future, but for now I have bangs if you wanna search them)
- Marginalia does warn about JavaScript, ads, etc., but I don't think that affects pages' rankings, while I actively penalize ads and trackers
- I'm really proud of my brand new page weight indicators; I haven't seen anything like them in other search engines before. :)
All that said, Clew is definitely still very beta. XD
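For anyone curious what BM25F actually does, here's a minimal sketch in Python of the simple BM25F variant: each term's frequency in every field (title, body, ...) is length-normalised and weighted per field, the fields are summed into one pseudo-frequency, and that is saturated with k1 and multiplied by IDF. The field names, weights, `b` values, `K1` and the whitespace tokenizer are all illustrative assumptions, not Clew's or Marginalia's real parameters or code.

```python
import math
from collections import Counter

# Hypothetical parameters for illustration only; neither engine's real
# field weights or b/k1 values are given in the thread.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}
FIELD_B = {"title": 0.5, "body": 0.75}
K1 = 1.2


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for the sketch).
    return text.lower().split()


def bm25f_scores(query, docs):
    """Score each doc (a dict of field name -> text) against query with BM25F."""
    if not docs:
        return []
    tokenized = [{f: tokenize(d.get(f, "")) for f in FIELD_WEIGHTS} for d in docs]
    n = len(docs)
    # Average length of each field across the collection (1.0 if all empty).
    avg_len = {
        f: (sum(len(d[f]) for d in tokenized) / n) or 1.0 for f in FIELD_WEIGHTS
    }
    # Document frequency: number of docs containing the term in any field.
    df = Counter()
    for d in tokenized:
        df.update({t for f in FIELD_WEIGHTS for t in d[f]})

    scores = []
    for d in tokenized:
        counts = {f: Counter(d[f]) for f in FIELD_WEIGHTS}
        score = 0.0
        for term in set(tokenize(query)):
            if df[term] == 0:
                continue
            # Combine per-field term frequencies into one weighted,
            # length-normalised pseudo-frequency.
            tf = 0.0
            for f, w in FIELD_WEIGHTS.items():
                norm = 1 - FIELD_B[f] + FIELD_B[f] * len(d[f]) / avg_len[f]
                tf += w * counts[f][term] / norm
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Saturate once over the combined frequency (the "F" in BM25F).
            score += idf * tf / (K1 + tf)
        scores.append(score)
    return scores
```

E.g. `bm25f_scores("rss feeds", [{"title": "my rss reader", "body": "a tiny personal site about rss feeds"}, {"title": "cooking blog", "body": "recipes and ads"}])` gives the first page a positive score and the second zero. Summing fields *before* saturating is what distinguishes BM25F from just adding up per-field BM25 scores: a term repeated across title and body can't blow up the score.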
@amin
Fascinating! 😎
@ErictheCerise @Pajo_16 @nixCraft