How can we add sites to Clew to index?
15 comments
Since I've only just gone public, currently the process is "send me a link". ;) For best results the site should have an RSS/Atom/JSON feed, since that's how new posts/pages are primarily discovered. Developing a better site submission process is in my plans for this Beta phase.
Not currently, no. I'm undecided on whether they give me much more value than crawling an RSS/Atom/JSON feed… I'm also wary about using any information that's not user-facing; probably not a big concern with a sitemap, though. That said, it would be really easy to implement in the crawler with the way I've got things coded currently (I'd treat it like another web feed, with some custom parsing code to extract the links). So it probably will be implemented at some point.
@CaptainJanegay @selea @joacim I'll likely start taking them into account, then! Good feedback, thanks. Added it to the issue tracker: https://codeberg.org/Clew/Clew/issues/5
Here is my blog (mostly political and tech stuff): https://encrypted.tesio.it/atom.xml Here is the blog of an Italian organization that fought #BigTech in Italy with some success before the new fake #PrivacyShield was adopted: https://monitora-pa.it/atom.xml Both are manually crafted independent websites (including the feeds), with no tracking or ads.
Perfect, I've added both feeds! Since I'm currently running the crawler locally for easier development, they'll start showing up in results when I next upload an update to the index. Since so many people have sent me links to their sites or sites they're interested in, I'll probably be doing that later "today". (It's 5am here and I still haven't been able to fall asleep after repeated tries. Guess I should give it one more go.)
Done, it'll show up next time I upload an update to the index, probably later today. ;)
@selea @amin +1
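For anyone curious what "treat a sitemap like another web feed, with some custom parsing code to extract the links" might look like, here is a minimal sketch in Python. It is not Clew's actual crawler code: the function name `extract_sitemap_links` and the sample URLs are made up for illustration, and it only assumes the standard sitemaps.org XML namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def extract_sitemap_links(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap or sitemap index.

    Hypothetical helper: a crawler could feed these URLs into the same
    queue it already uses for links discovered via RSS/Atom/JSON feeds.
    """
    root = ET.fromstring(xml_text)
    links = []
    # <loc> appears inside <url> (in a urlset) and <sitemap> (in an index).
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        if loc.text and loc.text.strip():
            links.append(loc.text.strip())
    return links


# Illustrative sample document; the URLs are placeholders.
SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.org/</loc></url>"
    "<url><loc> https://example.org/posts/hello </loc>"
    "<lastmod>2024-01-01</lastmod></url>"
    "</urlset>"
)

if __name__ == "__main__":
    for link in extract_sitemap_links(SAMPLE):
        print(link)
```

A sitemap index (a sitemap that lists other sitemaps) would yield the child sitemap URLs from the same function, so a real crawler would need to fetch and parse those in turn.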