marius

Since I'm such a fan of handmade programming, I find myself this fine eve implementing document indexing from scratch(ish) for my #GoActivityPub library storage backends.

6 comments
marius

I'm sure I made plenty of mistakes, but I have to admit I find it surprisingly satisfying to be able to operate on a data type that I can overlay on top of the existing #FedBOX storage engines and get native and *fast* querying for them.

The indexes are quite chunky despite being built on top of roaring bitmaps, because there are so many "indexable" elements in an #ActivityPub object. (Currently I'm indexing the type, the content, summary, name, preferredUsername, the recipients, the actor and the object.)

As I explore some more, I hope to streamline some of these issues and make the whole thing more robust.
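To illustrate why such an index grows quickly, here is a minimal sketch of the general idea, not the actual FedBOX code. It assumes a plain uncompressed bitset standing in for a roaring bitmap, and the `item`, `index`, `mark` and `add` names are hypothetical. Each distinct field value gets its own bitmap of document numbers, so every extra indexed property multiplies the number of bitmaps.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitset is a plain uncompressed bitmap; it stands in for a roaring bitmap.
type bitset []uint64

// set marks document number n as present, growing the bitset as needed.
func (b *bitset) set(n uint32) {
	w := int(n / 64)
	for len(*b) <= w {
		*b = append(*b, 0)
	}
	(*b)[w] |= 1 << (n % 64)
}

// count returns how many document numbers are present.
func (b bitset) count() int {
	c := 0
	for _, w := range b {
		c += bits.OnesCount64(w)
	}
	return c
}

// item is a hypothetical, heavily simplified ActivityPub object.
type item struct {
	ID, Type, Actor, Object string
	To                      []string
}

// index maps "field=value" keys to the document numbers carrying that value.
type index struct {
	ids  []string // document number -> IRI
	bits map[string]*bitset
}

func newIndex() *index { return &index{bits: map[string]*bitset{}} }

// mark records that document n has the given "field=value" key.
func (ix *index) mark(key string, n uint32) {
	b, ok := ix.bits[key]
	if !ok {
		b = &bitset{}
		ix.bits[key] = b
	}
	b.set(n)
}

// add assigns the next document number to it and indexes its fields.
func (ix *index) add(it item) {
	n := uint32(len(ix.ids))
	ix.ids = append(ix.ids, it.ID)
	ix.mark("type="+it.Type, n)
	ix.mark("actor="+it.Actor, n)
	ix.mark("object="+it.Object, n)
	for _, r := range it.To {
		ix.mark("to="+r, n)
	}
}

func main() {
	ix := newIndex()
	ix.add(item{ID: "https://example.com/a/1", Type: "Create",
		Actor: "https://example.com/~alice", Object: "https://example.com/o/1",
		To: []string{"https://example.com/~bob"}})
	ix.add(item{ID: "https://example.com/a/2", Type: "Like",
		Actor: "https://example.com/~bob", Object: "https://example.com/o/1",
		To: []string{"https://example.com/~alice"}})
	// Number of bitmaps, and how many documents share the same object IRI.
	fmt.Println(len(ix.bits), ix.bits["object=https://example.com/o/1"].count())
}
```

Two small activities already produce seven bitmaps here. A compressed representation keeps each bitmap small, but it does not reduce how many of them there are.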

marius

By *native* I mean that I can have my own little API for searching:

Screenshot of a piece of code where we search an indexed ActivityPub storage backend with three criteria: the type is Create, the recipients list contains a certain IRI and the object is another IRI.
A screenshot of a terminal where the result of the search is printed out.

The result consists of a single Create activity, that matches the criteria in the previous image.
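
Since the code above only survives as a screenshot description, here is a rough sketch of what a three-criteria search over bitmap indexes could look like. It is not the API from the screenshot: `Index`, `Criterion`, `Type`, `Recipient`, `Object`, `Add` and `Search` are hypothetical names, and a plain bitset again stands in for a roaring bitmap. The point is that each criterion resolves to one bitmap and the search is just their intersection.

```go
package main

import "fmt"

// bitmap is a tiny uncompressed stand-in for a roaring bitmap.
type bitmap []uint64

// has reports whether document number n is present.
func (b bitmap) has(n int) bool {
	return n/64 < len(b) && b[n/64]&(1<<(n%64)) != 0
}

// with returns the bitmap with document number n added.
func (b bitmap) with(n int) bitmap {
	for len(b) <= n/64 {
		b = append(b, 0)
	}
	b[n/64] |= 1 << (n % 64)
	return b
}

// and returns the intersection of a and b.
func and(a, b bitmap) bitmap {
	if len(b) < len(a) {
		a, b = b, a
	}
	out := make(bitmap, len(a))
	for i := range a {
		out[i] = a[i] & b[i]
	}
	return out
}

// Index is a hypothetical in-memory index keyed by "field=value".
type Index struct {
	iris []string // document number -> IRI
	sets map[string]bitmap
}

func NewIndex() *Index { return &Index{sets: map[string]bitmap{}} }

// Criterion resolves to the bitmap of documents matching one condition.
type Criterion func(*Index) bitmap

func Type(t string) Criterion {
	return func(ix *Index) bitmap { return ix.sets["type="+t] }
}

func Recipient(iri string) Criterion {
	return func(ix *Index) bitmap { return ix.sets["to="+iri] }
}

func Object(iri string) Criterion {
	return func(ix *Index) bitmap { return ix.sets["object="+iri] }
}

// Add indexes one activity under its type, object and recipients.
func (ix *Index) Add(iri, typ, object string, recipients ...string) {
	n := len(ix.iris)
	ix.iris = append(ix.iris, iri)
	keys := []string{"type=" + typ, "object=" + object}
	for _, r := range recipients {
		keys = append(keys, "to="+r)
	}
	for _, k := range keys {
		ix.sets[k] = ix.sets[k].with(n)
	}
}

// Search returns the IRIs of the documents matching all criteria.
func (ix *Index) Search(criteria ...Criterion) []string {
	if len(criteria) == 0 {
		return nil
	}
	acc := criteria[0](ix)
	for _, c := range criteria[1:] {
		acc = and(acc, c(ix))
	}
	var out []string
	for n, iri := range ix.iris {
		if acc.has(n) {
			out = append(out, iri)
		}
	}
	return out
}

func main() {
	ix := NewIndex()
	ix.Add("https://example.com/a/1", "Create", "https://example.com/o/1", "https://example.com/~bob")
	ix.Add("https://example.com/a/2", "Like", "https://example.com/o/1", "https://example.com/~bob")
	ix.Add("https://example.com/a/3", "Create", "https://example.com/o/2", "https://example.com/~bob")
	fmt.Println(ix.Search(Type("Create"), Recipient("https://example.com/~bob"), Object("https://example.com/o/1")))
}
```

Only the first activity satisfies all three criteria, so the intersection leaves a single document, much like the single Create activity in the terminal screenshot.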
marius

Frantic day today, around 10h of productive work on improving the Index and moving it into the go-ap/filters module.

+1510/-11 lines of which 987 belong to tests.

Coverage is not entirely sufficient yet, because it's missing tests for the top-level Index.Add() and Index.Search() methods.

Another thing left to do is the persistence to disk.

The **reason** I wanted to move the work I did yesterday to this module is that, instead of the custom client.SearchByX() functions, I wanted to retrofit the functionality already present in the filters module. Ah, and also to move the bitmaps themselves to a semblance of generic types....

Screenshot of git diff output for today's work on the go-ap/filters Go module.

The summary is +1510/-11 lines of code (of which, by our count, 987 are tests).
marius

The new API would not be terribly different.

A screenshot similar to the one in a parent post, with code that would search in a GoActivityPub Index.

The basic criteria are to search for objects with the type "Create", with recipients matching an IRI and having their object property equal to another IRI.
marius

The experiment of using roaring bitmaps as the foundation for indexing #ActivityPub objects is half successful and half not.

The good news is that soon I'll be able to replace the #brutalinks client access to its ActivityPub backend with something built on top of local storage that makes use of the indexes, and is therefore much, much faster.

The bad news is that adding indexing to the storage backends themselves didn't result in much of a performance gain, but I suspect I'm just doing something wrong.

#GoActivityPub #golang
