Excited to announce that I will be at #fediforum today speed demo-ing my latest project: an ActivityPub data observatory!

This observatory does not collect any user data or metadata. Instead I am looking at the *shape* (aka schema) of data being sent around the fediverse. This will let software devs ask questions like "How is a Mastodon 4.2.0 image post formatted differently from a Misskey 2024.7.0 image post?"

And we'll get real answers based on data rather than on poor documentation.
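To make the idea of a *shape* concrete, here is a minimal sketch, not the observatory's actual scrubbing code. It assumes that "shape" means keeping the keys and nesting of a JSON object while replacing every leaf value with its type name. The function name `shape_of` and the example post are hypothetical.

```python
import json


def shape_of(value):
    """Return the structure of a JSON value with each leaf replaced by its type name."""
    if isinstance(value, dict):
        # Keep the keys (part of the schema) and recurse into the values.
        return {key: shape_of(child) for key, child in sorted(value.items())}
    if isinstance(value, list):
        # Collapse a list to the distinct shapes of its elements.
        distinct = []
        for item in value:
            item_shape = shape_of(item)
            if item_shape not in distinct:
                distinct.append(item_shape)
        return distinct
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


# Hypothetical image post; the field values are made up for illustration.
example_note = {
    "type": "Note",
    "id": "https://example.social/notes/1",
    "attributedTo": "https://example.social/users/alice",
    "content": "<p>hello</p>",
    "sensitive": False,
    "attachment": [
        {"type": "Document", "mediaType": "image/png", "url": "https://example.social/media/1.png"},
        {"type": "Document", "mediaType": "image/jpeg", "url": "https://example.social/media/2.jpg"},
    ],
}

example_shape = shape_of(example_note)
print(json.dumps(example_shape, indent=2))
```

Two posts from different server software could then be compared by diffing their shapes, without any post content or account identifiers being retained.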

14 September at 14:27 | Open on friend.camp

17 comments

Darius Kazemi

I won't be actually LAUNCHING this tool until I've found out how you all would feel about it being opt-out vs opt-in. I will provide a longer blog post for you all to read with details, but in short:

It would be really helpful for general interop on the fedi if this were opt-out. But if people are generally freaked out by having technical details about software data formats being opt-out... I'll make it opt-in.

Quick explanation of the data scrubbing in the attached images

14 September at 14:43 | Open on friend.camp

blaine

@darius nice! The only folks who *I* could imagine insisting on this being opt-in are Oracle's legal team, and they were told in no uncertain terms that this sort of data isn't even *eligible* for opt-out, even in the US of A. 😅

14 September at 14:49 | Open on mastodon.social

Darius Kazemi

@blaine every morning I ask myself: "Am I going to do something today that Oracle's legal team won't like?" and if the answer is no I have already failed

14 September at 14:50 | Open on friend.camp

Jamie Booth

@darius
@blaine

I'm reasonably certain *thinking* the name "Oracle" probably qualifies as something their legal team objects to. 😁

(At least without a PO attached)

15 September at 20:17 | Open on boothcomputing.social

William Pietri

@darius I'd say it's fine since it's not collecting user data. However, given how much sensitivity jerks have caused here, I'd suggest an explanation page that uses some of your own posts as examples, with detailed explanations. And for usability/accessibility reasons, it should be in text, and with much higher contrast. Machine representations look forbidding to non-technical people anyhow, but especially so when dark and hard to read.

14 September at 14:53 | Open on sfba.social

Darius Kazemi

@williampietri Yes, sorry, this is something I whipped up in a few minutes for a microblog post and is not going to be what my macroblog post looks like

14 September at 14:58 | Open on friend.camp

In #Flancia we'll meet

@darius very nice, thanks for checking but to me it's super clear this is fine to scrape by default/be opt-out.

14 September at 15:19 | Open on social.coop

Jeremy Bornstein

This looks very cool, thank you!

As an implementor, one of the additional things I'm curious about is the commonalities (or lack thereof) among the structures of various URIs. Would you be open to, for example, analyzing common prefixes in a single activity, to notice that the actor ID is or is not present as a portion of (say) the followers collection?
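One way the URI comparison described above could look, as a rough sketch under stated assumptions: the function name `uri_relationships` and the example actor are hypothetical, and the check is a plain string-prefix test rather than anything the observatory is known to do. It records only structural facts (booleans and the path suffix), not the URIs themselves.

```python
from urllib.parse import urlsplit


def uri_relationships(actor):
    """For each collection URI on an actor, report how it relates to the actor ID."""
    actor_id = actor["id"]
    actor_origin = urlsplit(actor_id)[:2]  # (scheme, netloc)
    report = {}
    for field in ("inbox", "outbox", "followers", "following"):
        uri = actor.get(field)
        if not isinstance(uri, str):
            continue  # field missing or not a plain URI string
        extends = uri.startswith(actor_id)
        report[field] = {
            "same_origin": urlsplit(uri)[:2] == actor_origin,
            "extends_actor_id": extends,
            # The suffix is structural ("/followers"), so it carries no account name.
            "suffix": uri[len(actor_id):] if extends else None,
        }
    return report


# Hypothetical actor; real software lays these URIs out in different ways.
example_actor = {
    "id": "https://example.social/users/alice",
    "inbox": "https://example.social/users/alice/inbox",
    "followers": "https://example.social/users/alice/followers",
    "outbox": "https://example.social/outbox/alice",
}

example_report = uri_relationships(example_actor)
for field, facts in example_report.items():
    print(field, facts)
```

Aggregated per server software, a report like this would show whether a given implementation nests its collections under the actor ID or places them elsewhere.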

19 September at 2:25 | Open on jeremy.org

Seth 🎙️:jawn_sg:

@darius Oh this is too cool!

14 September at 14:46 | Open on s3th.me

Evan Prodromou

@darius can you compare to browser.pub?

14 September at 15:22 | Open on cosocial.ca

Darius Kazemi

@evan yeah. On browser.pub I can say "hey help me take a look at these particular messages I know about". This observatory will surface information about stuff floating around the fedi that I don't even know about. For example, I am already learning about server software I've never even heard of, and I would not have put that into browser.pub because I wouldn't have known it existed

14 September at 15:27 | Open on friend.camp

Evan Prodromou

@darius Ah, OK, interesting. Where does your network tap plug in?

14 September at 15:29 | Open on cosocial.ca

Darius Kazemi

@evan still figuring it out. Right now I am subscribing to a public relay as that is the most software-neutral source I could think of, but I am looking at other ingestion methods too. Importantly I want to ingest AP only... I'm not going to hit proprietary API endpoints like most scrapers do

14 September at 15:34 | Open on friend.camp

Evan Prodromou

@darius barf, no

14 September at 15:35 | Open on cosocial.ca

d(jack’o la)ngo 🎃

@darius love this, it might even be more useful than a test suite!

14 September at 16:02 | Open on social.coop

Darius Kazemi

@django I think this will be really helpful for people writing tests for a test suite! Like it's one thing to write a test suite that tests conformance with a standard, it's another thing to write a test suite that tests conformance with actual software out in the wild

14 September at 16:05 | Open on friend.camp

Marco Rogers

@darius this looks cool.

14 September at 17:35 | Open on social.polotek.net