@dthompson came back and laid it out in more formal terms and said I was right.
But I was still nervous, so I called up one of my old MIT AI Lab type friends and rambled about it to them on a call. What did they think?
Let's start with the following:

- ATProto has positioned itself as "no compromises on centralized use cases". Well, in that case, let's say it can't do *worse* than e.g. ActivityPub. This includes replies. You can't do *worse* than ActivityPub on replies, mentioning someone, etc.
- We will interpret the most centralized system as one where there's only one provider for storage and distribution of all messages: the least amount of user participation.
- Just as blogging is decentralized but Google (and Google Reader) are not, it is not enough to have just PDSes in Bluesky be self-hosted. When we say self-hosted, we really mean self-hosted: users are participating in the distribution of their content.
- We will consider this a gradient. We can analyze the system from the greatest extreme of centralization, which can "scale towards" the greatest degree of decentralization.
- Finally, we will analyze both in terms of the load on a single participant in the network and in terms of the amount of network traffic as a whole.

Okay. That is the structure we will use for our analysis. Let's compare "message passing" vs ATProto-style "global public shared heap".

So okay. Let's get the CS notation out of the way:

"Message passing" at full decentralization:

- Individual node: O(1)
- Network as a whole: O(n)

Okay, that's reasonable and what you'd expect.

"Public global no-missed-messages (or not worse than AP) shared-heap" ATProto style at full decentralization:

- Individual node: O(n)
- Network as a whole: O(n^2)

Oof. I'd better back this up because that ain't good!

In other words, as our systems get more decentralized, message passing handles things fine. Individual nodes can participate in the network no matter how big it gets. The zoom-out for the network as a whole doesn't get more complicated as we add more users OR move more users towards self-hosting.

Things are NOT good, if I'm correct above, as we make things more decentralized in the atproto-public-shared-heap model.
The more self-hosting there is, and indeed the more "full nodes" join, the more expensive it gets for each of the nodes, and the network EXPLODES! Truly self-hosted atproto is NOT POSSIBLE! And there is no solution to this without adding directed message passing.

Another way to say this is: to fix a system like ATProto to allow for self-hosting, you ultimately have to fundamentally change it to be a lot more like a system like ActivityPub!

Now I left more of the precise analytical explanation in my blogpost. But social media isn't great for that, so go check out my blogpost if you want to go through all that (e.g. if you're more like @dthompson and less like me, I'm a narrative person) https://dustycloud.org/blog/re-re-bluesky-decentralization/

Here's our story:

Now just before you say "wait but ATProto isn't for DMs", yes, but one way this could happen is that e.g. Bob follows Alice, Carol follows Bob, etc. What I'm saying is, messages can have an "intended audience". That's what we're using here.

Before we get into this, remember, the main difference between "message passing" and the "shared heap" is that the former has directed and delivered messages, the latter does not. See prev blogpost for explainer.

So, what happens in a day for both systems? Because that's what we really want to find out.

Under message passing, Alice sends her message to Bob. Only Bob need *receive* the message. So on and so forth.

- For an individual self-hosted node, messages passed per day: 1.

That's about what we'd expect.

Under the public-gods-eye-view-shared-heap model, each user must know of all messages to know what may be relevant. Each user must *receive* all messages.

Sounds survivable with 26 users though, right?

- Message passing: 1 message received per node, 26 across the whole network.
- Public gods-eye-view-shared-heap model: 26 messages received per node, 26 × 26 = 676 across the whole network.

Now, could we handle a million self-hosted users? Is it possible? No problem in message passing. EXPLOSIVE with atproto.

What if we had a million users and added just 5 more? How many more messages must the network bear?
5 new messages in message passing.

"Christine that's ridiculous, we're not expecting a million self-hosted users"

Well I think it would be nice! But regardless, ActivityPub has 27,000 servers on it, all meaningfully participating in the network. ATProto, in its current design, would be crushed to DEATH.

"But Christine", you may say, "I heard gossip might fix this!"

No. It cannot. In fact, I was being more generous than a gossip network, and assumed you only *received* a message once. With gossip you might *receive* more than once. But you need to receive a message to know it.

ATProto was designed for a "big world" view. That's fine! But I'm trying to show seriously what happens if it was actually, really decentralized. *Every* fully participating node added to the network makes the network explosively more expensive. ATProto doesn't scale towards decentralization.
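
To put rough numbers on the story above, here is a minimal sketch in Python. It is a toy model, not a measurement of any real network. It assumes, as in the thread, that each user posts one message per day, that a message-passing message has exactly one recipient, and that a shared-heap node receives every message exactly once (the generous case with no gossip duplicates, counting the node's own message too). The function names are mine, for illustration only.

```python
from typing import Callable, Tuple

# Toy model: every user posts exactly one message per day (assumption).
# Each function returns (deliveries per node, deliveries network-wide).


def message_passing(n_users: int) -> Tuple[int, int]:
    """Directed delivery: each message goes to its one intended recipient."""
    per_node = 1  # a node only receives the message addressed to it
    return per_node, n_users * per_node


def shared_heap(n_users: int) -> Tuple[int, int]:
    """God's-eye view: every node must receive every message once."""
    per_node = n_users  # one copy of each of the n messages
    return per_node, n_users * per_node


def extra_network_load(
    model: Callable[[int], Tuple[int, int]], n_users: int, added: int
) -> int:
    """Additional network-wide deliveries per day after `added` users join."""
    return model(n_users + added)[1] - model(n_users)[1]


if __name__ == "__main__":
    for n in (26, 1_000_000):
        print(n, "message passing:", message_passing(n))
        print(n, "shared heap:    ", shared_heap(n))
    print("+5 users, message passing:",
          extra_network_load(message_passing, 1_000_000, 5))
    print("+5 users, shared heap:    ",
          extra_network_load(shared_heap, 1_000_000, 5))
```

Under these assumptions, going from a million users to a million and five adds 5 deliveries per day with message passing and 10,000,025 with the shared heap, since (n + 5)^2 - n^2 = 10n + 25.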
"I think it's pretty clear immediately that it's quadratic. This is basic engineering considerations, the first thing you do when you start designing a system," they said.
Well, that's a relief. So why isn't it clear to everyone else, I asked?
So they suggested I lay it out to you as I did to them.