Every self-hosted user on the instance also needs to fetch every self-hosted user on the network, which also seems like it would mean that the amount of network traffic for retrieving information is O(n^2)?
@Claire The article I pointed to was about fully self-hosting your own atproto infrastructure, including relay, and people seem to be getting excited about it on bluesky as being feasible, and I'm feeling uncertain it is

@cwebber i mean it's “feasible” with extremely serious investment, not just for the average hobbyist

@cwebber @Claire Clearly for an individual it's far too expensive to run the whole stack, but I guess the expectation is that 3rd parties will like... form to gather that kind of resources/funding to run relays? Still unclear to me how things actually work when more than one Relay exists. Is the Bluesky app supposed to be able to switch between multiple at once?

@eramdam @cwebber the front-facing Bluesky app is built to use Bluesky PBC's AppView, i don't think it has any provision to switch to another provider with the same API (but i might be wrong)
you can theoretically build an equivalent relay+AppView that works in the same way with the same data (although you have yet to build some of the pieces yourself, not everything in bsky is open source afaik), but it's unclear what incentives you'd have to do that

@cwebber @eramdam disparate repos of content-addressed objects that are truly decentralized (up to DID schemes), gathered by relays/AppViews which do not have to be centralized (but have to be huge nodes to be useful for Bluesky's purposes of a public microblogging thing)
the web DID scheme is about as decentralized as current-world ActivityPub i guess (with the nice addition of it being easier to move some details around), and the PLC one is… a very centralized placeholder?

@cwebber @Claire @eramdam don't think the goal w/ atproto is to be "more decentralized" in the abstract. we (team) had worked on SSB and dat, which were radically decentralized/p2p but hard to work with and grow. would not supplant "the platforms". atproto came out of identifying the *minimum* necessary decentralization properties, and ensuring those are strongly locked in. we basically settled on:

@cwebber @Claire @eramdam I have a longer post on this, and our progress, on my personal blog:

@cwebber @Claire @eramdam (though it wasn't directly influential on atproto design, and Backus has since pulled the post)

@bnewbold @cwebber @Claire @eramdam don't expect a tracker company that plans to make money on the big data of tracking torrents to fully embrace the magnet file specification or the DHT in favor of their cure. e.g., the "VPNs cure everything" of social media
you can't make someone learn that which their job depends on them not getting.

@risottobias @cwebber @Claire @eramdam the value of data-for-sale is usually proportional to how exclusive access to it is. if data is public, it is commodity and "worth" a whole lot less. IMHO client apps are way under-appreciated in this regard: they can track attention/behavior way better than API servers can.

@risottobias @cwebber @Claire @eramdam for me credible exit means being able to replace the largest operator, who is indexing the full network. if the network is large, such an index is going to be large! spinning up a cozy/indie/small network is good, and important, but the full-network thing is also important in society.

@eramdam @cwebber also one thing i have not read up about yet is how the PDSes notify the relays of updates. as a new relay, do you need to convince every PDS to talk to you? or are PDSes happily sending everything to everyone who asks? in the latter case, what if the number of relays increases? (at least the PDS doesn't have to do a lot of computing afaik)

@cwebber That's what I like about the Fediverse and its projects. They're basically self-contained, with very little base maintenance cost, even those engines on the heavier side, like Mastodon or Misskey. Leaves more leeway for horizontal scaling, more people can afford it. Bluesky kinda looks designed by billionaires for other billionaires.

@Claire Nah, Fedi has its fair share of issues. Loss of identity on server loss is one of them. It might be mitigated by migrations and/or replications but I haven't seen anyone do this so far.

@Claire @drq @cwebber the architecture of atproto kind of presumes that there is value/utility in "full world" indices: ability to search and find strangers with no social-graph-connection across the network. I don't think all (or even much at all) social web stuff needs to end up in that bucket! but that bucket is going to exist in the world and have an impact. I love the AP social outcomes, but if AP doesn't provide for big-world broadcast, Twitter will continue to have a role in society

I think the charitable argument is that the many:many architecture of the Fediverse doesn't work for BIG networks due to hosting costs and fragility (the "can't see all the comments" problem), and this necessitates a few:many approach with caching layers (relays, AppViews). They hope to make hosting "Google" cheap enough that it's plausible for a donation-funded org or small business to do it. As @hazelweakly has noted, just hosting Hachyderm is $1k a month on owned servers. I think the risk they're running with this approach is that there's no fundamental guarantee that the cost of being Google won't run into the tens of millions eventually, and as they're taking VC funding the incentives to avoid this may not be in place. Fediverse spreads this cost around the network pretty "fairly" if less efficiently, hence the $12k / yr expense of hosting Hachyderm.

> Data in the Atmosphere is stored on users' personal repos. It's almost like each user has their own website. Our goal is to aggregate data from the users into our SQLite DB.

Very helpful replies from @bnewbold over on Bluesky: https://bsky.app/profile/bnewbold.net/post/3lahbqsaexy2z
@cwebber if by “instance” you mean the relay+AppView, no, users would just query it and get ready-to-use results like you can currently do on bsky.app?
but yeah, you are not meant to self-host your own bluesky, only your own data repository basically
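
For anyone who wants to put rough numbers on the O(n^2) question at the top of the thread, here is a back-of-the-envelope sketch. It only counts fetch relationships, not bytes, and both models are simplifying assumptions rather than a description of how atproto actually syncs: in the "full mesh" case every self-hosted node pulls from every other node, and in the "relay" case each relay crawls every repo once and each user then queries a single relay+AppView. The function names are made up for illustration.

```python
def full_mesh_fetches(n: int) -> int:
    """Assumed model: each of n self-hosted nodes fetches from all n - 1 others."""
    return n * (n - 1)


def relay_fetches(n: int, relays: int = 1) -> int:
    """Assumed model: every relay crawls all n repos, then each user queries one relay."""
    return relays * n + n


if __name__ == "__main__":
    # Print how the two counts diverge as the network grows.
    for n in (10, 1_000, 100_000):
        print(f"n={n}: full mesh={full_mesh_fetches(n)}, one relay={relay_fetches(n)}")
```

Under these assumptions the full-mesh count grows quadratically while the relay count grows linearly in n (times the number of relays), which is the trade-off the replies above are describing: the cost does not go away, it concentrates in whoever runs the relay.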