L. Rhodes

Looking at the differences between multi-server fediverse relations and multi-server Bluesky relations ought to make it easier to see why Bluesky couldn't just lift Mastodon's moderation solutions wholesale. And it opens up a bundle of questions about how federation and moderation work on the Bluesky network that probably won't be clear outside of the company until the beta ends and third parties start federating their own servers into the network.

L. Rhodes

Some questions about how Bluesky will work once there are multiple servers:

• Will servers be able to defederate from one another? (Presumably yes.)

• Would that block public posts, or just direct posts? (Maybe the latter.)

• Will servers be able to defederate from indexers? (I'd guess no.)

• Are blocks processed at the host or index level? (Hard to say.)

• Will users be able to filter at the host level (e.g. by keyword), or only the algorithm level? (Looks like the latter right now.)

L. Rhodes

A friendly mutual let me poke around in their account a bit, and I saw two types of moderation at the account level. One is account blocks, which seem straightforward, but given the differences in the underlying protocol, I don't think we should assume too much about how they work. The other is categorical content filtering, which relies on ML-enabled post flagging at the index server level, though I don't know if flagged posts are kept out by the algorithm or blocked at the host server.

L. Rhodes

In general, it looks to me like the protocol is designed to delegate most machine-level decisions about an account's timeline to the index server, so any time there's ambiguity there, I incline toward the interpretation that the host server is passively accepting the timeline served by the index server. So my guess would be that content filtering is handled by the algorithm and served to the host server that way, but that's just a guess.

L. Rhodes

Circling back: "Are blocks processed at the host or index level?" I think the post below suggests that posts by blocked accounts are probably filtered out by the index server, not the server where your account is hosted. That seems to me the best explanation for why they're visible on the protocol: so that indexers can use that info to arrange the user's timeline. Otherwise, the blocks could be handled by the host, with no need to make them public. But I may be missing some nuance here.

L. Rhodes

Another thing that just occurred to me: #Bluesky started off as a project exploring the possibility of transporting #Twitter onto a protocol that allows for account portability. I've been thinking of that mostly in terms of data capture — the network is designed to make users trackable no matter which server they call home. But if the plan is still to transplant Twitter onto the AT protocol, then eventually the entire population of Twitter will merge into the Bluesky network.

L. Rhodes

There was a recent tempest in a teacup over a nazi getting into the #Bluesky beta. The devteam was quick to kick him right back out again, and everyone went back to doing whatever it is they do in there, content that they had learned the proper lesson from the "nazi bar" anecdote. But what happens if Elon flips a switch, and suddenly the entire population of #Twitter is on the Bluesky network? How does the population there deal with an instant influx of tens of thousands of hard right wingers?

L. Rhodes

This goes back to the questions I asked before. On the fediverse, we'd isolate the bulk of them from the greater network by suspending their host servers. Does #Bluesky provide tools that allow admins or individual accounts to completely exclude everything from a specific server? Will the indexing servers defederate from known nazi havens? Will the algorithms serve posts from nazi accounts so long as they don't trip the ML flag for hate speech? So many unanswered questions.

L. Rhodes

If #Twitter does merge into the network, my guess is it will (a) happen fast, like within the next year, and (b) put an end to the social media reshuffling phase we've been going through since October. #Bluesky instantly reaches critical mass; people will get to use "Twitter" while credibly maintaining they're not on Twitter. No fediverse project will be able to count on growth by matching Twitter features. Better to start focusing on what differentiates fediverse services from Bluesky.

L. Rhodes

One hugely important question that I don't see being asked (perhaps because not many people have worked through the implications) is: Who gets to be a #Bluesky indexer? Theoretically, any entity that can maintain one can run an indexing server, but are there controls at the account server layer to allow or deny indexers access to the data coming through that server? What's to stop, say, a Cambridge Analytica or the CCP from mining all of the data on the network for political use?

L. Rhodes replied to L.

Partial answer here: blueskyweb.xyz/blog/5-5-2023-f

"The federation architecture allows anyone to host a BGS, though it's a fairly resource-demanding service."

BGS = Big Graph Service, #Bluesky's new jargon for network-crawling data indexers.

This is pretty much the answer I expected: No real guardrails on who gets to index the network and provide feeds. State actors aren't likely to be daunted by the resource demands.

L. Rhodes replied to L.

That blog post also outlines a piece that wasn't obvious to me from the protocol spec: "An App View is the piece that actually assembles your feed and all the other data you see in the app, and is generally expected to be downstream from a BGS’s firehose of data."

So algorithmic filtering presumably happens on a smaller server independent of both the BGS and PDS, which makes the #Bluesky network even more complex than I visualized in the OP. And presumably, anyone can run an App View, too.

L. Rhodes replied to L.

Here's #Bluesky's diagram of the federated network.

Labelers are independent services where posts coming from the BGS are tagged to make them easier to filter. Accounts (and maybe admins) can then hook into a labeler to outsource some of the moderation load.

Feed generators sort and filter posts. This is where the custom algorithms live.

App Views also do some sorting, but they're mostly (I believe) for sorting out post types for app types, e.g. photoblogging, microblogging, etc.
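
To make the division of labor concrete, here is a toy sketch of a Labeler feeding a Feed Generator. It is only an illustration of the roles as I understand them from the blog post, not Bluesky's actual code or API: the `Post` class, `toy_labeler`, `toy_feed_generator`, the keyword table and the sample posts are all made up for the example, and a real Labeler would presumably use something smarter than keyword matching.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical stand-in for a post arriving from the BGS firehose."""
    author: str
    text: str
    labels: set[str] = field(default_factory=set)


def toy_labeler(post: Post, flagged_terms: dict[str, str]) -> Post:
    """Return a copy of the post, tagged with the label of every flagged term it contains."""
    lowered = post.text.lower()
    found = {label for term, label in flagged_terms.items() if term in lowered}
    return Post(post.author, post.text, post.labels | found)


def toy_feed_generator(posts: list[Post], hidden_labels: set[str]) -> list[Post]:
    """Drop every post carrying a label the account has chosen to hide."""
    return [p for p in posts if not (p.labels & hidden_labels)]


# Invented sample data: two posts, one of which trips the "spam" keyword.
firehose = [Post("alice", "Lunch photos"), Post("bob", "Buy cheap pills now")]
labeled = [toy_labeler(p, {"cheap pills": "spam"}) for p in firehose]
feed = toy_feed_generator(labeled, hidden_labels={"spam"})
```

Note that the feed generator never inspects the text itself. It acts entirely on whatever labels it was handed, so whoever controls the labeler effectively decides what gets filtered.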

L. Rhodes replied to L.

This is much more complicated than I initially understood, both technically and socially. Since Labeler, App View and Feed Gen are all separate services that can be run independently of both the BGS and PDS, taking full advantage of the network means implicitly trusting four different entities on top of your local host. And even if you don't trust them, they get a say in how (or if) your posts are received by others on the network. I guess that's what #Bluesky means by the "reach layer."

L. Rhodes replied to L.

So let's say I'm the CCP, and I want to control what Chinese citizens can do on #Bluesky. I could start by configuring the national network to only allow traffic to and from PDS that connect to approved Feed Generators. I could run the BGS that crawls those PDS to exclude servers from other countries. I could run the Labelers that feed into those Generators and flag posts as "seditious" so that they get filtered. And, of course, I could investigate anyone who gets flagged by the ML software.

L. Rhodes replied to L.

But how about a non-state actor example? Let's say I'm just an ML enthusiast and decide to run my own #Bluesky Labeler. Maybe I run a pretty good service, and lots of people on the network start relying on it for moderation. And maybe I'm also a big ol' TERF, and adjust the Labeler so that accounts by people whose profiles identify them as trans get mislabeled into frequently moderated categories. AFAICT, the only safeguard against this is… marketplace dynamics.

L. Rhodes replied to L.

#Bluesky is pretty clear about the market-oriented direction of how they're structuring the network: blueskyweb.xyz/blog/3-30-2023- In effect, they're offloading a lot of responsibility onto "consumer choice." But these are consumer choices about services that are, to a large extent, black boxes. Unless the BGS, Labelers, App Views and Feed Gens are radically transparent about how they handle data, you have to choose based on little more than how you feel about your timeline.

L. Rhodes replied to L.

This, from "Federation Architecture Overview," is also pretty wild: "For example, the BGS might crawl to grab data such as a certain post’s likes and reposts, and the app view will output the count of those metrics."

#Bluesky's solution to the out-of-sync metrics people sometimes complain about on #Mastodon is to separately crawl for likes and reposts, and pass that info through the App View. Which seems like an opportunity to inject false metrics, particularly if anyone can run an App View.
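
A rough sketch of the worry, under heavy assumptions: the record shape, `honest_app_view`, `padding_app_view` and `matches_records` are all invented for illustration and are not taken from the AT protocol. The point is just that a client that takes the count as reported can't tell the two services apart unless it re-counts the underlying like records itself.

```python
from collections import Counter

# Hypothetical like records, shaped the way a crawler might hand them over.
like_records = [
    {"actor": "alice", "subject": "post/1"},
    {"actor": "bob", "subject": "post/1"},
    {"actor": "carol", "subject": "post/2"},
]


def count_likes(records):
    """Tally likes per post from the raw records."""
    return Counter(r["subject"] for r in records)


def honest_app_view(records, post_id):
    """Report the count exactly as derived from the records."""
    return {"post": post_id, "like_count": count_likes(records)[post_id]}


def padding_app_view(records, post_id, padding=1000):
    """Report an inflated count; nothing in the response reveals the padding."""
    return {"post": post_id, "like_count": count_likes(records)[post_id] + padding}


def matches_records(reported, records):
    """Audit a reported count by re-counting the records independently."""
    return reported["like_count"] == count_likes(records)[reported["post"]]


honest = honest_app_view(like_records, "post/1")
padded = padding_app_view(like_records, "post/1")
```

Here `matches_records(honest, like_records)` holds and `matches_records(padded, like_records)` does not, but that audit only works for someone with independent access to the records, which is exactly what most clients would be outsourcing.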

L. Rhodes replied to L.

The post says that #Bluesky is experimenting with breaking Labeling and Feed Gen out from App View, so this more complicated infrastructure isn't necessarily the form the network will ultimately take. Hopefully, they'll walk back that plan, because it seems to me that the flexibility it adds mostly opens up new vectors for bad actors who want to reshape traffic on the network to their own ends.

Daniel Schildt replied to L.

@lrhodes Thank you for the deep technical overview about the current status of the system.

L. Rhodes replied to Daniel

@autiomaa You're welcome, but it's really not all that deep. I'm really just going off of the spec as written, along with a few blog posts. The only thing that makes my analysis stand out is the fact that so few of the people covering Bluesky are offering any sort of structural analysis at all.
