A friendly mutual let me poke around in their account...

A friendly mutual let me poke around in their account a bit, and I saw two types of moderation at the account level. One is account blocks, which seems straightforward, but given the differences in the underlying protocol, I don't think we should assume too much about how it works. The other is categorical content filtering, which relies on ML-enabled post flagging at the index server level, though I don't know if flagged posts are kept out by the algorithm or blocked at the host server.

1 May 2023 at 14:49 | Open on merveilles.town

18 comments

L. Rhodes

In general, it looks to me like the protocol is designed to delegate most machine-level decisions about an account's timeline to the index server, so any time there's ambiguity there, I incline toward the interpretation that the host server is passively accepting the timeline served by the index server. So my guess would be that content filtering is handled by the algorithm and served to the host server that way, but that's just a guess.

1 May 2023 at 14:52 | Open on merveilles.town

L. Rhodes

Circling back: "Are blocks processed at the host or index level?" I think the post below suggests that posts by blocked accounts are probably filtered out by the index server, not the server where your account is hosted. That seems to me the best explanation for why they're visible on the protocol: so that indexers can use that info to arrange the user's timeline. Otherwise, the blocks could be handled by the host, with no need to make them public. But I may be missing some nuance here.

A screenshot of a post from @bsky.app, reading: "Important: Similar to your likes, your block list is public data! While users of the Bluesky app can't easily find this list now, the data is public and enumerable at the protocol level. This means that third-party apps or clients can surface and display your list."
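To make that guess concrete, here is a minimal sketch of what index-level block filtering might look like, assuming the indexer can read public block records and the host simply relays the result. Every name here (`PUBLIC_BLOCKS`, `index_timeline`, `host_serve`) is hypothetical and not taken from the AT protocol spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    text: str


# Hypothetical public block records, keyed by the blocking account.
PUBLIC_BLOCKS = {"alice": {"mallory"}}


def index_timeline(viewer, posts, blocks):
    """Indexer-side filtering: drop posts by accounts the viewer has blocked."""
    blocked = blocks.get(viewer, set())
    return [p for p in posts if p.author not in blocked]


def host_serve(viewer, posts, blocks):
    """Assumed passive host: relays whatever timeline the indexer assembled."""
    return index_timeline(viewer, posts, blocks)


posts = [Post("bob", "hello"), Post("mallory", "spam")]
timeline = host_serve("alice", posts, PUBLIC_BLOCKS)
```

If blocks were handled privately by the host instead, `index_timeline` would never need to see `PUBLIC_BLOCKS` at all, which is the point of the argument above.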

1 May 2023 at 15:08 | Open on merveilles.town

L. Rhodes

Another thing that just occurred to me: #Bluesky started off as a project exploring the possibility of transporting #Twitter onto a protocol that allows for account portability. I've been thinking of that mostly in terms of data capture — the network is designed to make users trackable no matter which server they call home. But if the plan is still to transplant Twitter onto the AT protocol, then eventually the entire population of Twitter will merge into the Bluesky network.

2 May 2023 at 16:05 | Open on merveilles.town

L. Rhodes

There was a recent tempest in a teacup over a nazi getting into the #Bluesky beta. The devteam was quick to kick him right back out again, and everyone went back to doing whatever it is they do in there, content that they had learned the proper lesson from the "nazi bar" anecdote. But what happens if Elon flips a switch, and suddenly the entire population of #Twitter is on the Bluesky network? How does the population there deal with an instant influx of tens of thousands of hard right wingers?

2 May 2023 at 16:13 | Open on merveilles.town

L. Rhodes

This goes back to the questions I asked before. On the fediverse, we'd isolate the bulk of them from the greater network by suspending their host servers. Does #Bluesky provide tools that allow admins or individual accounts to completely exclude everything from a specific server? Will the indexing servers defederate from known nazi havens? Will the algorithms serve posts from nazi accounts so long as they don't trip the ML flag for hate speech? So many unanswered questions.

2 May 2023 at 16:21 | Open on merveilles.town

L. Rhodes

If #Twitter does merge into the network, my guess is it will (a) happen fast, like within the next year, and (b) put an end to the social media reshuffling phase we've been going through since October. #Bluesky instantly reaches critical mass; people will get to use "Twitter" while credibly maintaining they're not on Twitter. No fediverse project will be able to count on growth by matching Twitter features. Better to start focusing on what differentiates fediverse services from Bluesky.

2 May 2023 at 17:42 | Open on merveilles.town

L. Rhodes

One hugely important question that I don't see being asked (perhaps because not many people have worked through the implications) is: Who gets to be a #Bluesky indexer? Theoretically, any entity that can maintain one can run an indexing server, but are there controls at the account server layer to allow or deny indexers access to the data coming through that server? What's to stop, say, a Cambridge Analytica or the CCP from mining all of the data on the network for political use?

3 May 2023 at 13:12 | Open on merveilles.town

L. Rhodes replied to L.

Partial answer here: https://blueskyweb.xyz/blog/5-5-2023-federation-architecture

"The federation architecture allows anyone to host a BGS, though it's a fairly resource-demanding service."

BGS = Big Graph Service, #Bluesky's new jargon for network-crawling data indexers.

This is pretty much the answer I expected: No real guardrails on who gets to index the network and provide feeds. State actors aren't likely to be daunted by the resource demands.

6 May 2023 at 18:38 | Open on merveilles.town

L. Rhodes replied to L.

That blog post also outlines a piece that wasn't obvious to me from the protocol spec: "An App View is the piece that actually assembles your feed and all the other data you see in the app, and is generally expected to be downstream from a BGS’s firehose of data."

So algorithmic filtering presumably happens on a smaller server independent of both the BGS and PDS, which makes the #Bluesky network even more complex than I visualized in the OP. And presumably, anyone can run an App View, too.

6 May 2023 at 18:56 | Open on merveilles.town

L. Rhodes replied to L.

Here's #Bluesky's diagram of the federated network.

Labelers are independent services where posts coming from the BGS are tagged to make them easier to filter. Accounts (and maybe admins) can then hook into a labeler to outsource some of the moderation load.

Feed generators sort and filter posts. This is where the custom algorithms live.

App Views also do some sorting, but they're mostly (I believe) for sorting out post types for app types, e.g. photoblogging, microblogging, etc.

A diagram showing two devices connecting to two different Personal Data Servers, each providing data to a Big Graph Service, which pushes data back out to a Labeler, an App View and a Feed Gen, each of which also passes data back to one PDS or to the next item in the group. Sorry, I'm sure that's confusing, but as I said, it's a more complicated picture than those I provided in my earlier post.
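One way to read the pieces in that diagram is as a chain of small functions. This is only a sketch under my own assumptions: the strictly linear order (BGS, then Labeler, then Feed Gen, then App View), the keyword-based labeling, and every function and field name are invented for illustration and are not Bluesky's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    author: str
    text: str
    labels: set = field(default_factory=set)


def bgs_crawl(pds_list):
    """Big Graph Service: merge posts from every PDS into one firehose."""
    return [post for pds in pds_list for post in pds]


def labeler(posts, flagged_words):
    """Labeler: tag posts so downstream services can filter on the tags."""
    for post in posts:
        if any(word in post.text.lower() for word in flagged_words):
            post.labels.add("flagged")
    return posts


def feed_gen(posts, hidden_labels):
    """Feed generator: the custom algorithm. Here it hides labeled posts
    and sorts the rest by author name (an arbitrary stand-in for ranking)."""
    visible = [p for p in posts if not (p.labels & hidden_labels)]
    return sorted(visible, key=lambda p: p.author)


def app_view(posts):
    """App View: shape the feed into what the client app displays."""
    return [f"{p.author}: {p.text}" for p in posts]


# Two hypothetical Personal Data Servers, each holding its users' posts.
pds_a = [Post("alice", "good morning")]
pds_b = [Post("bob", "buy spam now"), Post("carol", "lunch pics")]

firehose = bgs_crawl([pds_a, pds_b])
feed = app_view(feed_gen(labeler(firehose, {"spam"}), {"flagged"}))
```

Each function stands in for a service that a different party could run, so the account holder only sees the final `feed` and none of the intermediate decisions.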

6 May 2023 at 19:18 | Open on merveilles.town

L. Rhodes replied to L.

This is much more complicated than I initially understood, both technically and socially. Since Labeler, App View and Feed Gen are all separate services that can be run independently of both the BGS and PDS, taking full advantage of the network means implicitly trusting four different entities on top of your local host. And even if you don't trust them, they get a say in how (or if) your posts are received by others on the network. I guess that's what #Bluesky means by the "reach layer."

6 May 2023 at 19:31 | Open on merveilles.town

L. Rhodes replied to L.

So let's say I'm the CCP, and I want to control what Chinese citizens can do on #Bluesky. I could start by configuring the national network to only allow traffic to and from PDS that connect to approved Feed Generators. I could run the BGS that crawls those PDS to exclude servers from other countries. I could run the Labelers that feed into those Generators and flag posts as "seditious" so that they get filtered. And, of course, I could investigate anyone who gets flagged by the ML software.

6 May 2023 at 19:36 | Open on merveilles.town

L. Rhodes replied to L.

But how about a non-state actor example? Let's say I'm just an ML enthusiast and decide to run my own #Bluesky Labeler. Maybe I run a pretty good service, and lots of people on the network start relying on it for moderation. And maybe I'm also a big ol' TERF, and adjust the Labeler so that accounts by people whose profiles identify them as trans get mislabeled into frequently moderated categories. AFAICT, the only safeguard against this is… marketplace dynamics.

6 May 2023 at 19:46 | Open on merveilles.town

L. Rhodes replied to L.

#Bluesky is pretty clear about the market-oriented direction of how they're structuring the network: https://blueskyweb.xyz/blog/3-30-2023-algorithmic-choice In effect, they're offloading a lot of responsibility onto "consumer choice." But these are consumer choices about services that are, to a large extent, black boxes. Unless the BGS, Labelers, App Views and Feed Gens are radically transparent about how they handle data, you have to choose based on little more than how you feel about your timeline.

6 May 2023 at 19:55 | Open on merveilles.town

L. Rhodes replied to L.

This, from "Federation Architecture Overview," is also pretty wild: "For example, the BGS might crawl to grab data such as a certain post’s likes and reposts, and the app view will output the count of those metrics."

#Bluesky's solution to the out-of-sync metrics people sometimes complain about on #Mastodon is to separately crawl for likes and reposts, and pass that info through the App View. Which seems like an opportunity to inject false metrics, particularly if anyone can run an App View.
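A rough sketch of that concern, assuming the client shows whatever counts the App View reports and has no independent way to check them. The record shape and the function names (`bgs_counts`, `honest_app_view`, `inflating_app_view`) are hypothetical.

```python
def bgs_counts(records, post_id):
    """Tally likes and reposts for one post from crawled records."""
    likes = sum(1 for r in records if r["type"] == "like" and r["post"] == post_id)
    reposts = sum(1 for r in records if r["type"] == "repost" and r["post"] == post_id)
    return {"likes": likes, "reposts": reposts}


def honest_app_view(records, post_id):
    """Passes the crawled tallies through unchanged."""
    return bgs_counts(records, post_id)


def inflating_app_view(records, post_id, factor=10):
    """A dishonest operator multiplies the tallies before serving them."""
    return {name: count * factor for name, count in bgs_counts(records, post_id).items()}


# Hypothetical crawl results: two likes and one repost on post "p1".
records = [
    {"type": "like", "post": "p1"},
    {"type": "like", "post": "p1"},
    {"type": "repost", "post": "p1"},
    {"type": "like", "post": "p2"},
]

honest = honest_app_view(records, "p1")
inflated = inflating_app_view(records, "p1")
```

Both functions return the same shape of data, so a client that only talks to one App View would need to recount from the underlying records to notice the difference.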

6 May 2023 at 20:06 | Open on merveilles.town

L. Rhodes replied to L.

The post says that #Bluesky is experimenting with breaking Labeling and Feed Gen out from App View, so this more complicated infrastructure isn't necessarily the form the network will ultimately take. Hopefully, they'll walk back that plan, because it seems to me that the flexibility it adds mostly opens up new vectors for bad actors who want to reshape traffic on the network to their own ends.

6 May 2023 at 20:10 | Open on merveilles.town

Daniel Schildt replied to L.

@lrhodes Thank you for the deep technical overview about the current status of the system.

6 May 2023 at 20:16 | Open on mastodon.social

L. Rhodes replied to Daniel

@autiomaa You're welcome, but it's really not all that deep. I'm really just going off of the spec as written, along with a few blog posts. The only thing that makes my analysis stand out is the fact that so few of the people covering Bluesky are offering any sort of structural analysis at all.

6 May 2023 at 20:18 | Open on merveilles.town
