When someone tells me they're going to use ML for moderation, or for flagging toxic posts, I ask which model they're going to use, and what inputs the model is going to run inference on.
If the input doesn't include the relationship between the two people, and the community it is being said in, then you will inevitably get many false positives. A short text sample alone doesn't carry enough context for reliable inference.
https://hachyderm.io/@mekkaokereke/109989027419424661
3/N
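A minimal sketch of what that missing context could look like as model input, assuming a hypothetical stub classifier; every field, threshold, and function name here is illustrative, not taken from any real moderation system:

```python
# Sketch: the input most deployed toxicity models actually see (text only)
# versus the context-enriched input the argument above calls for.
# All names and values below are illustrative assumptions.

from dataclasses import dataclass
from typing import Optional


@dataclass
class ModerationInput:
    text: str                    # what text-only models get
    author_id: str               # who said it
    target_id: Optional[str]     # who it was said to, if anyone
    relationship: str            # e.g. "mutuals", "stranger", "blocked-by-target"
    community: str               # the server/community where it was posted
    prior_interactions: int      # how often these two have interacted before


def flag_without_context(text: str) -> bool:
    """Text-only scoring: banter between friends and harassment by a
    stranger look identical, so one of them is always misclassified."""
    return any(w in text.lower() for w in ("idiot", "trash", "shut up"))


def flag_with_context(item: ModerationInput) -> bool:
    """Same text signal, gated on relationship and community context,
    which is what cuts false positives on in-group speech."""
    if not flag_without_context(item.text):
        return False
    if item.relationship == "mutuals" and item.prior_interactions > 20:
        return False   # long-running friendly exchange: don't auto-flag
    return True        # stranger or hostile context: escalate to a human


if __name__ == "__main__":
    post = ModerationInput(
        text="you absolute idiot lol",
        author_id="a", target_id="b",
        relationship="mutuals", community="hachyderm.io",
        prior_interactions=150,
    )
    print(flag_without_context(post.text))  # True: text alone reads as toxic
    print(flag_with_context(post))          # False: context reads as banter
```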
@Raccoon
So no, I don't like ML for moderation.
I could like it in theory, but in practice I rarely see implementations that:
a) include enough context
b) do not amplify the very problem that the most vulnerable users already experience
4/4