

Pragya K, February 18, 2022

Moderate People, Not Content: A Restorative Approach to Combating Toxicity Online

It’s no secret that social media is playing an increasingly pervasive role in our daily lives. People from all over the globe who share common identities, experiences, or interests can create communities and connect with one another at the click of a button. In tandem, we’re seeing a steady rise in online toxicity. According to the Pew Research Center, 41% of Americans have personally experienced online harassment, with the severity and variety of these incidents intensifying since 2017. The incidents range from offensive name-calling to sexual harassment to physical threats of violence.

In response to the surge of egregious hate speech in these online communities, social media companies are taking significant steps to remove harmful content from their platforms. Appointed content moderators are tasked with manually identifying and removing content that violates community guidelines. Depending on the company’s policy and the nature of the violation, offenders might even be removed from the platform entirely.

However, this procedure has proved to have its downsides. Dr. Sarah T. Roberts of UCLA describes the job of content moderation as “demand[ing] total psychic engagement and commitment in a way that was disturbing,” leading to self-isolation, substance abuse, and other long-term psychological effects. To make matters worse, these jobs are also notorious for their low wages and abject working conditions. Outsourcing content moderation to human workers, especially in this sort of environment, is cruel and unsustainable. It depends on the exploitation of marginalized individuals, and it assumes, rather inhumanely, that people can pour so much time and mental energy into reading the worst things on the Internet without lasting damage.

In response to these concerns, some companies have adopted automated content moderation. But this comes with its own slew of issues, chiefly a lack of accuracy and reliability stemming from the algorithms’ poor contextual understanding of human language. Nefarious users exploit these pitfalls to post harmful content freely, while benign creators find their content shadowbanned (that is, blocked without their knowledge) for no apparent reason.

To put it simply, these solutions have it backwards. What’s plaguing these communities isn’t a problem of content, but a problem of people and relationships, argues Dr. Niloufar Salehi of UC Berkeley’s School of Information. Ventures that simply aim to remove harmful users or content from a platform don’t get to the heart of the issue, and will thus only provide a temporary fix until the harm reproduces itself in a new form.

Instead, Dr. Salehi proposes the use of restorative justice, and goes on to apply it as a framework for responding to harm online. Some key aspects of her proposal include: assigning a trained caseworker to address extreme cases of harm, modifying content removal processes to be trauma-aware, encouraging the harmer to take responsibility for their actions, and setting concrete expectations for harm prevention. 

Of course, this is not an enticing plan from the perspective of profits: it’s slower, more labor-intensive, and a lot more expensive. It requires platforms both to willingly lower profit margins and to take responsibility for educating and bettering their users, rather than simply creating and enforcing rules. But given how instrumental these platforms are in creating and maintaining communities, as well as in swaying public opinion, I argue that it is not a decision but a duty for them to effectively address the harm that originates on them.

Drawing on her work, as well as our own procedure for mediating harm circles on campus, I speculate on what the fundamentals of restorative harm responses might look like online. How might the anonymity built into many social media platforms interfere with true accountability on the part of the offender? How much weight should the victim’s discretion carry in deciding whether or not to ban the offender? And how would we deal with the immense volume of these incidents? Restorative responses are far more labor-intensive than content moderation, and far less automatable.

But let’s set aside the (frankly dispiriting) questions for a moment and focus on the overwhelming part of Dr. Salehi’s proposal: hope. A barebones, experimental version of this kind of intervention is already demonstrating some success. If we manage to work restorative justice interventions into harm responses on social media, it could mean getting the individuals behind some incredibly atrocious harms to take accountability and reform themselves. By being intentional and preventive about addressing harm, we could begin to conceptualize the Internet as a radically empathetic place.

Written by

Pragya K
