Google Thinks It Can Use AI to Fight Trolls

A new tool uses artificial intelligence to determine what speech is out of bounds


Comment sections are in retreat. Companies ranging from NPR to IMDb are closing down their miniaturized public squares because they have been overrun with hateful commentary or because policing them requires too many resources. Largely unfettered free speech scales, the social media giants have taught us, but constructive dialogue doesn’t.

Jigsaw, a subsidiary of Google parent company Alphabet, thinks it can change that. On Thursday, the tech incubator, along with Google’s Counter Abuse Technology team, launched Perspective, a public API that uses artificial intelligence to automatically flag online speech that it deems “toxic.” By comparing new comments against a large data set of archived comments whose toxicity had been rated by human scoring panels, Jigsaw believes it can divine when a person’s speech has crossed the line from harsh to hateful.
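For readers curious what “a public API” means in practice, here is a minimal Python sketch of how a publisher might send one comment to Perspective and read back a toxicity score. It assumes the `comments:analyze` endpoint, the `TOXICITY` attribute, and the `attributeScores` response shape as we understand them from the API’s public documentation; check the current reference before relying on any of these names. The helper functions are our own illustrations, not part of any Google library.

```python
import json
import urllib.request

# Assumed endpoint for Perspective's comments:analyze method;
# verify against the current API reference.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request_body(text: str) -> dict:
    """Build a request asking for the TOXICITY score of one comment."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_toxicity(response: dict) -> float:
    """Pull the 0-to-1 summary score out of a parsed analyze response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_comment(text: str, api_key: str) -> float:
    """POST one comment to the API and return its score (needs network and a key)."""
    data = json.dumps(build_request_body(text)).encode("utf-8")
    req = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_toxicity(json.load(resp))
```

The API reports a probability-style score between 0 and 1; we assume the percentages shown in the website demo are that value multiplied by 100.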

But who decides what is toxic? We do, collectively. Jigsaw has obtained massive data sets of comments from Wikipedia, The New York Times, and other sources. The Times offered data on which comments its moderators had flagged as inappropriate, and Jigsaw itself enlisted thousands of people from a crowdsourcing service to judge whether Wikipedia comments were harassing. In the version of the tool on the Perspective website, users can also contest the toxicity rating of a specific word or phrase, which should theoretically alter future ratings. “It’s still early days and we will get a lot of things wrong,” the site warns.

Publishers will be able to make use of this tool in a variety of ways. The New York Times appears to be taking the most responsible approach, using Jigsaw’s tech to aggregate automatically flagged comments so that human moderators can review them more quickly. The Economist and The Guardian also plan to use the AI screening in their own comment sections.
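The human-in-the-loop approach described above can be sketched in a few lines of Python: rather than deleting anything, software only sorts high-scoring comments into a queue for moderators. The `Comment` class, the `review_queue` function, and the 0.8 cutoff are hypothetical choices for illustration, not a description of how the Times actually configures the tool.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    comment_id: str
    text: str
    toxicity: float  # hypothetical 0-to-1 score from a Perspective-style model


def review_queue(comments: list[Comment], threshold: float = 0.8) -> list[Comment]:
    """Return comments scoring at or above the threshold, most toxic first.

    Nothing is removed here: flagged comments are only surfaced so that
    a human moderator makes the final call.
    """
    flagged = [c for c in comments if c.toxicity >= threshold]
    return sorted(flagged, key=lambda c: c.toxicity, reverse=True)
```

The design choice matters: the score decides what a moderator looks at first, not what readers are allowed to see.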

But it’s also easy to imagine organizations using Perspective as a quick fix to automatically eradicate hate speech. Trying to apply an objective score to something as malleable as speech is a losing proposition. Different communities maintain different interpretations of language, and the meanings of words change over time. If there’s anything Facebook has taught us, it’s that letting an algorithm make content choices can have unexpectedly negative outcomes.

Jigsaw itself has said it doesn’t want publishers to use its tool as a sledgehammer, and the company says Perspective is a work in progress. But given the power it may accrue in governing speech in the coming years, it’s worthy of close scrutiny. The toxicity-reading demo on the Perspective website illustrates an algorithm that is good at flagging patently vitriolic speech, as you’d expect, but also relatively adept at catching coded language. “Thug,” for example, scores 60 percent on the interface’s zero-to-100 scale (with 100 being the most toxic). For reference, a string of gibberish text scores 34 percent; Beyoncé scores 7 percent.

We threw a variety of trollish, politically charged, and neutral turns of phrase into the Perspective tool to get a sense of how Jigsaw is currently ranking language. The biggest takeaway: Old terms of hate (“bitch” — 97 percent; “stupid” — 95 percent) are easy to flag, but language is always changing and the newest insults (“Trumpkin” — 34 percent; “snowflake” — 2 percent) can slip past automated barriers.
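To see why newer insults can slip through, consider a publisher that simply hides anything above a fixed cutoff. The Python sketch below applies a hypothetical cutoff of 50 to the demo scores reported in this piece; the cutoff and the `split_by_cutoff` helper are our own illustrative assumptions, not settings Jigsaw recommends.

```python
# Demo scores reported in this article (0-to-100 scale, at the time of testing).
REPORTED_SCORES = {
    "bitch": 97,
    "stupid": 95,
    "Thug": 60,
    "Trumpkin": 34,
    "Beyoncé": 7,
    "snowflake": 2,
}


def split_by_cutoff(scores: dict[str, int], cutoff: int) -> tuple[list[str], list[str]]:
    """Split terms into (blocked, passed) using a single hypothetical cutoff."""
    blocked = [term for term, score in scores.items() if score >= cutoff]
    passed = [term for term, score in scores.items() if score < cutoff]
    return blocked, passed


blocked, passed = split_by_cutoff(REPORTED_SCORES, cutoff=50)
print("blocked:", blocked)  # the older, well-known insults plus "Thug"
print("passed:", passed)    # "Trumpkin" and "snowflake" get through
```

A single number cannot capture how a community reads "snowflake" this year, which is exactly the limitation described above.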

Victor Luckerson
