Online discussions are flooded with toxic comments, which discourage people from reading and participating. Research suggests that roughly a quarter of posts are toxic and that up to 80 % of users admit such content puts them off. The Elf AI project offers a practical solution: it combines artificial intelligence and human moderators to keep discussions factual and civil.
Why toxic content is a problem
Toxic comments may not be illegal, but they poison public debate: they spread insults, disinformation, and hatred toward groups of people. Offline, such behavior would meet with an immediate reaction; online, it often goes unanswered and even attracts engagement. As a result, polite participants leave and the quality of dialogue deteriorates.
According to available data, roughly 25 % of comments can be labeled as toxic, which significantly degrades the environment on social networks and in the comment sections of media articles. As many as 80 % of people admit that such content discourages them from reading, writing posts, or interacting with brands. It is therefore a question not only of culture, but also of trust and reputation.
Elf AI: a combination of machine and human
The project emerged shortly after the outbreak of the war in Ukraine, when toxic comments and disinformation spread rapidly across Slovak sites. The team developed a language model trained on comments labeled by human moderators. The system continuously monitors discussions, evaluates posts against community rules, and hides those that clearly violate them.
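The article does not reveal the model architecture or tooling, but the core idea – a classifier trained on moderator-labeled comments that scores new posts against community rules – can be sketched roughly as follows. The TF-IDF features, logistic regression, and toy data below are illustrative assumptions, not the project's actual stack.

```python
# Illustrative sketch only: the real Elf AI model and training data are not public.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: comments labeled by human moderators (1 = violates the rules).
comments = [
    "Thanks for the link, interesting read.",
    "You are all idiots, go back where you came from.",
    "I disagree with the article, and here is why.",
    "Typical lying scum, they should all be banned.",
]
labels = [0, 1, 0, 1]

# Character n-grams cope better with misspellings and diacritics than word tokens.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(comments, labels)

# For a new comment, the classifier returns the probability of a rule violation.
print(model.predict_proba(["what a pack of morons"])[0][1])
```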
The model is reliable in roughly 80–85 % of cases; the rest are assessed by a team of 16 moderators, the so-called elves, who work in shifts. The AI has three options: leave the comment, hide it when it is highly confident of a violation, or escalate it to a human. Moderators' decisions feed back into the model, so it keeps improving; importantly, posts are hidden rather than deleted, in line with platform settings.
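The three possible outcomes amount to thresholding the model's confidence. The cut-off values in the sketch below are hypothetical – the article states only that posts are hidden at high confidence and escalated when the model is unsure.

```python
# Hypothetical thresholds: the article names the three outcomes, not the cut-offs.
def route_comment(p_violation: float,
                  hide_threshold: float = 0.9,
                  leave_threshold: float = 0.3) -> str:
    """Map the model's confidence of a rule violation to one of three actions."""
    if p_violation >= hide_threshold:
        return "hide"      # high confidence: hide (not delete) the comment
    if p_violation <= leave_threshold:
        return "leave"     # clearly fine: leave the comment as it is
    return "escalate"      # uncertain: hand the post to a human moderator ("elf")

# Moderator decisions on escalated posts can be appended to the training data,
# which is the feedback loop that keeps the model improving.
print(route_comment(0.95))  # -> hide
print(route_comment(0.50))  # -> escalate
```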
Moderation is not censorship – and the results are surprising
The project emphasizes that it does not target political opinions expressed politely. Intervention occurs when a post violates community rules – for example, uses profanity or attacks minorities. In addition to filtering, sentiment analysis is used to help track the overall mood in discussions.
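The article does not explain how the overall mood is measured; one simple, purely illustrative approach is to average per-comment sentiment scores for each discussion thread, assuming scores in the range −1 to 1 from some upstream sentiment model.

```python
# Illustrative aggregation: how Elf AI actually computes discussion mood is not public.
from collections import defaultdict
from statistics import mean

def discussion_mood(scored_comments):
    """scored_comments: iterable of (thread_id, sentiment_score) pairs, scores in [-1, 1]."""
    by_thread = defaultdict(list)
    for thread_id, score in scored_comments:
        by_thread[thread_id].append(score)
    # Average sentiment per thread is a crude but readable proxy for its mood.
    return {thread_id: mean(scores) for thread_id, scores in by_thread.items()}

print(discussion_mood([("article-1", 0.4), ("article-1", -0.8), ("article-2", 0.1)]))
# e.g. {'article-1': -0.2, 'article-2': 0.1}
```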
Three years of practice have brought an interesting finding: when toxic posts are hidden, the number of comments in a discussion usually increases. People who would otherwise be afraid to speak up gain a sense of safety and join the conversation; at the same time, hateful outbursts decline because trolls see that the space is monitored. To date, the system has processed 60 million comments, hidden 12 million of them, and identified over 5 000 fake accounts.