Moderating hateful comments has become an unbearable routine for many social media administrators. Tomáš Halás and a team of experienced IT specialists therefore created the Troll Wu app, which automatically protects discussions in various languages. They explain how the system works, why it is not censorship, and what real-world data shows.
How it works and what makes it different
The foundation is training their own artificial intelligence on a large volume of anonymized comments, each of which is independently labeled by at least three people. The team builds on the fact that every language has its own nuances, slang, and veiled insults that general-purpose models do not recognize. For each language they therefore work with native annotators to capture expressions that a non-native speaker would miss. The result is more accurate detection of vulgarity and hate speech in specific countries and language varieties.
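The article only says that each comment is labeled independently by at least three people; it does not describe how those labels are combined. A common approach is a simple majority vote, sketched below with illustrative names (this is an assumption, not Troll Wu's documented pipeline):

```python
# Hypothetical sketch: merging labels from three independent annotators into a
# single training label by majority vote. Names and fields are illustrative.
from collections import Counter
from typing import Optional


def majority_label(labels: list[str], min_votes: int = 2) -> Optional[str]:
    """Return the label chosen by at least `min_votes` annotators, else None."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    # No majority -> no training label; such items could go to a review queue.
    return label if votes >= min_votes else None


# Example: three native annotators rate one anonymized comment.
annotations = ["hate_speech", "hate_speech", "ok"]
print(majority_label(annotations))  # -> "hate_speech"
```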
The application is officially integrated with Facebook, YouTube, and TikTok and works in real time. Once the client grants consent, every new comment is sent for evaluation, the team's own model (not GPT) assesses it, and, if it violates the rules, the system reports it to the platform, which then hides it. The entire process takes seconds and scales to large volumes of discussions, relieving the human moderation team and speeding up the response to problematic content.
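The flow described above (new comment in, model score, hide request out) can be summarized in a minimal sketch. The threshold, function names, and platform call are placeholders standing in for the real integrations, which the article does not detail:

```python
# Minimal sketch of the described real-time flow, with hypothetical names:
# a new comment arrives, the in-house model scores it, and comments that
# violate the rules are reported back to the platform to be hidden.
THRESHOLD = 0.8  # assumed decision threshold, not a published value


def score_comment(text: str) -> float:
    """Placeholder for the in-house classifier; returns a hate-speech probability."""
    return 0.0  # the real model call would go here


def hide_on_platform(platform: str, comment_id: str) -> None:
    """Placeholder for the platform API call that hides the comment."""
    print(f"[{platform}] hide request sent for comment {comment_id}")


def handle_new_comment(platform: str, comment_id: str, text: str) -> None:
    """Evaluate one incoming comment in (near) real time."""
    if score_comment(text) >= THRESHOLD:
        hide_on_platform(platform, comment_id)


handle_new_comment("facebook", "12345", "example comment text")
```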
Moderation versus censorship
The creators liken moderation to house rules in a restaurant: the owner is responsible for an environment in which guests feel safe. Interventions happen only on clients' pages; people can spread their opinions elsewhere, but they cannot break the rules in someone else's 'establishment'. Comments with hateful content are automatically hidden and the author receives no notification; this is intentional, so that the toxic exchange is not given further validation. Some expressions also sit on the edge of the law, although in practice they are rarely penalized.
Studies and real-world experience show that without moderation, vulnerable groups, women, and children disappear from discussions, and eventually even 'ordinary' commenters do. Tests on large profiles in various countries have shown that moderation does not reduce organic reach; on the contrary, it brings conversations back to substance. Clients include companies sensitive to brand safety, state institutions, sports clubs, and over 40 Slovak NGOs that are frequent targets of attacks. Moreover, the Bez hejtu initiative showed that if the largest profiles moderated systematically, the volume of hate visible to the public would drop dramatically; hundreds of entities have already joined this call.