
Personal Data Anonymization Using AI

Filip Bednárik - IT Consultant, Essential Data

In a world where artificial intelligence is available to the general public and can extract information from text in a matter of seconds, there is a greater need to protect personal data when publishing documents online. Anonymization is an important process that retains the nature of the original document while removing personally identifiable information from it. The Ministry of Justice of the Slovak Republic, in collaboration with Essential Data, is currently deploying an IT solution that combines human and artificial intelligence to speed up and improve the anonymization of judicial decisions in Slovakia. The innovation lies in combining modern web technologies, UX (user experience) principles, knowledge in the form of rules for recognizing personal information, NLP (natural language processing) tools, and an AI model trained on inputs from court staff. In production operation, more than 55% of documents are anonymized without the need for correction, the average anonymization time is 3.5 minutes, and the AI correctly identifies the data about 90% of the time.

Court decisions must be accessible to the public, yet must not disclose personal data. The solution on which Essential Data is collaborating with the Ministry of Justice combines artificial intelligence and human oversight to make anonymization fast and consistent. The result is safer publication and better handling of documents, including the extraction of references to legal regulations.

Why and how to anonymize court decisions

Anonymization protects names, personal identification numbers, dates of birth, and vehicle registration numbers so that sensitive data do not end up on the web. Artificial intelligence identifies and replaces them in the text; for example, the names of parties are changed to abbreviations such as A, B, C. At the same time, references to regulations are extracted from the documents so that the relevant provisions can be easily found on Slov-Lex. The public thus sees the substance of court decision-making, but without intruding on individuals’ privacy.
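The replacement step described above can be illustrated with a minimal sketch. It is not the deployed system: in the article's solution an AI model and recognition rules locate the personal data, whereas here the party names are simply passed in by hand. The function name `anonymize`, the placeholder text, and the `BIRTH_NUMBER` pattern (six digits, a slash, three or four digits) are assumptions made for illustration. Exact string matching also ignores the way Slovak names change form with grammatical case, which a real pipeline would have to handle.

```python
import re
from string import ascii_uppercase

# Assumed pattern for a personal identification number written as
# six digits, a slash, and three or four digits (e.g. 850101/1234).
BIRTH_NUMBER = re.compile(r"\b\d{6}/\d{3,4}\b")


def anonymize(text, party_names):
    """Replace each known party name with A, B, C, ... and mask ID numbers.

    party_names is supplied by the caller here; in a real pipeline it would
    come from a recognition step (rules, NLP tools, or a trained model).
    """
    if len(set(party_names)) > len(ascii_uppercase):
        raise ValueError("this sketch supports at most 26 distinct parties")

    # Assign labels in order of first appearance in party_names.
    labels = {}
    for name in party_names:
        if name not in labels:
            labels[name] = ascii_uppercase[len(labels)]

    if labels:
        # Longest names first, so a full name wins over a shorter overlap.
        alternatives = sorted(labels, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(n) for n in alternatives))
        text = pattern.sub(lambda m: labels[m.group(0)], text)

    text = BIRTH_NUMBER.sub("[personal ID number]", text)
    return text, labels


sample = "Ján Novák (850101/1234) sued Peter Kováč. Ján Novák appealed."
anonymized, mapping = anonymize(sample, ["Ján Novák", "Peter Kováč"])
print(anonymized)  # A ([personal ID number]) sued B. A appealed.
```

Returning the name-to-label mapping alongside the text keeps the same party under the same letter throughout a decision, which is what lets a reader still follow who did what after the names are gone.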


Filip Bednárik has worked for Essential Data since 2014. He is a key expert on projects that implement intelligent search in the Slovak language. He enjoys working with big data, including its analysis, evaluation, and presentation. Since 2018, Filip has been working as an AI expert on the project “Anonymization Services for the Minis…