Diagnosing Alzheimer’s disease from speech is no longer sci‑fi: researchers at SAV are combining speech recognition, speech synthesis, artificial intelligence, and social robotics. Thanks to apps and robots, mild cognitive impairment can be detected noninvasively, which makes doctors’ work easier. Róbert Sabo’s lecture showed how the technology is being tested with ordinary people and where it is headed.
From speech recognition to natural communication
There is a long path from reliable word recognition to diagnosis from speech. The team is behind, for example, APD, an automatic dictation-transcription system used by judges, and SARA, a server system for editors that transcribes recorded videos and podcasts. The recognizer runs on servers in cooperation with the Technical University of Košice. In the Rým mobile app, recognition follows the text of a fairy tale and triggers illustrative sounds that draw children into the story.
The institute has also developed Slovak speech synthesis, including an expressive voice, with which a social robot now speaks. The research also includes voice biometrics, i.e., speaker recognition, and the detection of emotions or stress in the voice. When these capabilities are combined with dialogue management, an interface emerges that learns to adapt to the person – for example in pace or responses – so that the communication feels natural.
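To make the idea concrete, the following Python sketch shows one possible way to wire speaker recognition and stress detection into a dialogue loop that adapts its pace. The names, thresholds, and stubbed return values are illustrative assumptions, not the institute’s actual code.

```python
from dataclasses import dataclass

# A minimal sketch of how speaker recognition and stress detection could feed
# a dialogue manager that adapts its pace. All names, thresholds, and return
# values are invented for illustration.

@dataclass
class UserProfile:
    name: str
    preferred_rate: float = 1.0   # text-to-speech speaking-rate multiplier


def identify_speaker(audio_chunk: bytes, profiles: dict) -> UserProfile:
    """Placeholder for voice biometrics; here it simply returns a default profile."""
    return profiles.get("default", UserProfile(name="guest"))


def detect_stress(audio_chunk: bytes) -> float:
    """Placeholder for stress/emotion detection; returns a score between 0 and 1."""
    return 0.2   # stubbed value


def adapt_response(text: str, profile: UserProfile, stress: float) -> dict:
    """Slow down and insert longer pauses when the listener seems stressed."""
    slow_down = stress > 0.6
    return {
        "text": text,
        "rate": profile.preferred_rate * (0.85 if slow_down else 1.0),
        "pause_s": 0.6 if slow_down else 0.3,
    }


if __name__ == "__main__":
    profiles = {"default": UserProfile(name="guest", preferred_rate=0.95)}
    chunk = b""   # a real system would pass microphone audio here
    speaker = identify_speaker(chunk, profiles)
    print(adapt_response("Shall we continue the exercise?", speaker, detect_stress(chunk)))
```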
Voice as a window into cognition
The first step toward diagnosis came five years ago with the Eva app, which won IT Product of the Year 2023. The app administers well‑known tests: naming pictures and freely describing a scene, during which the accuracy and complexity of sentences are monitored. Recordings are processed on a server, where acoustic voice analysis runs alongside neural networks trained on data from over a thousand healthy and ill people. The system also picks up subtle signs, such as slowed word retrieval, and in the end either reassures the user or recommends a visit to a neurologist.
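The lecture did not go into the exact features or models, but the flavor of such scoring can be illustrated with a toy Python example: two hand-picked features (pause ratio and speech rate) computed from word timestamps and combined by a logistic score. The features, weights, and threshold are invented for the example and are far simpler than the trained networks described above.

```python
import math

# Toy illustration of server-side scoring: two acoustic-timing features
# combined into a single risk score. Weights and threshold are invented.

def pause_ratio(words):
    """Fraction of the recording spent in silence between words.
    `words` is a list of (start_s, end_s, token) from the recognizer."""
    speech = sum(end - start for start, end, _ in words)
    total = words[-1][1] - words[0][0]
    return 1.0 - speech / total if total > 0 else 0.0


def speech_rate(words):
    """Words per second over the whole recording."""
    total = words[-1][1] - words[0][0]
    return len(words) / total if total > 0 else 0.0


def risk_score(words, w_pause=3.0, w_rate=-1.5, bias=-0.5):
    """Logistic combination of the two features (weights are illustrative)."""
    x = w_pause * pause_ratio(words) + w_rate * speech_rate(words) + bias
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Word timings as a recognizer might return them (seconds); long gaps
    # between words push the score up.
    words = [(0.0, 0.4, "the"), (1.6, 2.1, "cat"), (4.0, 4.5, "sits"), (6.2, 6.8, "outside")]
    score = risk_score(words)
    print("recommend a neurologist" if score > 0.5 else "no signs detected", round(score, 2))
```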
In the current APVV project, the tasks are assigned directly by a social robot. The person speaks into a microphone, an automatic recognizer transcribes the speech, the dialogue manager (which consults ChatGPT for more complex questions) prepares a response, and speech synthesis plays it back smoothly, aligned with the robot’s mouth movements. The team has collected around 200 conversations, including in the noisy environment of Researchers’ Night, and is specifically testing people aged 50+ and 60+ as well as individuals with mild cognitive impairment. An interesting result: the older the respondents, the more natural the communication felt to them. Up to 90% of people over 50 considered the robot suitable for cognitive testing; younger people tend to have higher demands, while older people more easily accept imperfections and see practical benefits.
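The pipeline is described only at a high level, but its overall flow might look roughly like the sketch below, with the recognizer, the ChatGPT call, and the synthesizer replaced by stubs. None of the names, scripted lines, or responses come from the actual system.

```python
# Minimal sketch of the flow: ASR -> dialogue manager (with LLM fallback)
# -> TTS driving the robot's mouth. Every component here is a stub.

SCRIPTED = {
    "hello": "Hello! Today we will try a short picture-naming exercise.",
    "goodbye": "Thank you for the conversation. Goodbye!",
}


def recognize(audio: bytes) -> str:
    """Stub for the automatic speech recognizer."""
    return "hello"   # a real recognizer would return the transcript of `audio`


def ask_llm(prompt: str) -> str:
    """Stub: the dialogue manager falls back to a large language model here."""
    return "I'm not sure, but let's get back to the exercise."


def dialogue_manager(utterance: str) -> str:
    key = utterance.strip().lower()
    return SCRIPTED.get(key) or ask_llm(utterance)


def synthesize(text: str) -> dict:
    """Stub for speech synthesis; a real system also drives the robot's mouth movements."""
    return {"text": text, "visemes": ["placeholder"]}


if __name__ == "__main__":
    transcript = recognize(b"")           # microphone audio would go here
    reply = dialogue_manager(transcript)
    print(synthesize(reply))
```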
What we are improving and where we apply it
The next goals aim at even smoother dialogue. The robot should learn to take the initiative without waiting through a long silence and to recognize from intonation whether the person has finished speaking. Work is also underway on letting the user interrupt the robot’s speech (so‑called barge‑in), on emotions in voice synthesis, and on visual tracking of reactions. New tasks and the transfer of methods from the tablet solution are planned as well, so that the system can detect mild cognitive impairment or Alzheimer’s disease more reliably.
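As a rough illustration of those two goals, the sketch below combines silence length with a falling-pitch heuristic to decide whether the turn has ended, and stops playback as soon as the user starts speaking. The thresholds are invented; a real system would rely on trained models rather than such hand-written rules.

```python
# Illustrative sketch of (1) end-of-turn detection from silence plus pitch
# contour and (2) barge-in, i.e. stopping playback when the user speaks.
# All thresholds are made up for the example.

def end_of_turn(silence_ms: float, pitch_slope_hz_per_s: float) -> bool:
    """A shorter silence is enough when the intonation is clearly falling."""
    if silence_ms > 1200:
        return True
    return silence_ms > 400 and pitch_slope_hz_per_s < -20


def playback_with_barge_in(tts_chunks, user_is_speaking):
    """Play TTS chunk by chunk; stop as soon as the detector reports user speech."""
    for i, chunk in enumerate(tts_chunks):
        if user_is_speaking():
            return f"stopped after chunk {i} (barge-in)"
        # audio_out.play(chunk) would go here in a real system
    return "finished"


if __name__ == "__main__":
    print(end_of_turn(silence_ms=500, pitch_slope_hz_per_s=-35))   # True: falling intonation
    print(end_of_turn(silence_ms=500, pitch_slope_hz_per_s=5))     # False: keep listening
    print(playback_with_barge_in(["a", "b", "c"], user_is_speaking=lambda: False))
```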
Since not every institution can afford a robot costing roughly 15,000, the team also created a virtual version of it for a large screen. The reception assistant detects a person’s arrival, recognizes their face or voice, greets them by name, and provides information about the institute through a customized AI assistant. It tells visitors where a particular staff member can be found and shows employees menus from several restaurants. Experience suggests that people enjoy talking with robots, and this approach has a chance to find application in healthcare as well.
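For the curious, the receptionist flow can also be sketched in a few lines of Python. The directory entry, the visitor’s name, and the identification stubs are made up for the example; real face and speaker recognition would replace the placeholder.

```python
# Rough sketch of the virtual receptionist flow; all data and stubs are invented.

DIRECTORY = {"J. Novak": "office 214, second floor"}   # hypothetical entry


def identify(face_image=None, voice_sample=None):
    """Try face recognition first, fall back to voice; return a name or None."""
    if face_image is not None:
        return "Dr. Horvath"   # stub: matched against enrolled faces
    if voice_sample is not None:
        return "Dr. Horvath"   # stub: speaker-recognition fallback
    return None


def greet_and_assist(query: str, face_image=None, voice_sample=None) -> str:
    visitor = identify(face_image, voice_sample)
    greeting = f"Good morning, {visitor}!" if visitor else "Good morning!"
    person = next((p for p in DIRECTORY if p.lower() in query.lower()), None)
    directions = f" {person} is in {DIRECTORY[person]}." if person else ""
    return greeting + directions


if __name__ == "__main__":
    print(greet_and_assist("Where can I find J. Novak?", face_image=b"camera frame"))
```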