Predicting Dementia From Spontaneous Speech Using Large Language Models
Friday, December 12, 2025
9:00 AM-11:00 AM
BIOMED PhD Research Proposal
Title:
Predicting Dementia From Spontaneous Speech Using Large Language Models
Speaker:
Felix Agbavor, PhD Candidate
School of Biomedical Engineering, Science and Health Systems
Drexel University
Advisors:
Hualou Liang, PhD
Professor
School of Biomedical Engineering, Science and Health Systems
Drexel University
Andres Kriete, PhD
Associate Dean for Academic Affairs
Teaching Professor
School of Biomedical Engineering, Science and Health Systems
Drexel University
Details:
Language impairment is an important biomarker of neurodegenerative disorders such as Alzheimer’s disease (AD). Given the prevalence of AD and the absence of a cure, there is an urgent need for early diagnosis of dementia, which would improve quality of life for affected individuals. Artificial intelligence (AI), particularly natural language processing (NLP), has increasingly been used to predict AD early from speech. Speech as a biomarker offers a fast, inexpensive, accurate, and non-invasive route to AD diagnosis and clinical screening.
We aim to advance speech-based assessment of AD by uniting large language models (LLMs) for text understanding with foundation models for audio. First, we demonstrate that Generative Pre-trained Transformer 3 (GPT-3) text embeddings, vector representations of transcribed speech that capture rich semantics, can reliably (i) distinguish individuals with AD from healthy controls and (ii) infer cognitive test scores directly from speech. These embeddings outperform conventional acoustic-feature baselines and perform competitively with fine-tuned models. Building on this foundation, we address multilingual generalization using Whisper, a speech foundation model, to extract language-agnostic audio embeddings, coupled with language-specific ensembles that capture dialectal and phonotactic nuances. In the INTERSPEECH 2024 TAUKADIAL Challenge, we evaluated AD detection pipelines for both English and Chinese speakers. Our novel language-specific ensemble approach achieved an unweighted average recall of 81.83% for mild cognitive impairment (MCI) classification (2nd place) and a root mean squared error of 1.196 for cognitive score prediction (1st place), outperforming language-agnostic pipelines.
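As a minimal illustration of the embedding-to-score step (not the authors' actual pipeline), synthetic vectors can stand in for GPT-3 text embeddings, and a closed-form ridge regressor can map each fixed-length embedding to a cognitive score; the dimensions, noise level, and regularization strength below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for GPT-3 text embeddings: one fixed-length
# vector per transcribed speech sample (real embeddings would come
# from an embedding model; the dimensions here are illustrative).
n_samples, dim = 40, 16
X = rng.normal(size=(n_samples, dim))

# Simulate cognitive test scores as a noisy linear function of the embeddings.
w_true = rng.normal(size=dim)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w = ridge_fit(X, y)
pred = X @ w
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"train RMSE: {rmse:.3f}")
```

The same embedding features could equally feed a classifier for the AD-vs-control decision; only the output head changes.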
Finally, we plan to extend the pipeline to additional modalities through a tri-modal cross-attention fusion of LLM-based text embeddings, pretrained speech embeddings, and a representation of the Cookie Theft picture, integrating complementary lexical-semantic, acoustic-prosodic, and visual-context cues. This learned fusion aims to improve robustness to noisy channels and yield more stable subject-level decisions suitable for real-world screening. Taken together, these findings establish a practical foundation for reliable speech-based AD screening, while ongoing work extends the pipelines to tri-modal fusion.
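A tri-modal cross-attention fusion of this kind can be sketched as follows; the shared embedding dimension, single attention head, and mean-pooled query are illustrative assumptions rather than the proposed architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # shared embedding dimension (illustrative)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, keys_values):
    """Single-head scaled dot-product attention: the query attends
    over the stacked modality embeddings (keys == values here)."""
    scores = query @ keys_values.T / np.sqrt(query.shape[-1])
    weights = softmax(scores)  # one attention weight per modality token
    return weights @ keys_values, weights

# Per-subject embeddings, already projected to a shared dimension:
text_emb   = rng.normal(size=d)  # LLM-based text embedding
speech_emb = rng.normal(size=d)  # pretrained speech embedding
image_emb  = rng.normal(size=d)  # Cookie Theft picture representation

tokens = np.stack([text_emb, speech_emb, image_emb])  # (3, d)
query = tokens.mean(axis=0, keepdims=True)            # (1, d) pooled query

fused, weights = cross_attention(query, tokens)
print("fused shape:", fused.shape)
print("modality weights:", weights.round(3))
```

The learned weights make explicit how much each modality contributes to a subject-level decision, which is one way such a fusion can remain robust when an individual channel is noisy.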
Contact Information
Natalia Broz
njb33@drexel.edu