Predicting Dementia from Spontaneous Speech Using Large Language Models

Monday, June 22, 2026

9:00 AM-11:00 AM

BIOMED PhD Thesis Defense

Title:
Predicting Dementia from Spontaneous Speech Using Large Language Models

Speaker:
Felix Agbavor, PhD Candidate
School of Biomedical Engineering, Science and Health Systems
Drexel University

Advisors:
Hualou Liang, PhD
Professor
School of Biomedical Engineering, Science and Health Systems
Drexel University

Andres Kriete, PhD
Associate Dean for Academic Affairs and Teaching Professor
School of Biomedical Engineering, Science and Health Systems
Drexel University

Details:
Alzheimer’s disease (AD) and related cognitive disorders are typically diagnosed using clinical assessments that can be costly, time-intensive, and difficult to scale for frequent monitoring. Speech provides a practical alternative because it is natural, non-invasive, inexpensive to collect, and closely coupled to cognition. This dissertation investigates how foundation-model representations can improve speech-based cognitive impairment prediction while addressing two major barriers to deployment: limited generalization beyond English and brittleness when models rely on a single modality.

The dissertation develops and evaluates three complementary contributions. First, it establishes speech-only foundations for both diagnostic prediction and severity estimation. In the transcript-first approach, large language model (LLM) embeddings extracted from spontaneous speech transcripts support accurate AD classification and cognitive score prediction, outperforming conventional handcrafted feature baselines. In the end-to-end voice approach, self-supervised speech representations map waveforms directly to predictions, with strong discrimination and stable performance under external validation, while supporting calibrated probability outputs suitable for screening-style interpretation.

Second, the dissertation treats multilingual robustness as a first-class objective. Using the TAUKADIAL bilingual setting (English and Mandarin Chinese), it evaluates language-agnostic versus language-specific strategies built on multilingual speech embeddings. Results show that strong bilingual performance benefits from language-specific modeling and task-aware aggregation across multiple picture-description prompts, while cross-language transfer remains challenging under distribution shift.

Third, the dissertation extends beyond speech-only modeling to multimodal picture-description screening. It proposes an embedding-level fusion framework that integrates text, audio, and the shared image stimulus using cross-attention, and it evaluates unimodal, bimodal, and trimodal configurations under a unified protocol. The fusion results demonstrate that multimodal integration improves dementia prediction beyond unimodal baselines, and they clarify how each modality contributes, with language content providing the dominant signal and audio and image providing complementary gains.

Overall, this dissertation demonstrates that foundation-model embeddings provide an effective backbone for scalable cognitive impairment screening from speech, but that trustworthy deployment requires explicit attention to multilingual generalization, robustness, and interpretability.

Contact Information

Natalia Broz
njb33@drexel.edu

Location

Remote

Audience

Undergraduate Students
Graduate Students
Faculty
Staff