Abstract

The early diagnosis of Alzheimer’s Disease (AD) through non-invasive methods remains a significant healthcare challenge. We present NeuroXVocal, the first end-to-end explainable AD classification system that achieves state-of-the-art performance while providing clinically interpretable explanations. Our novel dual-component architecture consists of: (1) Neuro, a multimodal classifier implementing a unique transformer-based fusion strategy that projects acoustic, textual, and speech embeddings into a common dimensional space for complex cross-modal interactions; and (2) XVocal, a specialized RAG-based explainer that retrieves relevant clinical literature to generate evidence-based explanations. Unlike previous approaches using late fusion or simple concatenation, our architecture enables both robust classification and meaningful clinical insights. Using the IS2021 ADReSSo Challenge benchmark dataset, NeuroXVocal achieved 95.77% accuracy, significantly outperforming the previous state of the art. Medical professionals validated the clinical relevance of XVocal’s explanations through structured evaluation. This work advances beyond pure classification to bridge the gap between machine learning predictions and clinical decision-making. Code available at: https://github.com/NNtamp/NeuroXVocal.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0196_paper.pdf

SharedIt Link: https://rdcu.be/eHxcv

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05185-1_40

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/NNtamp/NeuroXVocal

Link to the Dataset(s)

N/A

BibTex

@InProceedings{NtaNik_NeuroXVocal_MICCAI2025,
        author = { Ntampakis, Nikolaos AND Diamantaras, Konstantinos AND Chouvarda, Ioanna AND Tsolaki, Magda AND Sarigianndis, Panagiotis AND Argyriou, Vasileios},
        title = { { NeuroXVocal: Detection and Explanation of Alzheimer’s Disease through Non-invasive Analysis of Picture-prompted Speech } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15973},
        month = {September},
        pages = {410--419}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors propose NeuroXVocal, a dual-component system that not only classifies but also explains potential AD cases through speech analysis. The classification component (Neuro) processes three distinct data streams: acoustic features capturing speech patterns and voice characteristics, textual features extracted from speech transcriptions, and precomputed embeddings representing linguistic patterns.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Development of an end-to-end holistic framework that seamlessly integrates classification and explanation for diagnosis of potential AD patients.
2. Introduction of an explainability component (XVocal), bridging the gap between machine learning predictions and clinical interpretability. Qualitative evaluation through structured questionnaires completed by medical professionals validated XVocal’s ability to produce clear and clinically relevant explanations, supporting its potential as a reliable clinical decision support tool.
3. Achievement of state-of-the-art performance with 95.77% accuracy in classification (Neuro) on the IS2021 ADReSSo Challenge benchmark dataset, significantly outperforming existing approaches.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
This field has been extensively studied, and researchers have developed numerous solutions that demonstrate better results than the proposed approach.
- The abstract and conclusion are not written properly; the authors need to rewrite them.
- The motivation and contribution of this work need to be highlighted properly in the introduction section so that readers can more easily comprehend the main theme of the article.
- The contribution of this work is limited, since there exist many articles that target the same topic. The authors should highlight their contributions properly by emphasizing how their work differs from other articles.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The contribution of this work is limited, since there exist many articles that target the same topic. The authors should highlight their contributions properly by emphasizing how their work differs from other articles.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

I am not satisfied with the authors’ reply. Hence, I recommend rejection.

Review #2

Please describe the contribution of the paper

This paper introduces a two‑part system that not only spots signs of Alzheimer’s from a patient’s speech but also backs up its diagnosis with real research. It fuses acoustic, embedding, and textual features through a custom transformer to detect Alzheimer’s Disease. Then, it taps into a RAG setup to pull in relevant clinical studies and explain the reason behind each class label assignment.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Coupling an AD classifier with a RAG explainer that improves the interpretability of the assigned labels and provides clinician‑friendly insights.
- Using multimodal inputs that, according to the ablation study results, actually helped improve the accuracy of the network.
- State-of-the-art classification performance.
- Expert evaluation, which I believe is a step toward assessing real‑world clinical utility and user acceptance.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The paper only gives accuracy and F1‑score for the Neuro classifier. I think the authors should include additional metrics, like precision, recall, and AUC, so we can get a more complete picture of the model’s performance.
- I believe it’s unnecessary to define accuracy and F1‑score in detail, since they’re standard metrics that most readers will already understand.
- Table 1 shows only the Neuro classifier’s 5‑fold cross‑validation results, but doesn’t provide the same for the baseline methods.
- The paper doesn’t clearly state whether the baseline methods use all of the 3 modalities or not. So it’s impossible to tell whether NeuroXVocal’s gains come from its three‑stream input or simply from a better fusion architecture.
- The caption for Figure 1 is very brief and doesn’t summarize the dual‑component architecture. A more descriptive caption should briefly outline how the multimodal Neuro classifier and the RAG‑based XVocal explainer interact and how the three feature streams flow through the system, allowing the figure to stand alone.
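For reference, the additional metrics requested above can all be derived from the same predictions. The following is a minimal sketch in plain Python; the labels and scores are made-up toy values, not results from the paper, and the helper names (`report`, `auc`) are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = AD and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Avoid ZeroDivisionError when a class is never predicted or never present."""
    return numerator / denominator if denominator else 0.0


def auc(y_true, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


def report(y_true, scores, threshold=0.5):
    """Threshold the scores, then compute the usual binary classification metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
        "auc": auc(y_true, scores),
    }


# Toy data only: six hypothetical subjects with model scores in [0, 1].
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
result = report(y_true, scores)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Unlike the thresholded metrics, AUC is computed from the raw scores, so it does not depend on the chosen operating point.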
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I believe the paper’s novel combination of a high‑accuracy multimodal AD classifier and a RAG‑based explainer is both technically solid and clinically promising. However, it needs clearer reporting, especially additional metrics (sensitivity, specificity, AUC), baseline modality details, and more informative figures to fully validate and reproduce its results.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors acknowledged the need for additional evaluation metrics and clarified that they selected commonly reported ones for fair comparison with prior work. Importantly, they committed to including precision, recall, AUC, and modality breakdowns in their revised results. They also responded to my comments on figure clarity by agreeing to revise Figure 1 with a more informative caption that better illustrates the interaction between components. Overall, I appreciate their responsiveness and believe these revisions will meaningfully improve the paper’s clarity and reproducibility.

Review #3

Please describe the contribution of the paper

The paper demonstrates improved accuracy in predicting Alzheimer’s Disease (AD) by using a classifier that integrates multiple modalities of speech data. It leverages Large Language Models (LLMs) to enhance the clinical interpretability of predictions, with the explainability assessed through structured questionnaires completed by medical professionals.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

A novel application: While many machine learning models have been developed for disease classification, understanding how to effectively use these predictions in a clinical setting remains a challenge. For instance, it is not yet clear why changes in specific speech pattern features might indicate a higher likelihood of developing AD. The paper explores how to present the most useful features from machine learning models in a way that can better assist clinicians in diagnosing diseases. XVocal offers an opportunity to translate the technical language of models and machines into more accessible language for practical use.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The demonstration of clinical feasibility lacks clarity regarding the nature of the explanations and which specific aspects are beneficial to doctors. For example, it would be valuable to provide data showing that clinicians might struggle to confidently diagnose AD on their own, but with the explanations provided by XVocal, the diagnosis rate could improve from X to Y, etc. Additionally, the evaluation includes only 20 patient cases; it would be beneficial to assess XVocal’s performance on non-patient cases to minimize false positives. Furthermore, showcasing what the explanation results look like would enhance clinical feasibility.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper introduces a novel approach by integrating multi-modality speech data for AD prediction, enhancing classification accuracy. It also leverages an LLM to improve clinical explainability, which is a valuable contribution. However, improvements are needed in clinical feasibility, as mentioned above in the weaknesses section.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We sincerely thank the reviewers for their thorough evaluation and constructive feedback on our manuscript.

Novelty (R1, R2): We are grateful for the insightful observations regarding our work. To the best of our knowledge, our work is the first explainable AD classification method based solely on acoustic, textual & linguistic features achieving SOTA performance on the ADReSSo dataset. Our key innovations include:

Our novel transformer-based architecture implementing a unique multi-modal fusion strategy distinctly different from previous approaches (Fu et al., Lee et al.). Unlike prior methods using late fusion or simple concatenation, we project acoustic, textual features, and speech embeddings to a common dimensional space before fusion, enabling complex cross-modal interactions. Our ablation study confirms this architecture’s superiority.
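The project-then-fuse idea described above can be sketched roughly as follows. This is a toy, untrained illustration in plain Python and not the Neuro model itself: the feature sizes in `DIMS`, the shared width `D_MODEL`, the identity query/key/value maps, and the mean pooling are all assumptions made for brevity.

```python
import math
import random


def init_linear(rng, d_in, d_out):
    """Random weights (d_out rows of d_in values) and a zero bias."""
    scale = 1.0 / math.sqrt(d_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_out)]
    return weights, [0.0] * d_out


def linear(x, weights, bias):
    """Affine projection of a feature vector into the shared space."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention (identity Q/K/V for simplicity)."""
    d = len(tokens[0])
    mixed, weights = [], []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in tokens]
        attn = softmax(scores)
        weights.append(attn)
        mixed.append([sum(attn[j] * tokens[j][i] for j in range(len(tokens)))
                      for i in range(d)])
    return mixed, weights


def fuse(sample, params):
    """Project each modality to one token of equal width, then let tokens interact."""
    tokens = [linear(sample[name], *params[name]) for name in ("acoustic", "text", "embedding")]
    mixed, weights = self_attention(tokens)
    pooled = [sum(tok[i] for tok in mixed) / len(mixed) for i in range(len(mixed[0]))]
    return pooled, weights


rng = random.Random(0)
D_MODEL = 8                                            # hypothetical shared width
DIMS = {"acoustic": 5, "text": 12, "embedding": 16}    # hypothetical input sizes
params = {name: init_linear(rng, d_in, D_MODEL) for name, d_in in DIMS.items()}
sample = {name: [rng.gauss(0.0, 1.0) for _ in range(d_in)] for name, d_in in DIMS.items()}

pooled, weights = fuse(sample, params)
print(len(pooled))   # width of the fused representation
print(weights)       # 3x3 matrix: how much each modality attends to the others
```

Because every modality becomes a token of the same width, the attention weights form a 3x3 matrix of cross-modal interactions. Plain concatenation would instead yield a single 33-dimensional vector with no such explicit interaction step.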

Our e2e holistic framework uniquely integrates classification and explanation, where outputs from the multimodal Neuro component directly feed into the XVocal explainer. This creates a seamless pipeline where features contribute both to classification and to generating clinically relevant explanations—advancing beyond systems focused solely on classification or explanation.

Our novel XVocal explainer implements a RAG approach tailored specifically for speech-based AD detection. It constructs prompts incorporating the Neuro predictions, acoustic and transcription features used to retrieve relevant clinical literature. Key innovations include: (1) a domain-specific chunking strategy, (2) a specialized prompt template for AD speech markers, and (3) a curated knowledge base of relevant AD research publications.
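As a rough illustration of the retrieve-then-prompt flow described above, the sketch below chunks a tiny knowledge base, ranks chunks against a feature-derived query, and fills a prompt template. Everything in it is an assumption for illustration: the bag-of-words cosine retriever stands in for whatever retriever XVocal actually uses, the passages are placeholders rather than real literature, and names such as `PROMPT_TEMPLATE` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Group whole sentences into chunks of at most roughly max_words words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence]).split()) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def bag_of_words(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, bag_of_words(c)), reverse=True)
    return ranked[:k]


PROMPT_TEMPLATE = (
    "Predicted label: {label}\n"
    "Observed speech features: {features}\n"
    "Retrieved evidence:\n{evidence}\n"
    "Explain which features support the predicted label, citing only the evidence above."
)


def build_prompt(label, features, evidence):
    feature_text = ", ".join(f"{name}={value}" for name, value in features.items())
    evidence_text = "\n".join(f"- {passage}" for passage in evidence)
    return PROMPT_TEMPLATE.format(label=label, features=feature_text, evidence=evidence_text)


# Placeholder knowledge base: these strings are NOT real publications.
documents = [
    "Placeholder passage A. It discusses pause duration and speech rate.",
    "Placeholder passage B. It discusses vocabulary richness and word repetition.",
    "Placeholder passage C. It discusses scanner calibration.",
]
chunks = [piece for doc in documents for piece in chunk(doc)]

features = {"pause duration": "long", "speech rate": "slow"}  # hypothetical values
evidence = retrieve(" ".join(features), chunks, k=1)
prompt = build_prompt("AD", features, evidence)
print(prompt)
```

The resulting prompt would then be passed to a language model; that call is omitted here because the source does not specify which model or API is used.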

While achieving SOTA 95.77% accuracy on the ADReSSo benchmark (outperforming Fu et al.’s 90.3% and Lee et al.’s 88.73%), our focus extends beyond mere classification to provide clinical explanations validated by medical professionals. Action: We commit to enhancing the abstract, introduction, and conclusion sections to better highlight these contributions.

Evaluation (R2): We are deeply grateful for R2’s suggestion regarding additional performance metrics. We wish to clarify that Table 1 presents metrics that were commonly reported across the cited publications to ensure fair comparison. Action: We would be pleased to expand Table 1 to include the type of modalities, additional metrics, and k-fold validation for our approach and baseline methods where available.

Clinical Feasibility (R3): We are grateful to R3 for recognizing the clinical value of our approach. We would like to clarify that representative examples of XVocal’s explanations can be found in the questionnaire link provided on page 6, paragraph 2. Our evaluation deliberately included a balanced sample of 10 AD and 10 CN individuals, as stated in section 3.3. The status of each individual was identified at the beginning of each assessment section. In our revision, we would enhance this section by elaborating on how the system connects acoustic and linguistic features to AD indicators. We are honored to share that, based on the positive feedback from medical professionals during our evaluation, our team has been invited to deploy NeuroXVocal as a clinical application for further validation with actual patients.

Paper Organization (R1, R2, MR): We are grateful for the suggestions regarding the manuscript’s organization and clarity. We would be honored to enhance our paper by revising the abstract and conclusion sections to better highlight our contributions and findings, developing a more informative Fig. 1, addressing all typographical errors, and removing definitions of standard metrics to create more space in the manuscript.

Conclusion: We are committed to addressing all feedback in our revision. We believe our work makes valuable contributions to both the technical and clinical aspects of AD detection through speech analysis, and we appreciate the reviewers’ guidance in helping us better communicate these advances.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

The reviewers offer many thoughtful comments and insights, attention to which could markedly improve the paper. Also, a thorough n’th proofread is needed (e.g., “Intoduction”).
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The authors provided a thorough rebuttal and addressed the major concerns of the reviewers.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

NeuroXVocal: Detection and Explanation of Alzheimer’s Disease through Non-invasive Analysis of Picture-prompted Speech

Author(s):