Abstract

The clinical diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) relies primarily on rating-scale questionnaires, clinical interviews, and executive function tests. These methods face several challenges: limited medical resources, low diagnostic efficiency, and heavy dependence on clinicians’ subjective experience. Existing AI-assisted diagnostic approaches based on behavioral analysis lack sufficient interpretability, which hinders their integration with conventional diagnostic workflows and their practical clinical application. This paper proposes EDWAR, an Explainable ADHD Diagnostic Framework Using Weakly-Supervised Action Recognition. EDWAR establishes a collaborative diagnostic mechanism that integrates behavioral analysis with traditional test records. It employs a weakly-supervised action recognition methodology that requires only diagnostic labels and video-level annotations of abnormal behaviors. With this design, the framework achieves high diagnostic accuracy and also provides transparent interpretation through both video-level and timestep-wise anomaly action recognition. Experimental results demonstrate that EDWAR attains superior diagnostic performance while offering convincing, explainable evidence.
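The abstract states that EDWAR needs only video-level annotations but still produces timestep-wise anomaly evidence. This page does not explain how that is done. The example below is a minimal sketch of one common way to achieve it: multiple-instance learning (MIL) with top-k pooling. It assumes per-timestep logits for a single anomaly action class. The function names, `k`, the 0.5 threshold, and the pooling choice are all hypothetical illustrations, not the authors' formulation. The list `a` loosely mirrors the activation outputs `a_i` discussed in the author feedback.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation mapping a logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def timestep_scores(logits: list[float]) -> list[float]:
    """Hypothetical per-timestep anomaly scores a_i for one action class."""
    return [sigmoid(z) for z in logits]


def video_score(scores: list[float], k: int = 3) -> float:
    """Video-level score: mean of the top-k timestep scores (MIL top-k pooling)."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a video-level score and a video-level label."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Toy example: 8 timesteps (e.g. one per second) of logits for one anomaly class.
logits = [-3.0, -2.5, 2.0, 3.0, 2.5, -2.0, -3.0, -2.8]
a = timestep_scores(logits)
v = video_score(a, k=3)
# The only supervision is the video-level label "this video contains the action".
loss = bce(v, y=1)
# Timesteps above a hypothetical 0.5 threshold would be highlighted for clinicians.
flagged = [i for i, s in enumerate(a) if s > 0.5]
print(round(v, 3), flagged)
```

Top-k pooling lets a label that only says "the action occurs somewhere in this video" supervise the few timesteps where the action most likely happens. After training, the per-timestep scores themselves serve as the temporal explanation.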

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2224_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/impAcreat/EDWAR.git

Link to the Dataset(s)

Our dataset: https://github.com/impAcreat/EDWAR.git

BibTex

@InProceedings{FanNin_Explainable_MICCAI2025,
        author = { Fan, Ninghan and Kong, Ming and Huang, Jing and Chen, Bingdi and Zhu, Qiang},
        title = { { Explainable ADHD Diagnostic Framework Using Weakly-Supervised Action Recognition } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {182--192}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The main contribution of this paper is a new ADHD diagnostic framework based on a deep learning model that localises abnormal movement patterns by analysing skeletal motion extracted from video.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The model is trained on a large dataset whose annotations, according to the authors, were verified by a clinical team. Providing explainability for deep learning models is well motivated. Visualising abnormal patterns over time also helps clinicians verify the predicted outcomes.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Evaluation is very weak. The proposed EDWAR outperformed all other methods in Table 1, but those methods were published between 2017 and 2021. Given the fast pace of AI research, I am not convinced they represent the current state of the art.

    The second-best method in Table 1, [27], was compared with another ADHD detection approach in:

    C. Nash, R. Nair and S. M. Naqvi, “Insights Into Detecting Adult ADHD Symptoms Through Advanced Dual-Stream Machine Learning,” in IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 32, pp. 3378-3387, 2024, doi: 10.1109/TNSRE.2024.3450848.

    where that approach significantly outperformed [27].

    Another, more recent, skeletal-motion-based ADHD detection model:

    Y. Li, R. Nair and S. M. Naqvi, “Video-Based Skeleton Data Analysis for ADHD Detection,” 2023 IEEE Symposium Series on Computational Intelligence (SSCI), Mexico City, Mexico, 2023, pp. 1-6, doi: 10.1109/SSCI52147.2023.10372062.

    The authors should compare against more recent methods in the literature to demonstrate the effectiveness of their method.

    The paper also does not mention sharing either the anonymised data or the model. It is therefore unclear how other teams could reproduce the results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main factors are the lack of comparisons with the latest methods and the non-reproducible results. Details can be found in the weaknesses section.

    In the methodology, it is also unclear why the video is down-sampled to 1 frame per second. Is it really possible to extract meaningful motion features from such a low frame rate?

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    I appreciate the authors’ effort on the rebuttal and their improvement to reproducibility by sharing an anonymized dataset. However, the evaluation in the submitted version is not extensive and does not compare with more recent work.



Review #2

  • Please describe the contribution of the paper

    This work presents EDWAR, a new AI-based, explainable ADHD diagnostic framework that combines video-based behavioral analysis with traditional test records. A weakly-supervised action recognition component analyses videos for anomalous actions. These actions are then scored and combined with executive function test metrics for diagnosis. Explainability comes from highlighting the video sequences that contain anomalous actions to medical practitioners.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A novel approach to ADHD diagnosis that aims to overcome existing weaknesses of current approaches. Particularly strong aspects of this work are:

    • Explainability: Highlighting the relevant video sequences makes the approach useful in a clinical setting.
    • Integration with standardized neuropsychological tests: This is a plus, as it incorporates medical knowledge and supports acceptance in the medical community.
    • Extensibility: The approach can easily be applied to other diseases, especially neuropsychological ones.
    • Writing/Paper Presentation: The paper is well structured and well written.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The dataset is focused on children aged 6 to 12, with a strong bias towards males. It is not self-evident that the good results of the presented method transfer to other patients, especially given this bias. It is also unclear how this narrow dataset influenced the experimental results, in particular the worse results of the traditional methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    • Sect. 2, page 5: Can you elaborate a bit more on what the test metrics r look like? Are they simply a vector containing three numbers from the three standardized neuropsychological tests mentioned in Sect. 2.1?
    • Sect. 2.2: What are the different classes c that we can expect?
    • Page 4: Could you briefly define a_i and explain how it differs from alpha? Presumably it is the output of the activation functions?
    • Page 5, last line: “as described in Section 3.1” -> Do you mean Section 2.1?
    • Table 2: Can you elaborate a bit more on the first row? What does it mean that none of the three components is present? What is still left to be tested in that case?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is very interesting work with good potential for clinical application in ADHD diagnosis. The paper is very well written and structured, and it has many strong aspects.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a framework for the explainable diagnosis of attention deficit hyperactivity disorder (ADHD) that uses both executive function test results and action information to reach a diagnosis. Experiments are conducted on real-world subjects. The presented results show that the proposed method, the explainable ADHD diagnostic framework using weakly-supervised action recognition (EDWAR), outperforms existing SOTA methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The framework design is very novel. It introduces a weakly-supervised action recognition model into conventional diagnostic protocols, which enables explainability in addition to good diagnostic performance.
    2. The experiment design is detailed and well explained. Even without public source code, the proposed work is expected to be highly reproducible.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. Experiments are only conducted on a non-public dataset. To further validate the effectiveness of the proposed method, the main experiments should also be conducted on at least one public dataset (if available).
    2. Only one baseline method (Bert*) uses both executive function test and action information, as the proposed framework does. In other words, there are too few multimodal baselines comparable to the proposed method, which weakens the robustness of the conclusion.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I have basically no concerns about the framework itself, which looks very promising to me. However, I do have some concerns about the experiments, as described in the weaknesses section. If these concerns can be addressed in the rebuttal (if available), I believe this paper can be an excellent piece of work.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The rebuttal basically managed to resolve my concerns.




Author Feedback

We sincerely thank all reviewers for their careful evaluation and valuable suggestions, which have significantly improved our work. We appreciate the positive feedback from Reviewers 2 and 3, as well as the constructive comments from Reviewer 1. Below, we address each reviewer’s concerns in detail.

Reviewer 1 Q1: Evaluation is insufficient, and reproducibility is unclear. We appreciate your suggestion to include recent studies, which will strengthen our credibility. We were aware of [1] and cited it, but did not compare with it because it focuses on adult ADHD and uses different modalities. We thank the reviewer for referencing [2], which also addresses ADHD diagnosis via skeleton and motion analysis. In contrast to [2], our work needs only video-level rather than frame-level action labels, which makes it more flexible.

We have reproduced these works and adapted them to the modalities of our proposed dataset. For a fair comparison, we also integrated the test metrics (TM) into these baselines, following our framework. Both works performed more competitively than the other baselines, and EDWAR still outperformed them, which confirms its competitiveness with the newest SOTA methods. Under the conference policy, we cannot provide the detailed results here, but we will include them in the revised version. In addition, we commit to publishing the anonymized dataset afterwards to ensure reproducibility.

Q2: Motion feature reliability under low sampling rates. Complete testing sessions are long (avg. 19.5 min), so we adopted a lower sampling rate to balance efficiency and performance. The abnormal motions defined in this study typically last for extended periods, so a 1 fps sampling rate is sufficient for accurate detection. On the test set, EDWAR achieves satisfactory abnormal behavior recognition, which supports the adequacy of 1 fps sampling for this task. The detailed results will be included in the final version.
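The sketch below makes the sampling argument concrete. It downsamples a video to 1 fps and counts how many samples fall inside an action of a given duration. The native frame rate of 30 fps is an assumption, since the page does not state it. The 19.5 min average session length comes from the rebuttal. Both helper functions are hypothetical illustrations.

```python
import math


def sample_indices(n_frames: int, native_fps: float, target_fps: float = 1.0) -> list[int]:
    """Indices of native frames kept when uniformly downsampling to target_fps."""
    step = native_fps / target_fps
    n_out = math.ceil(n_frames / step)
    return [int(i * step) for i in range(n_out)]


def samples_in_action(start_s: float, duration_s: float, target_fps: float = 1.0) -> int:
    """Count sample times t = k / target_fps inside [start_s, start_s + duration_s)."""
    period = 1.0 / target_fps
    first = math.ceil(start_s / period)
    last = math.ceil((start_s + duration_s) / period) - 1
    return max(0, last - first + 1)


NATIVE_FPS = 30  # assumption: the source does not state the recording frame rate
session_s = 19.5 * 60  # average session length reported in the rebuttal
n_frames = int(session_s * NATIVE_FPS)
kept = sample_indices(n_frames, NATIVE_FPS)
print(len(kept))  # 1170 samples instead of 35100 frames

# Any action lasting at least 1 s contains at least one 1 fps sample,
# whereas a 0.4 s action can fall entirely between two samples.
print(samples_in_action(10.3, 1.0), samples_in_action(10.3, 0.4))
```

A half-open interval of length at least one sampling period always contains a sample time. This is why sustained motions, as opposed to very brief ones, remain observable at 1 fps.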

Reviewer 2 Q1: Dataset bias. Compared with related works [1,2], our dataset is relatively large. The gender imbalance reflects the clinical reality of higher ADHD prevalence in male children, consistent with prior studies. In addition, EDWAR still outperforms the baselines on the female subset. We will add these results in the final version.

Q2: Composition of test metrics. The test metrics comprise 23 indicators taken from the standard metrics of the three tests. Examples include accuracy and reaction time in the Stroop test, perseverative errors in the WCST, and identification accuracy for 6 kinds of expressions. We will provide explicit definitions in the released data.
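Reviewer 2 asked what the test metrics r look like. The sketch below shows one plausible shape under stated assumptions. It treats r as a flat vector of the 23 indicators, z-scored with training-set statistics. That vector is concatenated with one abnormality score per anomaly action class (6 classes, per the response to Q3) and passed to a linear diagnostic head. The normalization, the logistic head, the weights, and the placeholder values are all hypothetical illustrations, not EDWAR's actual fusion.

```python
import math

N_INDICATORS = 23  # number of test-metric indicators, per the rebuttal
N_ACTION_CLASSES = 6  # number of anomaly action classes, per the rebuttal (Q3)


def zscore(values: list[float], means: list[float], stds: list[float]) -> list[float]:
    """Normalize each indicator with (hypothetical) training-set statistics."""
    return [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(values, means, stds)]


def fuse(r: list[float], action_scores: list[float]) -> list[float]:
    """Concatenate test metrics r with per-class abnormality scores."""
    assert len(r) == N_INDICATORS and len(action_scores) == N_ACTION_CLASSES
    return r + action_scores


def diagnose(x: list[float], weights: list[float], bias: float) -> float:
    """Hypothetical logistic diagnostic head returning P(ADHD | x)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder values standing in for indicators such as Stroop accuracy
# or WCST perseverative errors.
raw = [0.5] * N_INDICATORS
r = zscore(raw, means=[0.5] * N_INDICATORS, stds=[0.1] * N_INDICATORS)
x = fuse(r, action_scores=[0.9, 0.1, 0.0, 0.2, 0.0, 0.7])
p = diagnose(x, weights=[0.0] * N_INDICATORS + [1.0] * N_ACTION_CLASSES, bias=-1.0)
print(len(x), round(p, 3))
```

Keeping r as named, normalized indicators lets a clinician see each test score's contribution next to the video-derived scores, rather than a single opaque embedding.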

Q3: Classes c in the method. The classes are the 6 ADHD-related pathological hyperactive actions defined in the “Training Data Annotation” part of Section 2.1. These serve as the anomaly action categories in Section 2.2.

Q4: Definition of a_i. We confirm that a_i denotes the output of the activation function and apologize for the ambiguity. This will be clarified in the upcoming revision.

Q5: Typo on page 5, last line. We appreciate the reviewer’s careful reading and will correct it.

Q6: Experimental setup in the first row of the ablation study. This experiment replaces WSAR with a BERT-based model that predicts abnormality scores and is optimized solely with L_diag. Unlike the BERT baseline in Table 1, this variant predicts abnormality scores before the disease diagnosis.

Reviewer 3 Q1: Dataset availability. Due to the absence of a widely accepted open-source ADHD diagnosis dataset, we were unable to validate our method on external data. To ensure reproducibility, we will release our code along with the dataset, including skeleton sequences and medical test indicators.

Q2: Insufficient multimodal baselines. We respectfully refer to our detailed response to Reviewer 1’s Q1, which we believe addresses the concerns about baseline comparisons.

[1] Insights Into Detecting Adult ADHD Symptoms Through Advanced Dual-Stream Machine Learning.
[2] Video-Based Skeleton Data Analysis for ADHD Detection.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents a new technique to detect ADHD in children. The reviewers’ comments and the authors’ rebuttal are very interesting. R2 and R3 recommended Accept, while R1 leans towards rejecting the paper for lack of comparison with prior art. I have read the comments and the author rebuttal in detail. In my opinion, a straightforward comparison with the works referred to by R1 would not be fair, because of differences in the age of the target population, the modality of these works, and the granularity of annotation. I therefore recommend Accept.


