Abstract

Despite advances in deep learning, current automated methods for strabismus classification face two key challenges: limited interpretability and a lack of focus on strabismus subtypes. These issues undermine clinical trust, hinder practical adoption, and limit personalized treatment. To address this, we propose a Causality-Inspired Graph Neural Network (CI-GNN) framework that identifies causally related visual features from eye regions and constructs a graph structure for robust prediction, moving beyond reliance on raw image pixels. This causality-driven design enhances both interpretability and clinical relevance by providing more transparent diagnostic outcomes. We also establish a representative benchmark for strabismus subtype classification, focusing on deviation direction and horizontal angle variation (e.g., A/V-pattern). Experiments show that our method achieves state-of-the-art accuracy: 89.8% and 88.1% on the two subtype tasks, respectively. Furthermore, by incorporating the SHAP (SHapley Additive exPlanations) technique, CI-GNN offers clinician-friendly diagnostic evidence. Leveraging sparse causal features, the framework requires only 0.0003 GFLOPs, making it highly efficient and suitable for edge deployment. Overall, this work demonstrates the potential of integrating causal knowledge with GNNs to significantly enhance the performance, efficiency, and interpretability of strabismus diagnosis, offering promising directions for intelligent medical applications.
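
Since the code repository is listed as N/A below, the following is a minimal illustrative sketch, not the authors' implementation, of the kind of model the abstract describes: a small GCN over a fully connected graph of the nine gaze positions, where each node carries a short vector of causally selected clinical features. All names, dimensions, and the two-layer depth are assumptions for illustration (Python, using PyTorch Geometric):

    import torch
    from torch_geometric.nn import GCNConv

    # Fully connected graph over the nine gaze positions (no self-loops).
    N = 9
    idx = torch.arange(N)
    src = idx.repeat_interleave(N)
    dst = idx.repeat(N)
    keep = src != dst
    edge_index = torch.stack([src[keep], dst[keep]])  # shape [2, 72]

    class GazeGCN(torch.nn.Module):
        """Two GCN layers, then a mean readout and a linear classifier."""
        def __init__(self, in_dim, hidden, n_classes):
            super().__init__()
            self.conv1 = GCNConv(in_dim, hidden)
            self.conv2 = GCNConv(hidden, hidden)
            self.head = torch.nn.Linear(hidden, n_classes)

        def forward(self, x, edge_index):
            h = torch.relu(self.conv1(x, edge_index))
            h = torch.relu(self.conv2(h, edge_index))
            return self.head(h.mean(dim=0))  # graph-level logits

    # 12 causally selected features per gaze position (dimension is made up).
    x = torch.randn(N, 12)
    model = GazeGCN(in_dim=12, hidden=32, n_classes=3)
    logits = model(x, edge_index)  # e.g., deviation-direction classes

In the actual framework, the node features would be the sparse causal measurements selected by the causal discovery step rather than random tensors, which is what keeps the reported cost near 0.0003 GFLOPs.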

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3330_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZheJia_CausalityInspired_MICCAI2025,
        author = { Zheng, Jiawen and Luo, Li and Zhuang, Jiafan and Wei, Peiwei and Zhong, Lihao and Xie, Xiaoling and Guo, Jinming and Xie, Meng and Kang, Xiaoli and Cen, Jie and Dong, Lingyan and Zheng, Ce and Fan, Zhun},
        title = { { Causality-Inspired Graph Neural Network for Interpretable Strabismus Subtype Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {108 -- 118}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a Causality-Inspired Graph Neural Network (CI-GNN) for strabismus subtype classification that addresses two significant challenges in automated strabismus diagnosis: limited interpretability and insufficient research on strabismus subtypes. The proposed framework automatically identifies causally related visual features from eye regions, constructs graph-like structures for predictions, and provides clinically interpretable diagnoses. The approach achieves state-of-the-art performance while being computationally efficient and offers a valuable contribution to the field of medical diagnosis through the integration of causality with deep learning techniques.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Methodological Integration: Successfully combines causal discovery with graph neural networks, creating a bridge between statistical learning and domain knowledge.

    Clinical Usage: Automatically discovered causal features align with established clinical guidelines for strabismus diagnosis.

    Interpretability: Provides clinician-friendly explanations highlighting specific abnormal features, vastly improving on traditional heatmap approaches.

    Evaluation: Includes comparisons with state-of-the-art models, cross-environment validation, detailed ablation studies, and interpretability analysis.

    Computational Efficiency: Reduces computational cost to roughly 1/100,000 that of ViT models, making it suitable for resource-constrained environments.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Dataset Limitations: Insufficient discussion of dataset demographics, acquisition conditions, and potential biases that might affect generalizability.

    Insufficient Analysis of Failure Cases: Limited examination of when and why the model fails, which would provide valuable insights into its limitations.

    Need for More Clinical Validation: Would benefit from structured clinical validation measuring how interpretable outputs influence real clinical decision-making.

    Limited Discussion of Causal Discovery Limitations: Inadequate exploration of the assumptions and limitations of the causal discovery process used.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper represents a significant advancement in interpretable AI for medical diagnosis. Its strengths—particularly the seamless integration of domain knowledge with machine learning, dramatic improvement in computational efficiency, and focus on clinically meaningful subtypes—outweigh the limitations. The CI-GNN framework effectively addresses the “black-box” nature of traditional deep learning approaches while maintaining high performance. Though there are areas for improvement (more comparisons with similar approaches, better dataset characterization, failure case analysis, and clinical validation), the methodology has potential applications beyond strabismus to other medical imaging domains. Overall, this work makes an important contribution toward more interpretable, efficient, and clinically relevant AI systems for medical diagnosis.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces the Causality-Inspired Graph Neural Network (CI-GNN) framework to address two key challenges in automated strabismus diagnosis: the lack of focus on clinically important subtypes and the limited interpretability of existing deep learning models. The proposed method extracts high-level features from nine-gaze photographs, employs causal discovery algorithms to select a subset of diagnostically relevant features, and uses a Graph Neural Network (GNN) operating on these sparse features to perform subtype classification. The authors claim SOTA accuracy, enhanced interpretability using SHAP, and high computational efficiency.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Addresses key clinical needs: The work commendably targets two significant gaps in automated strabismus analysis: classifying clinically relevant subtypes (direction and angle variation types), which is essential for treatment planning, and improving the interpretability of diagnostic models to foster clinical trust (mentioned in Abstract, Introduction).

    2) Novel methodological pipeline: The proposed pipeline, which integrates (1) clinically informed feature extraction (Section 2.2, Table 1), (2) formal causal discovery for feature selection (Section 2.3), and (3) GNN modeling across multiple gaze positions using these selected features (Section 2.4), presents a novel approach for this specific task. The idea of using causality to guide feature selection for interpretability is interesting.

    3) Strong classification performance and efficiency: The proposed CI-GNN framework demonstrates superior classification accuracy compared to standard deep learning baselines (VGG, ResNet, ViT, SwinViT) on their custom benchmark dataset for both direction and angle subtype tasks (Table 3). Furthermore, the reported computational efficiency (0.0003 GFLOPs) is outstanding, making the approach potentially suitable for deployment on resource-constrained or edge devices (Section 3.3).

    4) Inclusion of cross-validation and ablation: The experimental design includes cross-environment validation on data from another institution, providing some evidence of generalizability (Table 3). Crucially, an ablation study is performed to demonstrate the positive contribution of each main component of the framework (Table 4).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) Insufficient validation of interpretability: A central claim and major motivation of the paper is providing enhanced, “clinician-friendly diagnostic evidence” (Abstract, Section 3.3). However, the validation of this claim is lacking, since it relies solely on a qualitative comparison between SHAP explanations and heatmaps from only two examples. No quantitative metrics or user studies involving ophthalmologists are used to assess interpretability. (A toy sketch of how such SHAP attributions are typically computed follows this list.)

    2) Limited depth in causality analysis: Although the manuscript repeatedly emphasizes “causality” in both the title and the text, the causal discovery algorithms essentially serve as advanced feature selection techniques (identifying Markov blankets). There is no explicit validation or interpretation of the learned causal graph (DAG). The stability and robustness of the feature selection process are also not assessed.

    3) Potential reliance on initial feature engineering: The framework’s performance depends on the quality and completeness of the initial set of candidate features extracted based on clinical guidelines (Section 2.2, Table 1). While interpretable, this step introduces potential bias if relevant features are missed during this manual design phase.

    4) Necessity and complexity of the GNN: The manuscript employs a GCN model on a fully connected graph comprising only nine nodes with relatively low feature dimensionality (as indicated in Table 2). While ablation studies (Table 4) demonstrate superior performance of GDM over MLP, the manuscript does not provide comparative analyses against other potentially suitable architectures for sequential or set-based data (such as RNNs or simpler aggregation methods).

    5) Lack of baseline model details and dataset accessibility: The manuscript does not clearly describe how baseline models (e.g., VGG, ResNet) are applied to the nine-position ocular images, whether through direct concatenation or individual processing followed by fusion. The main results are based on an in-house dataset (Section 3.1), and its public availability is not confirmed, hindering reproducibility and independent verification.
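
    For context on point 1: model-agnostic SHAP attributions of the kind compared against heatmaps can be produced roughly as follows. This is a self-contained toy with a stand-in classifier and made-up features; it does not reflect the authors' code:

        import numpy as np
        import shap
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)

        # Stand-in for the sparse causal features (6 made-up columns) and labels.
        X = rng.random((200, 6))
        y = (X[:, 0] + 0.8 * X[:, 3] > 1.0).astype(int)

        clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

        # Model-agnostic SHAP values for the positive class.
        explainer = shap.KernelExplainer(lambda a: clf.predict_proba(a)[:, 1], X[:50])
        shap_values = explainer.shap_values(X[:5])
        print(shap_values.shape)  # (5, 6): one attribution per feature per case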

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite addressing important clinical problems with a novel and efficient approach, and including valuable experiments such as cross-validation and ablation studies, the paper has notable weaknesses that need to be addressed in the rebuttal to meet the standards of MICCAI.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors explain and address most of the concerns. Although this is a well-motivated new task based on key-point features and causal reasoning, the methodological novelty is moderate and warrants further development. While the authors emphasize that their indicators were developed in consultation with experienced ophthalmologists at a single institution, they should provide more details, or open-source code and data, to the community to enable full reproducibility and clinical validation and to reduce hypothesis bias.



Review #3

  • Please describe the contribution of the paper

    The paper presents an integration of causal discovery and graph neural networks for strabismus subtype classification, aiming to improve interpretability and diagnostic relevance. By leveraging structured features from nine-gaze images and employing SHAP for explanation, the approach offers a more transparent alternative to conventional deep learning models. The focus on subtype classification rather than binary detection is clinically meaningful, and the model demonstrates competitive performance with low computational demands. While promising, the real-world impact would depend on further clinical validation and broader deployment.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents a Causality-Inspired Graph Neural Network for strabismus subtype classification, combining causal discovery with graph neural networks in a novel and clinically motivated way. Instead of relying on raw image data, it identifies causally relevant visual features from nine-gaze photographs and structures them into a graph that mirrors clinical assessment practices. This improves interpretability and aligns the model’s reasoning with ophthalmologic guidelines.

    A notable strength is the focus on fine-grained subtype classification—such as deviation direction and angle patterns—rather than simple binary diagnosis. The use of SHAP-based explanations adds transparency, offering clinicians clear, feature-level insights into the model’s predictions. Evaluation is thorough, with comparisons against strong baselines and validation on external clinical data. The model also achieves high efficiency, making it well-suited for use in low-resource settings. Overall, the work is a well-executed step toward more interpretable and deployable AI in ophthalmology.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper’s primary methodological contribution, the integration of causal discovery with graph neural networks, is adapted from existing techniques rather than entirely novel. The causal feature selection process builds on prior work such as the HCM algorithm, and while its application to strabismus is new, the approach itself is not a significant advance in causal inference methods. In terms of application, too, the novelty lies more in the subtype classification and the use of structured features, which, while useful, represent an incremental step rather than a fundamentally new direction.

    The paper also lacks a detailed discussion of real-world clinical feasibility. Although the model is efficient and outputs interpretable features, there is no validation from clinical users or assessment of how the required nine-gaze images fit into routine workflows, especially in lower-resource settings.

    Finally, the evaluation, while competitive, could be more comprehensive. Metrics beyond accuracy, such as sensitivity, specificity, or AUC, would better reflect clinical utility. The absence of comparison to clinician performance further limits the practical interpretation of results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a well-motivated and interpretable approach to strabismus subtype classification using causal discovery and graph neural networks. It shows strong performance, clinical relevance, and efficiency. However, the methodological novelty is moderate, and details for full reproducibility and clinical validation are limited. Overall, it offers a meaningful contribution with room for further development.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely thank all reviewers for their constructive feedback. We carefully address each major concern below.

Q1: Limited novelty in causal inference. We appreciate the concern regarding novelty. While our work does not propose a new causal inference algorithm, its contribution lies in the first systematic application of causal inference to strabismus classification. We designed a generalizable framework that is compatible with multiple mainstream causal discovery methods; it is scalable and adaptable, enabling interpretable and clinically meaningful analysis. To the best of our knowledge, this is the first causal framework tailored to this domain, offering both practical utility and theoretical grounding.

Q2: Lack of detailed baseline comparison. The baseline models (CNN/ViT) treat the task as standard image classification using nine-gaze photographs (Fig. 3), where each sample is a full gaze image. In contrast, our method extracts structured clinical features from key points and applies causal reasoning, enabling interpretation aligned with physiological mechanisms. We have added further clarification in the revised manuscript.

Q3: Insufficient discussion of limitations in causal discovery. Our causal discovery relies on standard assumptions: (1) causal relationships follow a mixed Structural Equation Model (SEM); (2) sample size and variable count are within practical limits; (3) no feedback loops exist in the causal graph; (4) noise terms are independent. These assumptions are well met in our study, given the stable geometry of eye regions and a sufficient sample size (1,075). We have clarified this in the manuscript. (A toy illustration of the conditional-independence reasoning behind these assumptions appears after this feedback.)

Q4: Lack of analysis of failure cases. We acknowledge that misclassifications can occur when gaze direction is unreliable due to poor patient cooperation. Addressing this issue, for example through improved gaze quality assessment or error-tolerant modeling, will be a focus of future work.

Q5: Insufficient validation of interpretability. All initial model features are derived from established clinical indicators and finalized through repeated consultation with experienced ophthalmologists (Table 1). This ensures that each decision component is interpretable within the clinical workflow. Furthermore, in Section 3.2, we provide a detailed discussion of the key variables identified by the learned causal graph.

Q6: Lack of discussion on real-world deployment. Our full pipeline, from image acquisition to diagnosis, is lightweight and can run on low-cost devices (e.g., a Raspberry Pi with a basic camera). This ensures practicality in resource-constrained settings. We emphasize this in the updated discussion section.

Q7: Lack of discussion on further clinical validation. We fully agree with the importance of clinical validation. While our current focus is on the construction and preliminary verification of the proposed framework, we have already incorporated clinical considerations by evaluating interpretability through performance metrics and cross-environment generalization. In future work, we plan to conduct more comprehensive clinical studies, including user studies with ophthalmologists and systematic comparisons with expert diagnoses, to assess how explainable outputs, especially feature attributions, can support clinical decision-making.
Q8: Future work and additional suggestions. We appreciate the reviewers’ suggestions and plan to: (1) combine domain knowledge and data-driven methods for automatic selection of initial features, (2) release our code and dataset to the community, (3) add new baselines (e.g., RNN, set-based models), and (4) expand metrics (e.g., sensitivity, specificity, AUC).

In summary, our work introduces a clinically grounded and extensible causal framework for strabismus diagnosis. It enhances interpretability, aligns with real-world clinical workflows, and offers a foundation for further research in medical AI and causal modeling.
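
To make the Q3 assumptions concrete, here is a toy illustration (not from the paper) of the conditional-independence reasoning that constraint-based causal discovery and Markov-blanket selection rest on: in a linear SEM with independent noise and no feedback loops, an indirect cause is screened off from its effect by the intermediate variable. Variable names are purely illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1075  # matches the sample size cited in Q3

    # Toy linear SEM with independent noise and no feedback loops (a DAG):
    # a -> b -> c, so b screens a off from c.
    a = rng.normal(size=n)
    b = 0.8 * a + rng.normal(size=n)
    c = 0.5 * b + rng.normal(size=n)

    def partial_corr(x, y, z):
        """Correlation of x and y after regressing out z from each."""
        rx = x - np.polyval(np.polyfit(z, x, 1), z)
        ry = y - np.polyval(np.polyfit(z, y, 1), z)
        return np.corrcoef(rx, ry)[0, 1]

    print(round(np.corrcoef(a, c)[0, 1], 3))   # clearly nonzero marginally
    print(round(partial_corr(a, c, b), 3))     # near 0: a is screened off by b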




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This is a high-quality paper that presents a novel, efficient, and clinically relevant framework for an important diagnostic task. The reviewers are in clear agreement on its merits.


