Abstract

Delayed treatment of retinopathy of prematurity (ROP) can diminish therapeutic efficacy and may lead to severe, potentially irreversible damage. Automated diagnosis of ROP presents significant challenges, including the detection of subtle early lesions, the variability of clinical phenotypes, and inconsistencies in imaging quality. To address these challenges, which existing general foundation models do not handle well, we propose a structure-aware proxy interaction network (SABPI-Net) within a universal learning framework. SABPI-Net incorporates a high-frequency mapping branch and introduces a proxy interaction attention module to enable effective interaction between its trunk feature encoding branch and the high-frequency mapping branch. This enhances the model’s ability to perceive fine retinal structural details. Domain-agnostic embedding space self-matching, guided by a memory-bank low-frequency component replacement strategy, facilitates domain-invariant learning and ensures consistent model performance across diverse image styles. In this study, a classification task for ROP is conducted on the largest clinical color fundus photography dataset to date, achieving an accuracy of 95.32%. Extensive experiments further validate the effectiveness and superiority of SABPI-Net in diagnosing ROP.
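The low-frequency component replacement mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact procedure, so the code below assumes a common Fourier-domain approach: the low-frequency amplitude of an image is swapped with that of a reference image while the phase is kept. The function name `replace_low_freq` and the parameter `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def replace_low_freq(image, reference, beta=0.1):
    """Swap the low-frequency amplitude of `image` with that of `reference`.

    Assumption: this is a generic Fourier amplitude swap, not necessarily
    the procedure used in SABPI-Net. Both inputs are 2-D float arrays of
    the same shape (one channel). `beta` (hypothetical) sets the half-width
    of the swapped square as a fraction of the shorter image side.
    """
    if image.shape != reference.shape:
        raise ValueError("image and reference must have the same shape")

    # Centered spectra: the zero-frequency term sits at (h // 2, w // 2).
    f_img = np.fft.fftshift(np.fft.fft2(image))
    f_ref = np.fft.fftshift(np.fft.fft2(reference))

    amp_img, phase_img = np.abs(f_img), np.angle(f_img)
    amp_ref = np.abs(f_ref)

    h, w = image.shape
    r = int(min(h, w) * beta)
    cy, cx = h // 2, w // 2

    # Replace only the central (low-frequency) amplitude block and keep the
    # original phase, so the spatial layout of the image is preserved.
    amp_mix = amp_img.copy()
    amp_mix[cy - r:cy + r + 1, cx - r:cx + r + 1] = \
        amp_ref[cy - r:cy + r + 1, cx - r:cx + r + 1]

    f_mix = np.fft.ifftshift(amp_mix * np.exp(1j * phase_img))
    return np.real(np.fft.ifft2(f_mix))
```

In a memory-bank setting, `reference` would be drawn from a stored pool of images in other styles. With `beta=0.0` only the mean intensity is transferred, and larger values transfer progressively more of the reference's coarse appearance.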

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2355_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{CheSha_SABPINet_MICCAI2025,
        author = { Chen, Shaobin and Zhao, Xinyu and Fu, Huazhu and Tan, Tao and Huang, Jiaju and Xiong, Xiangyu and Wu, Zhenquan and Dashtbozorg, Behdad and Lei, Baiying and Zhang, Guoming and Sun, Yue},
        title = { { SABPI-Net: A Novel Structure-Aware Network for Accurate and Domain-Invariant Retinopathy of Prematurity Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15969},
        month = {September},
        pages = {455 -- 465}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a Structure-Aware Proxy Interaction Network within a universal learning framework for clinical fundus photography analysis. The network features a high-frequency mapping branch to enhance the model’s capacity for fine-grained structural detail extraction and a Proxy Interaction Attention module to facilitate interactions between the trunk feature encoding branch and the HFM branch. Moreover, a domain-agnostic embedding space self-matching strategy, guided by a low-frequency component replacement mechanism via memory bank, enables domain-invariant representation learning. The method is trained and validated on the largest clinical fundus photography dataset to date, and further evaluated on a public dataset (OIA-ODIR), achieving 95.32% accuracy.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Dataset Scale and Coverage: The model is trained on a large-scale clinical dataset with broad disease coverage, enhancing its real-world applicability.

    Generalization Evaluation: The authors go beyond internal validation and also test their model on a public dataset (OIA-ODIR), comparing against seven state-of-the-art models, which adds credibility to their claims of robustness and generalization.

    Architectural Design: The framework is novel in design, combining a high-frequency mapping branch for better structural capture and a domain-agnostic embedding strategy to ensure consistent performance across image styles.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Missing Experiments: Although the paper includes comparisons with seven recent SOTA methods, most are from 2023 or earlier. Including comparisons to the most recent methods (2024–2025) would further validate the paper’s relevance in a fast-evolving field.

    Lack of Clarity:

    Figure 1 lacks consistency in naming and abbreviations: key components such as Trunk Feature Encoding (TFE) and Domain-Agnostic Embedding Space Self-Matching (DAESSM) are mentioned in the text but not labeled in the figure, while the High Frequency Mapping (HFM) branch appears in the figure. Clear, consistent labeling between the text and figures would improve readability.

    In Section 4 (Dataset and Implementation), the paper states that “image data is uniformly resized” — but does not specify to what resolution. Furthermore, it would be helpful to include a discussion on how different resizing strategies may impact the results.

    Figure 3(a) could be improved for clarity. The visual layout currently includes two columns labeled “w HFM branch” and two labeled “w/o HFM branch”, but it’s not visually or descriptively clear how they differ. A clearer figure caption or diagram separation would help.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the proposed method demonstrates promising results and introduces a novel structure-aware proxy interaction network trained on a large dataset, I have some concerns regarding the novelty, clarity of presentation, and the completeness of the evaluation. Given its potential, I lean toward a weak accept.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose SABPI-Net, a novel architecture for automated diagnosis of retinopathy of prematurity that addresses disease complexity and image variability. It combines a high-frequency mapping branch with a trunk feature encoder via a proxy interaction attention module, and employs domain-agnostic self-matching to ensure consistent performance. Evaluated on a large dataset (~170,000 fundus images) collated by the authors, the method demonstrates strong, robust performance and outperforms existing approaches.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The study design is robust, featuring comprehensive comparisons with previous methods and detailed ablation studies to assess the impact of each proposed module.
    • The introduction is engaging and well-written, providing clear context on the clinical problem and relevant prior work.
    • The dataset is large and a great step toward enabling the development and evaluation of machine learning models for ROP; assembling it is commendable work.
    • The topic is highly clinically relevant and addresses a critical need in early diagnosis and treatment planning for retinopathy of prematurity.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • In Figure 3, it’s unclear what the two “with and without HFM” comparisons represent, as the input image appears identical—are these visualizations also reflecting varying PIAM block usage (as in b)? Clarifying this in the figure or caption would be helpful.
    • While the dataset is a major strength, potential biases—such as single-site imaging protocols or population homogeneity—are not fully addressed. Notably, performance drops on external datasets (e.g., OIA-ODIR) suggest limited generalizability. What are the main factors affecting this, and are there plans to expand the dataset to improve robustness?
    • With a strong model and large dataset, the paper could go further in outlining the path toward clinical translation. What steps are needed for SABPI-Net to be usable in practice? Further discussion of interpretability and clinical trust in model outputs would be valuable.
    • Given the overall high performance, the paper would benefit from analysis of failure cases or diagnostic uncertainty. How does the model handle borderline or low-quality images, and are there systematic biases in misclassifications?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper tackles a clinically relevant question, presenting an interesting methodology and leveraging a large, institution-collected dataset to address it. It is generally well-written and thoughtfully designed, featuring comprehensive comparisons to prior work and detailed ablation studies.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contribution of this paper is a new deep learning model called SABPI-Net for detecting Retinopathy of Prematurity (ROP) from eye images. The model can recognize fine details in the retina using a special high-frequency branch. It also uses a mechanism that enhances interaction between detailed and general features, improving how features are learned. Finally, it includes a method to make sure the model works well on images from different devices or hospitals. This makes ROP diagnosis more accurate and generalizable in real-world settings.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel Architecture – SABPI-Net: The paper introduces a new network architecture specifically designed for Retinopathy of Prematurity (ROP) diagnosis. It integrates a High-Frequency Mapping (HFM) branch with a Trunk Feature Encoding branch, which is a novel design for enhancing fine-grained retinal structure recognition, which is crucial for early-stage ROP detection.
    2. Proxy Interaction Attention Module (PIAM) – Novel Feature Fusion Strategy: The PIAM module is a unique contribution that uses proxy tokens from high-frequency features to influence the main image tokens. This two-stage attention mechanism allows detailed structural cues to be distributed back into the global feature map, which is not seen in existing ROP models, making it a clear methodological novelty.
    3. Strong and Comprehensive Evaluation: The model is benchmarked against seven state-of-the-art methods, including ConvNeXt, Swin-S, ViT-B, etc.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. The idea of using proxy tokens for cross-branch attention is interesting, but the paper does not explain why the specific pooling method or number of proxy tokens (CP) was chosen.
    2. The definitions of ‘Other’ and ‘Any Stage’ classes are not clearly explained.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper should be accepted due to its strong methodological contributions, particularly the novel integration of high-frequency mapping, proxy interaction attention, and domain-agnostic learning in the SABPI-Net architecture, which is appropriate for the complex task of ROP diagnosis. It is trained on the largest known ROP dataset (174,540 images) and effectively addresses rare but critical categories such as A-ROP and laser-treated ROP. The evaluation is comprehensive, outperforming seven state-of-the-art models across multiple metrics. The presentation is well-structured. However, it does not include external testing on data from other hospitals, and it does not clearly show how its frequency-based method differs from older studies. Despite these small issues, the paper shows clear value and should be accepted.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

N/A




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


