Abstract

Fetal abdominal malformations are serious congenital anomalies that require accurate diagnosis to guide pregnancy management and reduce mortality. Although AI has demonstrated significant potential in medical diagnosis, its application to prenatal abdominal anomalies remains limited. Most existing studies focus on image-level classification and rely on standard plane localization, placing less emphasis on case-level diagnosis. In this paper, we develop a case-level multiple instance learning (MIL)-based method, free of standard plane localization, for classifying fetal abdominal anomalies in prenatal ultrasound. Our contribution is three-fold. First, we adopt a mixture-of-attention-experts module (MoAE) to weight different attention heads for various planes. Second, we propose a medical-knowledge-driven feature selection module (MFS) to align image features with medical knowledge, performing self-supervised image token selection at the case level. Finally, we propose prompt-based prototype learning (PPL) to enhance the MFS. Extensively validated on a large prenatal abdominal ultrasound dataset containing 2,419 cases, with a total of 24,748 images and 6 categories, our proposed method outperforms the state-of-the-art competitors. Code is available at: https://github.com/LL-AC/AAcls.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2388_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/LL-AC/AAcls

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiaHua_MedicalKnowledge_MICCAI2025,
        author = { Liang, Huanwen and Xu, Jingxian and Zhang, Yuanji and Huang, Yuhao and Zhang, Yuhan and Yang, Xin and Li, Ran and Deng, Xuedong and Liu, Yanjun and Tao, Guowei and Wu, Yun and Zhao, Sheng and Gao, Xinru and Ni, Dong},
        title = { { Medical-Knowledge Driven Multiple Instance Learning for Classifying Severe Abdominal Anomalies on Prenatal Ultrasound } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {348--358}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper’s main contribution is a novel case-level Multiple Instance Learning (MIL) method for classifying fetal abdominal anomalies in prenatal ultrasound, which uniquely combines a Mixture-of-Attention-Experts (MoAE) module for weighting attention across different planes, a Medical-Knowledge-Driven Feature Selection (MFS) module for aligning image features with medical knowledge, and prompt-based prototype learning (PPL) to enhance MFS. By eliminating the need for standard plane localization and leveraging medical knowledge, the proposed method achieves state-of-the-art performance on a large dataset, offering a more robust and potentially more accurate solution for prenatal diagnosis.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper tackles the challenge of fetal abdominal anomaly classification using a case-level MIL approach that explicitly avoids the need for standard plane localization. This is a significant departure from existing methods that often rely on accurate plane detection as a prerequisite. The diversity in presentation of fetal anomalies and the difficulty of consistently acquiring standard planes, especially at earlier gestational ages, are significant clinical challenges. By eliminating this requirement, the method becomes more robust, potentially more accurate, and less reliant on operator skill. This opens up possibilities for wider application in diverse clinical settings. The MoAE, MFS, and PPL modules each contribute to the overall novelty of the approach. The MoAE likely improves feature integration across different planes, enhancing the robustness of the model. The MFS aligns the model with medical knowledge, enabling more clinically relevant feature selection. Finally, the PPL enhances the MFS module. The extensive validation on a large prenatal abdominal ultrasound dataset (2,419 cases, 24,748 images, 6 categories) is a significant strength, indicating that the model has been rigorously tested. This contributes to a higher level of confidence in the reported results, providing compelling evidence of the approach’s effectiveness.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Firstly, the innovation of the MIL framework itself is limited. While the application and specific modules are novel, the underlying use of Multiple Instance Learning (MIL) in medical image classification is not entirely new. The abstract mentions prior MIL approaches, suggesting that the core framework is adapted, not invented. The authors should clearly delineate the novel aspects of their MIL implementation compared to existing MIL methods in medical imaging (see, for example, [4, 5, 6, 9, 10, 12, 17, 20, 24, 27, 28] for existing MIL techniques).

    Secondly, while integrating medical knowledge is a strength, the details of how this knowledge is represented and integrated require further scrutiny. How do the authors define and represent “medical knowledge”? Is it simply a set of keywords, a structured ontology, or something else? How is this knowledge used to guide feature selection in the MFS module? The paper should elaborate and provide a rationale for the chosen approach, and provide a comparison to other knowledge integration techniques.

    Further points to consider include whether there is any evidence to show that the medical knowledge is indeed helpful. Does the model simply integrate medical knowledge as a “black box,” or does it allow one to see how the medical knowledge and image features combine for prediction? Furthermore, showcasing examples of correctly and incorrectly classified cases, along with visualizations of the relevant image features (the visualizations of results in the paper are blurry, and even when I zoom in, I cannot clearly see the scope of the lesion), would help build trust in the method. Ideally, the paper should include some form of clinical validation, such as a study with radiologists using the system to diagnose cases. Without this validation, it is difficult to gauge how this system would be applied in the real world. To ensure that transparency and ethical considerations are fully addressed, could the authors provide more information regarding the dataset? Specifically, what were the reasons for not using public datasets, and how was informed consent obtained from the pregnant women and their families? Is there documentation available to verify that the data collection was conducted ethically and legally?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a promising approach for fetal abdominal anomaly classification using a novel MIL-based framework that circumvents standard plane localization and incorporates medical knowledge for improved robustness and interpretability, supported by a large dataset and state-of-the-art performance claims. However, significant weaknesses persist, including limited novelty in the core MIL framework, vague details on medical knowledge integration, a lack of interpretability analysis, insufficient discussion of dataset bias and generalizability, ethical issues, a lack of qualitative validation, and limited demonstration of clinical feasibility, all of which must be convincingly addressed to warrant acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I just have concerns regarding the dataset.



Review #2

  • Please describe the contribution of the paper

    - Mixture-of-Attention-Experts (MoAE) Module: Employed for weighting attention heads for different anatomical planes to aid in feature integration.
    - Medical-Knowledge-Driven Feature Selection (MFS) Module: Utilizes medical knowledge to perform self-supervised selection of image tokens, aiming to reduce dependency on standard plane localization.
    - Prompt-Based Prototype Learning (PPL): Enhances the effectiveness of the feature selection process through a specialized loss function ensuring distinctiveness of selected features.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - Innovative Use of CLIP-like Settings: The integration of medical knowledge for prompt learning in image feature selection is an interesting application of a pre-established technique, allowing the model to align more closely with clinical understanding.
    - Improved Case-Level Classification: Transitioning focus from image-level to case-level classification addresses prevalent issues in prenatal abdominal anomalies detection and aims to improve diagnostic accuracy.
    - Extensive Validation: The approach has been rigorously evaluated on a large-scale dataset comprising 24,748 prenatal ultrasound images, demonstrating strong performance relative to existing methods.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    - Limited Novelty in Methodology: The work is primarily a mechanical combination of known modules like MoAE and CLIP settings, demonstrating limited novelty in terms of developing new methodologies.
    - Challenge in Visualization (Fig 4): The visualization presented in Fig 4 is difficult to interpret due to its size and lack of clarity. The statement regarding reducing the influence of normal images suggests selective instance activation, but inconsistencies in the number of retained instances need better documentation and comparison against ground truth to enhance understanding.
    - Fixed Prompt Limitation: Fixed prompts for each abnormality do not address intra-class variances or the diverse presentation of conditions, potentially limiting adaptability to different instances within a class.
    - Organizing Ablation Studies: The order of ablation studies is not intuitive, as it skips the evaluation involving only MoAE plus MFS, hindering clarity on improvements from each module specifically.
    - Metric Justification: Metrics such as sensitivity and precision should accompany F1 scores to provide comprehensive insight. The drop in F1 scores for certain categories with the incorporation of MFS warrants explanation, emphasizing how MFS contributes to the overall methodology and whether its integration significantly impacts performance.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper addresses important clinical needs in prenatal ultrasound anomaly classification, it is burdened by limited methodological innovation. The application and integration of established modules without substantial enhancements restricts the novelty and potential impact. Moreover, presentation issues, such as the unclear visualizations and the insufficient exploration of metric implications, particularly with MFS, hinder a thorough assessment of the results. These factors should guide improvement to strengthen the submission. Nonetheless, contributions focusing on case-level classification have merit, suggesting clinical relevance. Therefore, my overall score reflects acknowledgment of clinical potential balanced by methodological and presentation shortcomings.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Thanks for addressing my concerns. Overall, this work is interesting and convincing.



Review #3

  • Please describe the contribution of the paper

    This study proposes a classification framework for distinguishing five types of abdominal diseases and normal abdominal ultrasound images. The authors adopt a case-level classification strategy, which is particularly appealing from a clinical perspective.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The case-level classification is well motivated, as standard views are often unavailable in pathological cases. This setting also better reflects real-world clinical practice, where diagnoses rely on multiple views from the entire case rather than isolated images.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The MoAE module [7] appears to be directly adopted from existing work without task-specific modifications.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    1. What specific sections are used for multi-view abdominal disease diagnosis? Figure 1 shows inconsistent sections across cases.
    2. The section on Fetal Anomalies Detection highlights several brain disease studies, but offers little discussion on abdominal-related research; please clarify this gap.
    3. The definitions of x′ and f in Equation (1) on page 4 are unclear and should be explicitly stated.
    4. Page 4 states “We introduce the mixture-of-attention-experts (MoAE) module [7] as shown in Fig. 3,” but it is unclear whether this module is proposed by the authors, directly adopted, or modified from [7]; please clarify.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The study is well motivated, and the key frame selection strategy is clinically meaningful and insightful.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The rebuttal effectively addresses the concerns raised during the initial review. Considering the manuscript’s originality and its potential clinical relevance, I continue to support acceptance.




Author Feedback

We thank all the reviewers (R) for reviewing and recognizing our work. We provide explanations to address the comments. The code will be released upon acceptance.

Q1. IRB approval (MetaR1, R1) This study was approved by the IRB and conducted in accordance with the Declaration of Helsinki. It was registered with the National Clinical Trial Registry on May 10, 2023. The IRB approval statement will be added in the final version.

Q2. Novelty (R1, R2) Most existing MIL techniques were designed for WSI and fail to consider the multi-view nature of US imaging. Moreover, identifying the diagnostic images among the multi-view images of a case is clinically critical for case-level diagnosis. Our method is simple, effective, and intuitive. (1) The mixture-of-attention-experts (MoAE) module improves feature integration across different planes. (2) The medical-knowledge-driven feature selection (MFS) module aligns image features with medical knowledge and introduces a similarity-based adaptive feature selection method, which enables clinically relevant feature selection. Compared with the traditional Top-k selection method, this approach is more flexible. (3) The prompt-based prototype learning (PPL) differentiates features across different categories. Our method is not a mere mechanical combination but is thoughtfully designed to meet clinical needs.
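To make point (1) concrete, the following is a minimal sketch of one way a mixture of attention experts could pool the images of a case into a single feature. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dot-product attention scoring, and the gate computed from the mean instance feature are all hypothetical choices, and the actual MoAE module is the one described in the paper and in [7].

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(instances, query):
    # One attention "expert": weight each instance by softmax(instance . query)
    # and return the weighted mean of the instance features.
    weights = softmax([dot(x, query) for x in instances])
    dim = len(instances[0])
    return [sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)]

def mixture_of_attention_experts(instances, expert_queries, gate_vectors):
    # Assumption: the gate scores each expert against the mean instance
    # feature of the case, then a softmax turns scores into mixing weights.
    dim = len(instances[0])
    mean_feat = [sum(x[d] for x in instances) / len(instances) for d in range(dim)]
    gate = softmax([dot(mean_feat, g) for g in gate_vectors])
    pooled = [attention_pool(instances, q) for q in expert_queries]
    mixed = [sum(g * p[d] for g, p in zip(gate, pooled)) for d in range(dim)]
    return mixed, gate

# Toy case with three image features (hypothetical values).
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[2.0, 0.0], [0.0, 2.0]]   # one query per attention expert
gates = [[1.0, 0.0], [0.0, 1.0]]     # one gate vector per expert
case_feature, gate_weights = mixture_of_attention_experts(bag, queries, gates)
print(case_feature, gate_weights)
```

In this toy case the two experts attend to different feature dimensions, and the gate decides how much each expert contributes to the case-level feature. In a trained model the queries and gate vectors would be learned parameters.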

Q3. Details of medical knowledge (R1) Inspired by [15, 21, 26], our method uses medical knowledge to guide MFS. First, it calculates the feature similarity between images and medical knowledge. It then activates image instances using an adaptive threshold based on this similarity, guiding the selection of clinically relevant images. Medical knowledge is defined as a sentence about the disease, covering its name, medical definition, and characteristic features in images. [Ref1] demonstrated that rich medical knowledge can enhance the model’s disease perception.
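The two steps described in Q3 (similarity to a knowledge embedding, then thresholded activation) can be sketched as follows. This is a hedged illustration only: the rebuttal does not state how the adaptive threshold is computed, so the mean similarity over the case is used here as a stand-in assumption, and `select_instances` and the toy vectors are hypothetical. In practice the image and knowledge features would come from image and text encoders.

```python
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors; 0.0 if either is all zeros.
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def select_instances(image_feats, knowledge_feat):
    # Step 1: similarity between every image feature and the knowledge feature.
    sims = [cosine(f, knowledge_feat) for f in image_feats]
    # Step 2: adaptive threshold. Assumption: the mean similarity of the case;
    # the paper's actual rule (Sec. 2.3) may differ.
    threshold = sum(sims) / len(sims)
    kept = [i for i, s in enumerate(sims) if s >= threshold]
    return kept, sims, threshold

# Toy case: four image features and one knowledge embedding (hypothetical values).
knowledge = [1.0, 0.0]
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1]]
kept, sims, threshold = select_instances(images, knowledge)
print(kept)  # indices of the activated images
```

Unlike Top-k selection, the number of activated images is not fixed in advance: a case in which many images resemble the knowledge description keeps more of them, and a case with few relevant images keeps fewer.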

Q4. Why not public datasets (R1) To the best of our knowledge, there is no public dataset for case-level diagnosis of prenatal anomalies. We will discuss with our partner hospitals the possibility of releasing a dataset in the future.

Q5. Experiments (R1, R2) In the ablation studies, the experiment involving only MoAE plus MFS was omitted due to space limitations. Although using only MFS leads to a slight drop in F1 for certain categories, Sen. improves in almost all cases, which is crucial for the early diagnosis of anomalies. This demonstrates that MFS enhances the model’s focus on anomalies. Due to space limitations, the paper presents only Sen. and F1.
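For reference, the relationship between sensitivity (Sen.), precision, and F1 discussed by R2 and in Q5 can be computed per class as sketched below. The labels are made-up toy values, not results from the paper, and `per_class_metrics` is a hypothetical helper. Because F1 is the harmonic mean of precision and sensitivity, sensitivity can rise while F1 falls if precision drops enough.

```python
def per_class_metrics(y_true, y_pred, num_classes):
    # One-vs-rest sensitivity (recall), precision, and F1 for each class.
    results = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        sen = tp / (tp + fn) if (tp + fn) else 0.0
        pre = tp / (tp + fp) if (tp + fp) else 0.0
        f1 = 2 * pre * sen / (pre + sen) if (pre + sen) else 0.0
        results.append({"sensitivity": sen, "precision": pre, "f1": f1})
    return results

# Toy case-level labels for three classes (hypothetical values).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = per_class_metrics(y_true, y_pred, num_classes=3)
for c, m in enumerate(metrics):
    print(c, m)
```

Reporting all three values per category, as R2 suggests, makes it visible whether a change in F1 comes from sensitivity or from precision.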

Q6. Visualization and clinical validation (R1, R2) (1) We will revise Fig. 4 to improve understanding, e.g., by adding annotations to indicate lesion locations. (2) Our method allows us to see the activated image instances through the adaptive threshold described in Sec. 2.3. (3) Clinical validation will be supplemented in our future work.

Q7. Suggestions (R1, R2, R3) We thank the reviewers for their suggestions. Due to space limitations, we will add more quantitative and interpretability analysis in our journal work. We will also address the fixed-prompt limitation and adapt MoAE to the task, which may help improve model performance.

Q8. Additional comments (R3) (1) The images presented in Fig. 1 include the abdominal transverse plane and the renal sagittal and coronal planes. We will annotate the anomaly findings in Fig. 1 in the final version. (2) To the best of our knowledge, there is limited research on the diagnosis of fetal abdominal anomalies. A recent study [Ref2] addressed the diagnosis of biliary atresia. (3) f is an image encoder, and x′ is the feature extracted by f. (4) Some imprecise descriptions will be revised in the final version.

[Ref1] Qin Z., Yi H., et al. Medical image understanding with pretrained vision language models: A comprehensive study. arXiv.
[Ref2] He F., Li G., et al. Transfer learning method for prenatal ultrasound diagnosis of biliary atresia. npj Digital Medicine, 2025.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    There is a concern regarding IRB approval; the authors should confirm appropriate approval for this study.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal addressed most of the reviewers’ concerns, and all reviewers maintained a positive evaluation, indicating that this paper should undoubtedly be accepted. However, some concerns from the reviewers need to be addressed in the final version.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


