Abstract

The blind sweep ultrasound protocol, coupled with artificial intelligence (AI), offers promising solutions for expanding ultrasound availability in low-resource settings. However, existing AI approaches for gestational age (GA) prediction using blind sweeps face challenges such as reliance on manual segmentation, computational inefficiency from high frame volume, and suboptimal sampling strategies that compromise performance, particularly with smaller datasets. We propose SelectGA, a novel framework for automated blind sweep analysis that enables effective fine-tuning of pretrained models through adaptive frame selection for GA prediction. Our approach identifies the most informative and least redundant frames, enhancing both training efficiency and prediction accuracy. Validated on data collected from ultrasound devices in diverse resource environments, SelectGA improves gestational age prediction accuracy by 27% on mean absolute error metrics. These results demonstrate substantially improved generalizability, establishing foundations for sustainable AI adoption in prenatal care across resource-constrained settings. Code is available at: https://github.com/tanya-akumu/selectGA
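The two-stage selection the abstract describes — keep frames the anatomy detector is confident about, then retain a small, diverse subset — can be sketched as follows. This is a minimal illustration, not the released implementation: the `select_frames` helper, the frame embeddings, and the confidence scores are all assumptions, with a tiny k-means standing in for the diversity-guided selector.

```python
import numpy as np

def select_frames(embeddings, confidences, alpha=0.25, k=16, iters=20, seed=0):
    """Keep frames whose anatomy-detector confidence exceeds alpha, cluster
    the surviving frame embeddings with a tiny k-means, and return the index
    of one representative frame per cluster (the frame nearest each centroid)."""
    keep = np.flatnonzero(confidences >= alpha)      # anatomy-guided filter
    feats = embeddings[keep].astype(float)
    rng = np.random.default_rng(seed)
    k = min(k, len(feats))
    centroids = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):                           # plain Lloyd iterations
        dists = np.linalg.norm(feats[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = feats[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # diversity-guided pick: the frame closest to each cluster centroid
    dists = np.linalg.norm(feats[:, None] - centroids[None], axis=-1)
    reps = np.unique(dists.argmin(axis=0))
    return keep[reps]
```

In a real pipeline, `embeddings` would come from a pretrained backbone and `confidences` from an object detector; here both are placeholders.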

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3136_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/tanya-akumu/selectGA

Link to the Dataset(s)

N/A

BibTex

@InProceedings{AkuTan_Adaptive_MICCAI2025,
        author = { Akumu, Tanya and Elbatel, Marawan and Campello, Victor M. and Osuala, Richard and Martin-Isla, Carlos and Valenzuela, Ignacio and Li, Xiaomeng and Khanal, Bishesh and Lekadir, Karim},
        title = { { Adaptive Frame Selection for Gestational Age Estimation from Blind Sweep Fetal Ultrasound Videos } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15973},
        month = {September},
        pages = {2 -- 12}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This study proposes a method for evaluating gestational age in a blind-sweep ultrasound protocol that can be used in resource-limited settings. The method streamlines an anatomy detector and diversity-guided frame selector to estimate gestational age from the most informative frames of the sweep. The method is trained and evaluated on scans from 150 patients acquired at two institutions and estimates gestational age within 1 – 2 weeks of ground truth on average.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The study is well organized, easy to follow, and includes pseudocode.

    The dataset includes a substantial number of patients (150) scanned at two institutions.

    Although the components of the algorithm are not entirely novel, the idea of streamlining an anatomy detector, diversity-guided frame selector, and gestational age estimator is interesting and intuitively motivated.

    Gestational age estimation is evaluated with respect to both trimester and institution, and the performance is promising with respect to clinically acceptable ranges.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    It is not entirely clear if the training/validation/testing split is carried out at the patient level or the video level. Table 1 suggests the split may be carried out at the video level, such that videos from the same patient could be mixed in the training and testing sets. This needs to be clarified.

    The comparison methods are not necessarily expected to be well suited for gestational age prediction. It is unclear if and how they are adapted for this task, particularly EchoNet. This weakens enthusiasm for the superior performance of the proposed method.

    It seems that three previous studies (references 8, 15, 20) report higher accuracy in estimating gestational age than the proposed method. Therefore, it would have been more compelling to show quantitative results comparing the proposed method to uniform and random frame selection. It is unclear that the proposed method is an improvement upon previous work.

    It is expected that the confidence threshold (alpha) and number of clusters (K) could substantively impact performance. This was not evaluated.

    Minor: One of the ultrasound scanners used was not a point-of-care (handheld) system, which is less ideal for testing an algorithm for resource-limited settings. Computational efficiency was also not evaluated.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach to automated frame selection for gestational age prediction is interesting, the size of the dataset is good, and the paper is well written. However, it is not clear that the proposed method improves upon previous cited work.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The proposed strategy for automated frame selection for gestational age prediction is interesting and useful, and the potential clinical application is timely. My primary concern about this work is that the reported accuracy is not as high as three other referenced papers. However, the authors have clarified in the rebuttal that the datasets and model weights from those papers are not publicly available for comparison. So I believe the best effort was made to validate the proposed method.



Review #2

  • Please describe the contribution of the paper

    This paper introduces SelectGA, a novel framework for estimating gestational age (GA) from blind sweep ultrasound videos. The primary contribution is an adaptive frame selection approach that identifies the most informative and least redundant frames from ultrasound sweep videos, enhancing both training efficiency and prediction accuracy. The authors develop a two-stage framework: (1) Adaptive Frame Selection, which employs an anatomically guided selector to filter frames containing fetal structures, followed by a diversity-guided selector that identifies the most dissimilar representative frames; and (2) GA Prediction, which utilizes the selected optimal frames in a deep learning model to estimate gestational age. The authors evaluate their approach on a multi-center dataset collected from two geographically diverse settings, demonstrating significant improvements over existing methods. SelectGA achieves a 27% improvement in mean absolute error compared to baseline approaches, with 63.9% of predictions falling within 7 days of the ground truth. The framework shows particularly strong performance in resource-constrained settings and across different trimesters, making it especially valuable for expanding ultrasound accessibility in low-resource environments.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel Adaptive Frame Selection Methodology: The paper introduces an innovative approach to frame selection from blind sweep ultrasound videos that combines anatomical guidance with diversity-based selection. This addresses a critical challenge in blind sweep analysis – the presence of numerous uninformative or redundant frames – in a computationally efficient manner.
    2. Clinical Relevance and Practical Application: The work directly addresses a significant healthcare challenge in low-resource settings by enabling GA estimation from blind sweep ultrasound videos that can be captured by minimally trained healthcare workers. The reported accuracy (63.9% of predictions within 7 days) approaches clinical utility thresholds.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Limited Dataset Size: The total dataset consists of only 162 study scans from two centers (114 from one center and 48 from another). This relatively small sample size raises questions about the generalizability of the results to wider populations and clinical settings. Similar studies, such as Pokaprakarn et al. [20] cited in the paper, utilized much larger datasets (109,806 videos).
    2. Insufficient Technical Details on Object Detector Training: While the paper mentions that the Faster R-CNN object detector was fine-tuned on 500 images from open-source fetal ultrasound standard planes, it lacks details on the specific training protocol, performance metrics of the detector itself, and how different detector performances might impact the overall system.
    3. Limited Analysis of Computational Efficiency: Though computational efficiency is mentioned as a motivation, the paper does not provide quantitative analysis of inference time, model size, or memory requirements compared to baseline methods. This information would be particularly relevant for deployment in resource-constrained settings.
    4. Incomplete Handling of Uncertainty: While the paper briefly mentions a preliminary uncertainty analysis through clustering initialization variance, it lacks a comprehensive approach to quantifying and communicating prediction uncertainty, which is crucial for clinical decision-making, especially in challenging cases.
    5. Absence of Clinical Validation: The paper lacks validation by clinicians to assess whether the improvements in quantitative metrics translate to clinically meaningful benefits. Input from sonographers or obstetricians would strengthen the clinical relevance claims.
    6. Unclear Use of the ‘Blind Sweep Protocol’: The paper doesn’t clearly define or standardize what constitutes a “blind sweep protocol” beyond stating that it involves “pre-defined sweeps over the maternal abdomen without real-time visualization.” More details on the protocol standardization would help with reproducibility.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Expanding the dataset size or validating on an external dataset to demonstrate broader generalizability.
    2. Providing additional details on the object detector training and performance, including the types of fetal structures it was trained to detect and its accuracy metrics.
    3. Including a more thorough analysis of computational requirements and model efficiency, which would strengthen the case for deployment in low-resource settings.
    4. Developing a more comprehensive uncertainty quantification framework to help clinicians interpret prediction reliability.
    5. Considering clinical feedback on the results, perhaps through a small reader study with obstetricians or sonographers.
    6. Exploring how the proposed method performs with different ultrasound device qualities, as low-resource settings often have access to lower-quality imaging equipment.
    7. Providing more detailed analysis of failure cases to understand the limitations of the current approach.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend a weak rejection of this paper based on several significant concerns, though I believe the authors could potentially address these issues in their rebuttal. While the paper presents an interesting approach to gestational age estimation from blind sweep ultrasound videos, there are substantial limitations that undermine its current contribution.

    The most critical concern is the extremely limited dataset size of only 162 study scans across two centers. This is particularly problematic when compared to prior work such as Pokaprakarn et al. [20], which utilized over 109,000 videos. Such a small sample raises serious questions about the generalizability and robustness of the reported results. Statistical significance and reliability of the performance improvements are difficult to establish with such limited data.

    The technical aspects of the object detector training lack sufficient detail. Without clear information on how the detector was trained, evaluated, and optimized, it’s difficult to assess this critical component of the system. The performance of the anatomically guided selector directly impacts the overall framework, yet the paper provides minimal insight into this aspect.

    The paper claims advantages for resource-constrained settings but fails to quantitatively analyze computational requirements. Without comparing inference time, model size, or memory usage to baseline methods, the practical benefit for low-resource deployment remains unsubstantiated.

    Furthermore, the lack of clinical validation is a significant weakness for a clinically-oriented contribution. Without input from medical professionals regarding the clinical utility of the improvements, it’s difficult to assess whether the statistical gains translate to meaningful clinical benefits. The uncertainty handling in the paper is minimal, consisting only of a brief mention of clustering initialization variance. For a medical application where prediction confidence is crucial, this represents a substantial limitation.

    While the authors have developed an interesting approach to frame selection that shows promise in their limited experiments, these fundamental issues need to be addressed before the work reaches the quality threshold expected for MICCAI. If the authors can substantially address these concerns in their rebuttal—perhaps by providing additional validation on external datasets, more detailed analysis of computational efficiency, or clinical assessment of results—the paper could be reconsidered.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The author addressed most of my concerns.



Review #3

  • Please describe the contribution of the paper

    The authors claim to have developed a novel model, “SelectGA”, that adaptively selects frames from a blind sweep ultrasound video. The hypothesis is that, since blind sweep cines are prone to containing many redundant frames, selecting a few distinct frames with useful fetal information and using only those for estimation would improve gestational age predictions. They introduce a gestational age prediction model using SelectGA and claim that this model outperforms existing methods, especially when little training data is available.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of looking for keyframes with fetal structures and then selecting a few of those through clustering in the feature space - although not novel - is a good strategy, and the authors have shown it to be effective in improving gestational age predictions.

    The evaluation has been performed well, with comparisons to different methods, stratified by trimester and on the combined dataset. An ablation study of the proposed model was also provided, showing incremental improvements in predictive value as WAA, anatomically guided frame selection, and diversity-guided selection were added.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There’s at least one paper that does keyframe detection in fetal ultrasound videos by relatively new sonographers. It’s called MMSummary - Multimodal Summary Generation for Fetal Ultrasound Video. It’s not exactly the same work, but is very related to the authors work here. It would be great to include it (and any other work on keyframe detection) in prior work.

    Dataset: It would be nice to show the distribution of gestational ages in the dataset. Why is training on a smaller dataset crucial, when this data could be combined with existing datasets? Because of anonymity, it’s not clear how far apart the geographical centers are.

    Method: The frame selection threshold alpha is set to 0.25. Isn’t this very low? Effectively, it seems any frame with any kind of biometry (even at low confidence) is included in clustering. The number of clusters is set to 16. How “far” were these clusters from each other? That is, did the clusters correspond to particular anatomies?

    Results: It would be great to see some performance metrics here. How much time does it take to run inference on a full cine to get a prediction? Based on this, what are your plans for making this method usable clinically, and what challenges might you face? Was the prediction made on a full set of sweeps or only on a single sweep? Clinically, how would you expect this prediction to work?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Nice idea - however I am curious about the performance and how it compares to some of the methods.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I like the idea, and the experiments and methods look good. The paper is well written. However, this would have been more interesting if SelectGA had been run on one of the datasets for which results are reported. I’m concerned because the reported MAE, <7d%, and <14d% results differ considerably from those reported in the original papers (e.g., ResNet50).

    Also, although there seems to be an “improvement” in the predictive value of the method, the <7d% and <14d% metrics may suggest more outlier GA predictions. I would like to see the distribution of the predicted GAs.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers (R1, R2, R3) for their constructive feedback. We appreciate the positive recognition of our work as a “Novel Adaptive Frame Selection Methodology”(R3), “intuitively motivated”(R1) with “great clinical potential”(R2). Below we address the main concerns:

1) Improvement on previous works (R1, R2): Prior works [8,15,20] report varying performance, primarily due to differences in datasets, none of which are publicly available. Additionally, the model weights are not released, preventing direct comparison. To ensure a fair comparison under the same data constraints, we re-implemented their methods on our dataset. For EchoNet [17], we replaced the final LVEF regression layer with our GA regression head and fine-tuned the entire model end-to-end using our loss function, keeping the same backbone architecture. Similar adaptations were performed for ResNet50 [20], USFM [13], ViFi-CLIP [21], and Qwen-VL [3]. The default fine-tuning does not use any special techniques, which allows us to guarantee a fair comparison across all methods. The baseline implementations include the sampling methods used in the original works: random sampling for [20,13] and uniform sampling for [17,3,21]. Thus, Table 2 provides quantitative results for the different sampling methods on the same dataset. Our adaptive selection method demonstrates quantitative superiority (27.1% improvement in MAE), improving on the state of the art.

2) Dataset, computational efficiency & low-resource focus (R2, R3): To clarify, our dataset splits were performed at the patient level, ensuring no data leakage across splits. We acknowledge our dataset size (162 scans, 1,314 videos) is smaller than [20], but as noted by R1, it is substantial and multi-center across countries. We emphasize that our work targets the low-data regime, which is a challenge in medical imaging for low-resource settings. Our method demonstrates robust performance precisely in these challenging conditions. We addressed low-resource constraints along multiple dimensions: a) Protocol applicability: Our framework is designed for the blind sweep protocol, which was explicitly developed for resource-constrained settings where expert sonographers are unavailable. Unlike standard protocols requiring extensive training, blind sweeps can be performed by minimally trained personnel, making our approach inherently aligned with low-resource deployment [6]. b) Computational efficiency: SelectGA processes only 16 frames per video (21,024 frames total) versus the full 245,048 frames in our dataset, reducing computational complexity by 91.4%. This translates to a 10x reduction in processing time and memory requirements.
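The frame-reduction figures quoted above follow from a line of arithmetic (the video and frame counts are taken from the rebuttal itself):

```python
# Counts quoted in the rebuttal.
videos = 1314
frames_per_video = 16
selected = videos * frames_per_video   # frames actually processed by SelectGA
total_frames = 245_048                 # all frames in the dataset
reduction = 1 - selected / total_frames
print(selected)                        # 21024
print(round(100 * reduction, 1))       # 91.4 (percent reduction)
```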

3) Parameter selection of K and α (R1, R2): R2 raised a concern about the confidence threshold (α=0.25). This lower value is necessary for blind sweep videos, which differ from cineloop videos despite both being in the fetal ultrasound domain. Blind sweeps are acquired by minimally trained staff, resulting in fewer clear, high-quality standard views and greater variability. Consequently, model confidence tends to be lower on blind sweeps due to increased anatomical ambiguity and protocol-induced domain shift. The lower threshold maximizes detection of potential biometry frames, which are then refined by our diversity-guided selector. For K=16, we selected this value after testing the range 8-24, finding it to be the minimum that maintained performance (variations <5% MAE for K>16) while optimizing computational efficiency.
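The K-selection rule described above — the smallest K in the tested range whose MAE stays within 5% of the best — can be expressed as a small helper. This is an illustrative sketch only: the `smallest_sufficient_k` function and the validation MAE values below are hypothetical, not numbers from the paper.

```python
def smallest_sufficient_k(mae_by_k, tolerance=0.05):
    """Smallest K whose validation MAE is within `tolerance` (fractional)
    of the best MAE achieved over all tested K values."""
    best = min(mae_by_k.values())
    return min(k for k, mae in mae_by_k.items() if mae <= best * (1 + tolerance))

# Hypothetical validation MAEs (in days) over a tested range of K values:
maes = {8: 12.0, 12: 10.5, 16: 9.6, 20: 9.5, 24: 9.4}
print(smallest_sufficient_k(maes))  # 16
```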

4) Clinical validation and blind sweep protocol (R3): While formal clinical validation is planned for future work, our method’s performance is clinically significant according to ISUOG guidelines [23]. The blind sweep protocol is extensively defined in Section 2.1, and we will provide our full source code, including the implementation of the object detector, for full reproducibility.

We believe these clarifications address the main concerns and demonstrate the value of our contribution.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The reviewers provided a split assessment of this paper, with one leaning toward weak acceptance and two suggesting weak rejection. I recommend a rebuttal.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers agree that the paper is well written, the idea is novel, and the experiments show good results. The authors have addressed the reviewer concerns well, and the reviewers agreed to accept post-rebuttal.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


