Abstract

Fetal motion is a critical indicator of neurological development and intrauterine health, yet its quantification remains challenging, particularly at earlier gestational ages (GA). Current methods track fetal motion by predicting the location of annotated landmarks on 3D echo planar imaging (EPI) time-series, primarily in third-trimester fetuses. The predicted landmarks enable simplification of the fetal body for downstream analysis. While these methods perform well within their training age distribution, they consistently fail to generalize to early GAs due to significant anatomical changes in both mother and fetus across gestation, as well as the difficulty of obtaining annotated early GA EPI data. In this work, we develop a cross-population data augmentation framework that enables pose estimation models to robustly generalize to younger GA clinical cohorts using only annotated images from older GA cohorts. Specifically, we introduce a fetal-specific augmentation strategy that simulates the distinct intrauterine environment and fetal positioning of early GAs. Our experiments find that cross-population augmentation yields reduced variability and significant improvements across both older GA and challenging early GA cases. By enabling more reliable pose estimation across gestation, our work potentially facilitates early clinical detection and intervention in challenging 4D fetal imaging settings.
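
The core idea of simulating a smaller fetus surrounded by more amniotic fluid can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is in the repository linked below). It makes several simplifying assumptions: the fetus is shrunk in place within a single volume rather than combined with the uterus of another subject, resampling is nearest-neighbour, and the vacated region is filled with a noisy constant based on the median amniotic fluid intensity. The function names and the parameters `scale`, `fluid_gain`, and `noise_frac` are hypothetical.

```python
import numpy as np


def shrink_about_centroid(volume, mask, scale):
    """Shrink the masked region about its centroid (nearest-neighbour resampling)."""
    shape = np.array(volume.shape)
    center = np.array(np.nonzero(mask)).mean(axis=1)
    # Coordinates of every output voxel, one row per voxel.
    grid = np.indices(volume.shape).reshape(volume.ndim, -1).T
    # Inverse mapping: output voxel x reads from input voxel c + (x - c) / scale.
    src = np.rint(center + (grid - center) / scale).astype(int)
    inside = np.all((src >= 0) & (src < shape), axis=1)
    src = tuple(src[inside].T)
    small_vol = np.zeros(volume.size, dtype=volume.dtype)
    small_mask = np.zeros(volume.size, dtype=bool)
    small_vol[inside] = volume[src]
    small_mask[inside] = mask[src]
    return small_vol.reshape(volume.shape), small_mask.reshape(volume.shape), center


def simulate_smaller_fetus(volume, fetus_mask, fluid_mask, keypoints,
                           scale=0.6, fluid_gain=1.0, noise_frac=0.05, rng=None):
    """Return an augmented volume, the new fetus mask, and the moved keypoints.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    fluid_median = np.median(volume[fluid_mask])
    small_vol, small_mask, center = shrink_about_centroid(volume, fetus_mask, scale)

    out = volume.copy()
    # Erase the original fetus with synthetic fluid-like intensities.
    n_voxels = int(fetus_mask.sum())
    noise = rng.normal(0.0, noise_frac * fluid_median, n_voxels)
    out[fetus_mask] = fluid_gain * fluid_median + noise
    # Paste the shrunken fetus on top of the synthetic fluid.
    out[small_mask] = small_vol[small_mask]
    # Landmarks follow the same scaling about the centroid.
    new_keypoints = center + scale * (np.asarray(keypoints, dtype=float) - center)
    return out, small_mask, new_keypoints
```

Scaling the keypoints with the same transform as the image keeps the landmark annotations valid for the synthetic volume, which is what allows older-GA annotations to be reused.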

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3296_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: https://papers.miccai.org/miccai-2025/supp/3296_supp.zip

Link to the Code Repository

https://github.com/sebodiaz/cross-population-pose

Link to the Dataset(s)

N/A

BibTex

@InProceedings{DiaSeb_Robust_MICCAI2025,
        author = { Diaz, Sebastian and Billot, Benjamin and Dey, Neel and Zhang, Molin and Abaci Turk, Esra and Grant, P. Ellen and Golland, Polina and Adalsteinsson, Elfar},
        title = { { Robust Fetal Pose Estimation across Gestational Ages via Cross-Population Augmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15966},
        month = {September},
        pages = {550 -- 560}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a data augmentation framework aimed at improving the generalization of pose estimation models in fetal MRI scans to younger gestational ages using training data annotated only from older GA cohorts. The authors introduce a fetal-specific augmentation strategy that simulates smaller fetal anatomy.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well written and clearly structured, with a well-motivated problem. The authors effectively communicate the clinical and technical relevance of improving pose estimation models for younger gestational ages.
    • The experimental setup is thoughtfully designed. The use of a research dataset for training and validation, combined with an additional and independent test set consisting of younger GA fetuses, provides a strong basis to assess the generalization capabilities of the proposed method.
    • The authors present clear and well-organized figures and analyses, including performance breakdowns across different gestational ages and landmark groups.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The inpainting strategy used by the authors is relatively simplistic: removing one fetus from an image and inserting it into another. While this method appears to be effective in the context of their study, more sophisticated inpainting techniques exist that could generate more anatomically realistic and coherent images. The authors should discuss this.
    • While the reviewer agrees that scaling down the fetus can simulate aspects of the intrauterine environment typical of younger gestational ages, fetal size is not the only distinguishing factor. Morphological and anatomical features differ significantly between early and late gestation. The proposed method does not address such morphological variability, which may limit the realism of the augmentation strategy. The authors should at least discuss the possibility of inpainting fetuses of younger GA datasets in images with older GA.
    • The authors claim to simulate fetal poses characteristic of earlier gestational ages. However, since the augmented data is derived solely from inpainted images using older GA subjects, it is unclear how the model is exposed to the natural pose variations of younger fetuses. Younger fetuses typically exhibit a greater range of movement due to a larger intrauterine space relative to their body size. The authors should clarify how such pose diversity is achieved, or justify why their current strategy is sufficient to approximate it.
    • It is not clearly explained whether the performance gains observed against the baseline are due to the augmentation strategy or other architectural differences in the network. The authors compare their method to the baseline, but do not present results from their own network without augmentation. Are the baseline and proposed networks similar enough for the authors to argue that the differences are only due to the augmentation?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • In Figure 2, noticeable differences between the original and inpainted volumes suggest some visual inconsistency. It would be helpful for the authors to include examples of real MR images from younger GA fetuses to allow for visual comparison with the inpainted volumes.
    • In the final version of the paper, the authors should provide more details regarding the segmentation network used to obtain the body and amniotic fluid masks.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the proposed methods are not highly novel from a technical standpoint, the paper addresses a clinically relevant application. The proposed augmentation strategy, though simple, appears to be effective, and the results show advantages for younger GA cohorts. The reviewer believes that the weaknesses identified can be clarified in the rebuttal to demonstrate the clinical feasibility and to justify a future acceptance.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose to use image inpainting as an augmentation strategy to improve the robustness of fetal landmark detection in fetal MRI, particularly for earlier gestational ages (GAs), from 20 weeks onward. They segment the uterus and fetus, then generate synthetic training samples by cross-combining transformed fetal and uterine regions. Their results show that this augmentation increases data diversity and improves results, especially for GAs of 20-27 weeks, where the baseline has much lower accuracy and higher variability.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is mostly well written and easy to follow.

    The proposed approach leverages a useful insight: robustness at lower GAs could be improved by adding training samples that show smaller fetuses surrounded by more amniotic fluid, produced through synthetic data generation. This is achieved by segmenting the fetus and amniotic fluid and applying the inpainting strategy.

    The experiments are well designed across research and clinical data, and ablation studies show the value of proposed augmentation strategy through suitable figures.

    The proposed augmentation strategy has the potential to extend beyond keypoint detection to other fetal imaging tasks and modalities.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The motivation and clarity of the paper could be improved by briefly explaining, in a few sentences, the time scale and example fetal motions that are important to measure, instead of only citing the relevant papers: for example, typical motions other than regular ones such as the heartbeat, and what happens if the fetus is at rest during imaging.

    The results and ablation study sections lack explanation. Why does the proposed methodology outperform the baseline by a much larger margin in the clinical cohort than in the research cohort (Fig. 3)?

    The standard deviation in PCK seems quite high in Table 1. It isn’t clear how this standard deviation was calculated (was it the result of multiple seeded experiments?) or why it is so high.

    How would varying the batch size affect performance?

    The augmentation strategy that the authors have proposed requires segmentation masks of the uterus and fetus. Obtaining these can be time-consuming and costly, especially for volumetric data.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    It seems that inpainting primarily applies a geometric transformation to mimic younger ages by scaling the segmented fetus. However, at younger weeks it is not just the size of the fetus that differs; the anatomy and tissue characteristics are also different. Is the geometric transformation good enough because it captures the largest variation between 20 and 27 weeks? It would be better to discuss this, along with the potential of adding age-specific tissue-change augmentation.

    Are these volumes motion-corrected using computational tools? How do motion artifacts affect the existing pipeline?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper adapts an inpainting data augmentation strategy to improve fetal MRI landmark detection, especially for earlier gestational ages, where prior results were poor. The approach has the potential to translate beyond keypoint detection. However, incorporating more interpretation of the results and the ablation study would further strengthen the work.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Authors sufficiently address the points raised in the reviews.



Review #3

  • Please describe the contribution of the paper
    1. The authors proposed a cross-population data augmentation framework. This can estimate the fetal pose in the first trimester by simulating the early intrauterine environment and fetal poses using only third-trimester datasets through an innovative data augmentation strategy.
    2. For fetal pose estimation at different gestational weeks, this method shows significant advantages in fetuses at earlier gestational weeks, with a narrower error distribution, demonstrating the robustness and consistency of the model.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is designed innovatively and rationally. The fetal inpainting augmentation method specifically simulates the unique situation of early-gestation fetuses, and solves the problem of difficult fetal pose estimation in the early gestation period.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Some important descriptions are lacking, like dataset annotation details (annotation time), and in the fetal restoration augmentation method, the basis for selecting hyper-parameters (γ, α, etc.) in amniotic fluid intensity synthesis and their optimization are not detailed. Also, the distribution of samples at different gestational weeks in the 27-37-week-dominant dataset is not described.
    2. Some key experiments are missing. The paper only uses a lightweight 3D UNet for landmark detection without comparing it with other possible existing models.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Are all the samples normal fetuses? If yes, the presence of abnormal cases in clinical reality may affect the model’s robustness.
    2. In fetal restoration augmentation, synthesized amniotic fluid intensity depends on the original median. Differences in amniotic fluid intensity among pregnant women may exist. Do these affect synthesized data quality and model performance? Have more robust methods been considered (like controllable synthesized amniotic fluid intensity)?
    3. Currently, only PCK is used to evaluate model performance, which is limited. For example, adding AUC can comprehensively assess the model from different aspects.
    4. In this paper, data augmentation used fixed parameters. It’s advised to develop an adaptive method that adjusts augmentation parameters according to fetal gestational week characteristics. For first-trimester fetuses, increase transformation amplitude and frequency due to their large pose changes and small size, for better real-world simulation.
    5. An experimental explanation for setting fetal restoration volume as 15% of the total volume is needed.
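
For readers unfamiliar with the metrics mentioned in item 3, the following is a minimal Python sketch of PCK (percentage of correct keypoints) and of the area under the PCK-versus-threshold curve. It assumes keypoints are given as coordinate arrays and that a keypoint counts as correct when its Euclidean error is at most a chosen threshold. The function names, the `voxel_size_mm` parameter, and the thresholds are hypothetical and are not taken from the paper.

```python
import numpy as np


def pck(pred, true, threshold_mm, voxel_size_mm=1.0):
    """Fraction of keypoints whose Euclidean error is within threshold_mm."""
    diff = (np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)) * voxel_size_mm
    dist = np.linalg.norm(diff, axis=-1)
    return float(np.mean(dist <= threshold_mm))


def pck_auc(pred, true, thresholds_mm, voxel_size_mm=1.0):
    """PCK curve over increasing thresholds and its normalised area (0 to 1)."""
    thresholds_mm = np.asarray(thresholds_mm, dtype=float)
    curve = np.array([pck(pred, true, t, voxel_size_mm) for t in thresholds_mm])
    # Trapezoidal rule, normalised by the width of the threshold range.
    area = np.sum((curve[1:] + curve[:-1]) / 2.0 * np.diff(thresholds_mm))
    return curve, float(area / (thresholds_mm[-1] - thresholds_mm[0]))
```

A single PCK value depends on the chosen threshold, whereas the area under the curve summarises accuracy across a range of thresholds, which is the kind of complementary view the reviewer asks for.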
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Limited experiments, with only ablation studies reported.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    I suggest rejecting this paper because: 1) The authors only answered some of my questions; 2) I still think it lacks key experiments and evaluations.




Author Feedback

We thank the reviewers for their insightful feedback. We are glad all reviewers found our proposed augmentation method effective for generalizing to unseen, difficult populations.

R1: Relatively simple method

As the reviewers acknowledged, our simple approach leads to dramatic improvement in performance. While sophisticated methods, such as deep generative models, can produce more realistic images, fetal imaging simply lacks the scale of the datasets required for training them. Moreover, recent SynthSeg (2023) and SynthMorph (2020) methods suggest that generating the most realistic images in data augmentation is unnecessary to improve performance. We will address this point in the revision.

R1,3: Age-specific morphological and pose variation unaddressed

We emphatically agree we should generate images that capture age-specific morphological or pose variation. Unfortunately no method or data exist currently to capture this variation across GAs. Our aim is to collect sufficient data across GAs to enable modeling in the future. The proposed method promises to support this goal by enabling articulated pose characterization in younger cohorts.

R1: Discuss inpainting younger GA fetuses in images with older GA

We absolutely agree it would benefit the model to inpaint younger fetuses into older GA images and vice versa. Unfortunately robust segmentation of younger fetuses is not feasible right now, and is the goal of future work. Given segmentation of younger GA, we will be able to include this augmentation.

R1: Results from own network without augmentation

The results in Table 1 include the same architecture without the proposed augmentation and establish that the performance gains are due to the proposed augmentations.

R3: Paper could be further motivated with added clarity

We thank the reviewer for this suggestion and will expand the introduction with clinical relevance of fetal motion.

R3: Why better performance compared to baseline in clinical and not research?

The research test cohort comprises subjects with GAs similar to those in the training set, hence the high performance. In contrast, the clinical cohort is significantly younger and more diverse than the training data, and the baseline fails to generalize. Our aggressive augmentation strategy (intensity and cross-population) enables us to greatly outperform the baseline on the held-out clinical cohort, which has a substantial domain gap in anatomy (GA) and intensity distribution (sequence, artifacts, resolution); such augmentation is not necessary for the research data.

R3: Higher StdDev on the clinical set in Table 1

Higher StdDev in the clinical cohort is due to the increased difficulty of detecting keypoints in younger subjects with smaller limbs and degraded image quality. Each datapoint in Table 1 represents a volumetric time-series.
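
One way to read this answer is that PCK is computed once per volumetric time-series and the reported spread is taken across time-series rather than across random seeds. The sketch below illustrates that reading only; the array layout `(frames, keypoints, 3)`, the function names, and the threshold are assumptions, and the population standard deviation (NumPy's default) is used for simplicity.

```python
import numpy as np


def series_pck(pred, true, threshold_mm):
    """PCK pooled over all frames and keypoints of one time-series, shape (T, K, 3)."""
    dist = np.linalg.norm(np.asarray(pred, dtype=float) - np.asarray(true, dtype=float), axis=-1)
    return float(np.mean(dist <= threshold_mm))


def cohort_summary(series, threshold_mm):
    """Mean and standard deviation of per-series PCK across a cohort.

    `series` is a list of (pred, true) pairs, one pair per time-series.
    """
    scores = np.array([series_pck(p, t, threshold_mm) for p, t in series])
    return float(scores.mean()), float(scores.std())
```

Under this reading, a few time-series with very low PCK, for example young subjects with degraded image quality, are enough to inflate the cohort-level standard deviation.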

R3: Effect of batch size (BS) on performance?

Preliminary experiments showed no difference in performance between BS 8 and 16 beyond iterations needed for convergence.

R3: Need for costly and time-consuming training segmentations

We agree our augmentation method requires training segmentations, which are obtained offline from an in-house network. This point will be discussed.

R4: Missing details: annotation time, selection of hyperparameters in inpainted volumes, and distribution of research GA.

Keypoint annotation took 30 hours. Hyperparameters were set by performance on the validation set (gamma ~ U[0.75, 1.75], alpha = 0.75). For the research dataset, we provided the week range and the mean age (31.74 weeks).

R4: Paper only uses 3D Unet

The UNet architecture has previously achieved excellent performance on older fetuses, so we focused on augmentation to bridge the cross-population domain gap. The proposed strategies can be readily combined with other architectures, which will be investigated in future work.

We will address all the minor concerns and typos in the revision. We thank the reviewers for their helpful comments and suggestions.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While two reviewers found some merit in the paper, the inpainting technique for augmentation has limitations, as it cannot account for changes in the surrounding tissue. Admittedly, this is very hard to model. However, it does limit applicability. The other issues raised still stand after the rebuttal.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


