Abstract

Accurate gestational age (GA) estimation, ideally through fetal ultrasound measurements, is a crucial aspect of providing excellent antenatal care. However, deriving GA from manual fetal biometric measurements is operator-dependent and time-consuming. Hence, automatic computer-assisted methods are in demand in clinical practice. In this paper, we present a novel feature fusion framework to estimate GA using fetal ultrasound images without any measurement information. We adopt a deep learning model to extract deep representations from ultrasound images, and we extract radiomic features to reveal patterns and characteristics of fetal brain growth. To harness the interpretability of radiomics in medical imaging analysis, we estimate GA by fusing the radiomic features and deep representations. Our framework estimates GA with a mean absolute error of 8.0 days across three trimesters, outperforming current machine learning-based methods at these gestational ages. Experimental results demonstrate the robustness of our framework across different populations in diverse geographical regions. Our code is publicly available on GitHub.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1798_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/13204942/RadiomicsImageFusion_FetalUS

Link to the Dataset(s)

HC18: https://zenodo.org/records/1327317

ES-TT: https://zenodo.org/records/3904280

BibTex

@InProceedings{WanFan_Fusing_MICCAI2025,
        author = { Wang, Fangyijie and Liang, Yuan and Bhattacharjee, Sourav and Campbell, Abey and Curran, Kathleen M. and Silvestre, Guénolé},
        title = { { Fusing Radiomic Features with Deep Representations for Gestational Age Estimation in Fetal Ultrasound Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15973},
        month = {September},
        pages = {237 -- 247}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a method for GA estimation using US images of the fetal head. The main idea is the combination of radiomic and deep features using cross-attention. The method is evaluated on two public datasets, ES-TT and HC18, and tested with different network architectures, different fusion strategies (none, concatenation, cross-attention), and classical ML models (radiomics only). The proposed configuration achieved the best result, a mean absolute error of 8 days.
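The fusion strategy the reviewers discuss can be illustrated with a minimal single-head cross-attention sketch in NumPy, where radiomic features act as queries attending over deep image features. This is only an illustration under assumed dimensions (6 radiomic categories, ConvNeXt-like deep tokens); it is not the authors' exact architecture, and all names and shapes here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(radiomic, deep, d_k=32, seed=0):
    """Radiomic features (queries) attend over deep features (keys/values)."""
    rng = np.random.default_rng(seed)  # random projections stand in for learned weights
    W_q = rng.standard_normal((radiomic.shape[-1], d_k))
    W_k = rng.standard_normal((deep.shape[-1], d_k))
    W_v = rng.standard_normal((deep.shape[-1], d_k))
    Q = radiomic @ W_q                 # (n_radiomic, d_k)
    K = deep @ W_k                     # (n_deep, d_k)
    V = deep @ W_v                     # (n_deep, d_k)
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_radiomic, n_deep) weight matrix
    return attn @ V, attn

radiomic = np.random.default_rng(1).standard_normal((6, 10))    # 6 radiomic categories
deep = np.random.default_rng(2).standard_normal((49, 768))      # e.g. 7x7 ConvNeXt tokens
fused, weights = cross_attention(radiomic, deep)
print(fused.shape, weights.shape)  # (6, 32) (6, 49)
```

The attention weight matrix is what a heatmap like the paper's Fig. 3 visualizes: one row per radiomic category, one column per deep feature.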

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of integrating radiomic features into a DL model is valid and interesting. Radiomics offer some potential for interpretability, which makes them interesting for GA estimation.
    • Cross-attention is a valid choice for feature fusion.
    • The figures are mostly well chosen to support the paper.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • While cross-attention for feature fusion is valid, I question its novelty. CA has been used before for feature fusion (e.g., Liu et al. 2024). How is the formulation in this work novel compared to previous approaches? Related work and discussions are missing. In my understanding, it seems the only novelty lies in the kind of features: radiomics and deep features for GA estimation. [Liu, et al. “Multi-modal Data Fusion with Missing Data Handling for Mild Cognitive Impairment Progression Prediction.” MICCAI, 2024.]
    • The main clinical aim is GA estimation. However, only fetal head images are used, which makes the estimation not very accurate. Why weren’t AC and FL standard views included? A discussion is missing on the relation between HC and GA, especially for abnormal growth.
    • It is unclear how the ground truth GA is obtained. In Sec 3.1, the authors mention a formula for calculating GA from HC, which is not given. But why is this necessary? Is the GA information not included in the datasets? And even if not, using only the HC biometric is not very accurate. A discussion of this issue is definitely missing. In addition, I don’t understand how the HC measurement is obtained from the mask. Typically, an ellipse is fitted to the mask and its parameters are used for the HC calculation. The authors “calculated the number of pixels p_num along the edge of the corresponding ROI […]”. I don’t see how this could yield an HC. And if it does, how accurate is it compared to the true HC?
    • The paper is not easy to follow. Some information is missing (e.g., the GA formula), while other details are unimportant (e.g., Eq. (1), which is too basic and not used in the remainder of the paper). The method itself is easy to understand, but the evaluation is not. It is unclear what the baseline and the concatenation method are; they are not explained. I guess the latter merely concatenates the features. For the former: are only deep features used, or only radiomics? Also, is the ConvNeXt model fine-tuned during training, or is only the cross-attention module trained?
    • It remains unclear how the proposed method compares to state-of-the-art GA estimation methods. Or simply using the ground truth HC measure, or ideally multiple biometrics. Also, a discussion is missing on how good an estimation error of 8 days is. This is more than a week, which seems to be quite large.
    • Were results included without using the radiomics? It seems from Table 2 that the radiomics are not very good for estimating GA (MAE > 20 days). How useful are they actually?
    • Fig. 3 is difficult to understand, especially the cross-attention weights. I don’t see why the radiomics help in the interpretability. What is the advantage of knowing that the radiomic features Inverse difference and inverse difference moments are important in the first trimester (following description of Fig3). What about feature maps from the ConvNeXt model? This could be interesting.
    • “Experimental results demonstrate the robustness of our framework across different populations in diverse geographical regions”: robustness has not been assessed. It is unclear how diverse the data is; Spain and the Netherlands may not be very diverse (both are in Western Europe).
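The radiomic features the reviewer singles out, inverse difference and inverse difference moment, are texture statistics computed from a gray-level co-occurrence matrix (GLCM). The pure-NumPy sketch below shows how they are defined; it is a simplified illustration (single offset, pre-quantized image, hypothetical function names), not the binning or aggregation a toolkit like pyradiomics performs.

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for a single pixel offset."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[img[y, x], img[y + dy, x + dx]] += 1
    return P / P.sum()

def inverse_difference(P):
    """Sum of P(i,j) / (1 + |i - j|): high when co-occurring levels are similar."""
    i, j = np.indices(P.shape)
    return (P / (1.0 + np.abs(i - j))).sum()

def inverse_difference_moment(P):
    """Sum of P(i,j) / (1 + (i - j)^2), a.k.a. homogeneity."""
    i, j = np.indices(P.shape)
    return (P / (1.0 + (i - j) ** 2)).sum()

# A constant image maximizes both features (all mass on the GLCM diagonal).
img = np.full((16, 16), 3, dtype=int)
P = glcm(img)
print(inverse_difference(P), inverse_difference_moment(P))  # 1.0 1.0
```

Both features peak at 1.0 for perfectly uniform texture and decrease as neighboring gray levels diverge, which is why they track tissue homogeneity in the fetal brain.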
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • How are the masks for the ROI definition obtained? This is unclear.
    • A separation of the results for the two datasets would have been better. Also, the standard deviation should always be reported.
    • Fig 3: please report the GA in the weeks+days format, which is the common way of reporting GA.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the aim of this study is valid, I have several open questions regarding the data used (only HC views, how to obtain ground truth GA), the novel aspects of the method (CA for fusion?) and the missing SOTA methods for comparison. Unfortunately, the paper is in my opinion not (yet) of enough quality to recommend acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Unfortunately, the authors did not sufficiently address my main concerns in the rebuttal. A few individual misunderstandings arising from the paper were addressed, but no overall justification of the data and model choices was given.



Review #2

  • Please describe the contribution of the paper

    The main contribution of this paper is the use of a cross-attention mechanism to combine radiomics features and deep learning features.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed cross-attention combination of features is significantly better than simply concatenating the features.

    • The authors compared their proposal against numerous DL-based and ML-based baselines, demonstrating the robustness of the combined-features approach across models.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The deep learning features used are only extracted from a ConvNeXt model pre-trained on ImageNet, limiting the conclusions that can be extracted from the results. A more thorough study could have compared different DL-based feature extractors and pre-training strategies.

    • The interpretability of the method is discussed very quickly (despite being one of the points for using radiomics) and the cross-attention weight visualization lacks a proper interpretation (and it’s very difficult to see anything in Figure 3).

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The results obtained and the ingenuity in the feature combination add value, although a more thorough comparison of methods would be desirable.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have pointed out that one of the weaknesses I raised had already been addressed in the paper and I had missed it.

    Regarding the interpretation of the results, Figure 3 is very informative (radiomics start to play a greater role in the 2nd and 3rd trimesters) but, in my opinion, poorly explained.

    With all these considerations and my previous decision, I recommend accepting it.



Review #3

  • Please describe the contribution of the paper

    The authors propose a novel framework to combine radiomic features and representations of the image derived with deep learning models to estimate gestational age from ultrasound images.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method is claimed to be the first in the field to use both radiomic features and deep learning in parallel to estimate GA.
    2. They have tested various alternatives for the convolutional deep learning architecture, as well as for the fusion mechanism with radiomic features.
    3. They declare that they will release their code publicly, which will be valuable for reproducibility.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The use of data is presented in a slightly unclear manner.

    1. They mention 2 different datasets but then report a single set of results. Were the 2 datasets concatenated and treated as one? If not, shouldn’t there be results for both datasets in the tables?
    2. As an extension of this, it is not very clear whether the framework trained (fine-tuned) on a dataset with images from a specific machine/acquisition protocol can perform similarly well when tested on images from another machine/protocol. Would it first need to be fine-tuned at each new clinical setup?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Including the stdevs of the MAE scores in result tables could be informative for comparison, since some of the MAEs are very close.
    2. In Section 2.2, the dimensions of X_US are given as 3 × h × w. I assume the 3 is for the RGB channels? But aren’t the ultrasound images in grayscale? Maybe the grayscale images are converted to RGB format because the pretrained models work with RGB?
    3. In Table 3, the result for “ConvNeXt + concat” is given, but this result is missing in Table 1. I assume it was omitted by mistake?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed framework is tested with multiple configurations of the components, and also compared to traditional ML models using only radiomic features. It’s a novel method that performs well. However, the paper can be improved by clarifying how the 2 datasets are used, and how the framework can be applied under new conditions.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    They have addressed the previous concerns I raised in a satisfactory manner. Since the authors are not allowed to make changes to the manuscript during rebuttal, my opinion is that the answers they provided are sufficient.




Author Feedback

We sincerely thank all reviewers for their invaluable comments and suggestions.

R1/Q7.1: We investigated our method with different deep representations extracted by ResNet, EfficientNet, MaxViT, and SwinTransformer; the results are shown in rows 10–13 of Tab 1. We compare the effectiveness of the pre-training strategy in the ablation study (rows 1 and 4 of Tab 3).

R1/Q7.2: We apologize for the confusion. Fig 3 shows the attention weight matrix for 1st-, 2nd-, and 3rd-trimester cases. The y-axis lists the radiomic features in six categories, and the x-axis the deep features. The greater a radiomic feature's contribution, the darker the red; the smaller, the darker the blue. We will update Fig 3 with a clearer visualization and explanations highlighting the most and least contributing features.

R2/Q7.1: Thanks for recognizing our novel use of cross-attention for feature fusion. Our research improves the balance between interpretability and predictive power in DL, thus advancing radiomic studies in fetal ultrasound analysis.

R2/Q7.2: Only maternal age and fetal gender affect GA estimation when using HC (≤ 1 day, p = 0.001) [2]. HC18 and ES-TT contain no abnormal cases. The AC and FL standard views have no annotations yet; our clinicians are currently annotating ROIs in these views.

R2/Q7.3: The HC18 dataset provides the true HC measured by clinicians. For the ES-TT dataset, HC (mm) is calculated as HC = p_num × p_size, where p_num is the total number of pixels along the edge of the head and p_size is the size of a single pixel in mm. GA is calculated with the formula [4]: log(GA) = 0.05970 × (log(HC))^2 + 6.409e-9 × (HC)^3 + 3.3258.

R2/Q7.4: We apologize. The baseline model uses the image only to predict GA; the concatenation method concatenates deep features and radiomics directly. Section 2.5 explains our fine-tuning strategy: in Fig 1, a blue box with an open lock means fine-tuned, and a grey box with a closed lock means frozen parameters.

R2/Q7.5: We compare against the SOTA method (ResNet) [3], which used fetal head images to achieve an MAE of 5 days. Ours is a more lightweight model with an improved ability for interpretation.

R2/Q7.6: Tab 2 presents the results of ML methods using radiomics only. These results are worse than ours (CNN + Cross-Attention), which combines deep features and radiomics. ML methods are still commonly used in radiomics [1]. We conclude that our method achieves the best result by fusing deep features with radiomics (row 14 in Tab 1).

R2/Q7.7: Our response is the same as for R1/Q2.

R2/Q7.8: Our method is robust across diverse hospitals and US machines: HC18 uses 2 machines in 1 hospital, and ES-TT uses 6 machines across 2 hospitals.

R2/Q10.1: The ROI is the entire area covering the fetal head, annotated by sonographers.

R2/Q10.2: Thanks. We will report the standard deviation for the two datasets.

R2/Q10.3: Thanks. We will report GA in the weeks+days format.

R3/Q7.1: The two datasets are merged and treated as one for training (70%) and testing (30%).

R3/Q7.2: Our method is fine-tuned on the combined dataset of HC18 (2 machines) and ES-TT (6 machines); it is not fine-tuned separately for each clinical setup.

R3/Q10.1: Thanks. We will report the standard deviation in the result tables.

R3/Q10.2: ES-TT images were published in the original paper with RGB channels; HC18 images are grayscale and converted to RGB to fit the input channels of the pretrained models.

R3/Q10.3: Thanks. The missing result will be added to Tab 1.

[1] Defining normal and abnormal fetal growth: promises and challenges, AJOG, 2010.
[2] Fetal age assessment based on ultrasound head biometry and the effect of maternal and fetal factors, AOGS, 2004.
[3] Machine learning for accurate estimation of fetal gestational age based on ultrasound images, npj Digit. Med., 2023.
[4] Ultrasound-based gestational-age estimation in late pregnancy, ISUOG, 2016.
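The HC-to-GA derivation quoted in the rebuttal can be sketched directly. The function names below are hypothetical, and the units are assumed (HC in mm, GA returned in days), which is consistent with the formula producing plausible values; this is an illustration of the quoted formula, not the authors' code.

```python
import math

def hc_from_mask_edge(p_num, p_size_mm):
    """HC in mm from the edge pixel count and pixel size (ES-TT, per the rebuttal)."""
    return p_num * p_size_mm

def ga_from_hc(hc_mm):
    """GA from HC via the formula quoted in the rebuttal [4].
    Assumed units: HC in mm, GA in days."""
    log_ga = 0.05970 * math.log(hc_mm) ** 2 + 6.409e-9 * hc_mm ** 3 + 3.3258
    return math.exp(log_ga)

ga = ga_from_hc(300.0)
print(f"{ga:.1f} days = {ga / 7:.1f} weeks")  # roughly 230 days, about 33 weeks
```

For an HC of 300 mm this yields a GA of about 33 weeks, a plausible third-trimester value, which supports the assumed units.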




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposes a method that fuses radiomic features with deep representations using a cross-attention mechanism to estimate gestational age from fetal head ultrasound images. Reviewers 1 and 3 are more supportive, citing promising results and the potential utility of radiomic-deep fusion. While the topic is clinically relevant and the fusion strategy is well-intentioned, key concerns remain. Reviewer 2 questions the novelty and practical justification of the proposed method, particularly the reliance on a single anatomical view. The rebuttal did not sufficiently clarify the rationale for the approach, nor did it address deeper concerns around clinical generalizability or interpretability. Overall, I recommend rejection in the paper’s current form.


