Abstract
This paper introduces a novel user-centered approach for generating confidence maps in ultrasound imaging.
Existing methods, relying on simplified models, often fail to account for the full range of ultrasound artifacts and are limited by arbitrary boundary conditions, making frame-to-frame comparisons challenging.
Our approach integrates sparse binary annotations into a physics-inspired probabilistic graphical model that can estimate the likelihood of confidence maps.
We propose to train convolutional neural networks to predict the most likely confidence map.
This results in an approach that is fast, capable of dealing with various artifacts, temporally stable, and allows users to directly influence the algorithm’s behavior using annotations.
We demonstrate our method’s ability to cope with a variety of challenging artifacts and evaluate it quantitatively on two downstream tasks, bone shadow segmentation and multi-modal image registration, achieving superior performance over the state of the art.
We make our training code public.
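The paper's actual model is not reproduced on this page. As a rough, purely illustrative sketch of the kind of physics-inspired pairwise energy the abstract describes (the function name, the weights, and the squared-difference potentials are all assumptions, not the authors' formulation), sparse binary annotations and intra-/inter-scanline smoothness could be combined like this:

```python
import numpy as np

def energy(x, ann, mask, w_intra=1.0, w_inter=0.25):
    """Illustrative pairwise energy for a confidence map x with values in [0, 1].

    ann  -- sparse binary annotations (1 = confident, 0 = not confident)
    mask -- 1 where an annotation exists, 0 elsewhere
    Rows are assumed to run along the axial (scan-line) direction.
    """
    # Unary term: squared disagreement with the sparse annotations,
    # active only at annotated pixels.
    unary = np.sum(mask * (x - ann) ** 2)
    # Intra-scanline term: penalise jumps between axially adjacent pixels.
    intra = w_intra * np.sum((x[1:, :] - x[:-1, :]) ** 2)
    # Inter-scanline term: weaker coupling between neighbouring scan lines.
    inter = w_inter * np.sum((x[:, 1:] - x[:, :-1]) ** 2)
    return unary + intra + inter
```

Under this sketch, a constant map incurs no pairwise cost, and the unary term vanishes wherever the map agrees with the annotations; a CNN trained to minimise such an energy would predict the most likely map directly, as the abstract describes.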
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4043_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{RonMat_Beyond_MICCAI2025,
author = { Ronchetti, Matteo and Göbl, Rüdiger and Yesilkaynak, Bugra and Zettinig, Oliver and Navab, Nassir},
title = { { Beyond Shadows: Learning Physics-inspired Ultrasound Confidence Maps from Sparse Annotations } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15972},
month = {September},
pages = {54 -- 63}
}
Reviews
Review #1
- Please describe the contribution of the paper
a. To use weak annotations by users to guide a CNN model’s training to generate confidence maps for ultrasound images. This guidance is expected to roughly capture the most and the least confident regions. These weak annotations are sparse and scribble-like, so they are not as difficult to obtain as dense maps.
b. To obtain target confidence maps for training the CNN, they generate dense confidence maps using the physics of ultrasound imaging across and within scan lines, combined with the sparse annotations.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
a. The idea of injecting user input as sparse labels (weak supervision) adds a layer of quick human supervision, as opposed to methods based solely on assumptions without any human input.
b. Using concepts inspired by physics to estimate the likelihood of confidence maps based on inter- as well as intra-scanline neighbouring pixels, rather than simpler aggregation approaches like averaging.
c. The authors try to present the strengths of their method qualitatively as well as quantitatively.
d. The paper’s structure and mathematical formulation are generally easy to follow up to the method section. The explanation of ideal confidence maps before the method section makes the method easier to understand.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
a. The paper initially frames the approach as “user-centered”, which naturally creates the expectation that users would have direct input during inference as well. However, the user annotations are utilized only during the training phase, so there is a disconnect between this framing and the actual implementation.
b. The quantitative evaluation setup is not clear. The authors do not mention how they trained the confidence estimation model (did they retrain it for each particular dataset, or reuse the model whose training details they describe?). Additionally, details of the datasets used for evaluation are missing.
c. The evaluation design seems weak. The training details are followed by a qualitative evaluation on just 7 frames, and the quantitative evaluation is also difficult to follow (see point b).
d. The paper lacks an ablation study of the individual components in Equation 2.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
In Equation 2, the second and third terms depend on initial confidence values xi (of the pixel itself) and xj (of its neighbours). Although it looks like the unary potentials (the first term of the same equation) serve this purpose, clarifying this would help readers.
The paper refers to two colors (red and blue) in the confidence map images, whereas the figures contain yellow and blue.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The method, as well as the write-up up to the method section, seems promising. However, the evaluation and the reliability of the results are difficult to assess (based on the confusions mentioned above regarding the evaluation).
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
It is still not clear why an ablation study was not included to show how much impact the individual components have on the overall results.
Review #2
- Please describe the contribution of the paper
The paper proposes an innovative approach to predict confidence maps based on physical assumptions incorporated into a probabilistic graphical model for US image generation. They test their approach on bone shadow segmentation and multi-modal registration.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Adding the graphical model seems intuitive given the spatial correlation in ultrasound images
- This research direction seems to be underrepresented in the literature, so improving the SOTA is valuable. The approach shows clear benefits over existing methods (although the compared methods are relatively old)
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
I miss some more details on segmentation with sparse annotations, since that topic is well studied and there are alternative approaches; essentially, the task at hand is a segmentation task using sparse/noisy annotations. Maybe the authors can elaborate on this further in their rebuttal.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- How did the authors come up with the confidence annotations? Is this something that can be annotated objectively? Is the approach prone to inaccuracies in the annotations?
- Fig. 4: It seems that the proposed approach provides more accurate confidence maps; in fact, most of the others do not even make sense, which is surprising to me. Hung’s method seems to underperform quite a lot. But I was wondering whether the sharp edges of the proposed approach could be a problem? I assume they are due to the optimization of the CNN. What I am also missing in this figure is some sort of ground truth (I acknowledge that this is not trivial, but what would be the most correct confidence for those cases?).
- How heavy is the approach in terms of execution time? You write in your conclusion that it can be used in clinical practice, but does that also hold for US sequences in real time?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper provides a solution to an important problem, since confidence maps can be widely applied in image analysis for echo images (including potential quality assessment). However, some aspects were not quite clear regarding the annotations and the actual execution time for application in practice.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
After reading the rebuttal along with the other review comments and answers, I still regard the submitted work as worthy of publication, because it addresses an interesting topic relevant to the community, even if there are some open questions and some weaknesses in the evaluation.
Review #3
- Please describe the contribution of the paper
This paper proposes a deep learning-based method to generate confidence maps for ultrasound (US) images. The authors utilize a convolutional neural network (CNN) integrated with a probabilistic graphical model that incorporates ultrasound physics. The proposed approach is validated both qualitatively and quantitatively in two applications: bone shadow segmentation and ultrasound image fusion.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper makes a methodological contribution by proposing a novel approach for generating confidence maps in ultrasound images.
- The authors effectively incorporate a probabilistic graphical model (PGM) to address ultrasound physics-related challenges, enhancing the model’s interpretability and robustness.
- The validation strategy and experimental results are satisfactory, demonstrating the potential of the proposed method in relevant applications.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Comment on Fig. 1: The role and contribution of the probabilistic graphical model (PGM) within the overall method, as illustrated in Fig. 1, are not clearly explained. The authors should clarify how the PGM integrates with the CNN and contributes to the final output.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
The authors could consider incorporating more common ultrasound artifacts, such as multi-reflection, into their model. Addressing these artifacts could improve the robustness and realism of the method, especially in challenging clinical scenarios.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
See Strengths and Weaknesses.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank the reviewers for their thoughtful and constructive feedback, which we believe will significantly strengthen the manuscript.
Several suggestions were given regarding the design and clarity of the evaluation. Our model is trained a single time on a diverse annotated dataset and is not retrained for each test set. We used a validation set consisting of 72 frames (20% of the dataset) to assess generalization, from which we selected 7 representative examples for qualitative illustration. These were chosen to highlight the model’s behavior across typical cases and are not cherry-picked. Our model achieved a validation loss of 0.32, closely matching the training loss of 0.25. The validation loss may serve as a partial proxy for comparison to ground-truth confidence, as suggested by Reviewer #2. We will include this information in the revised manuscript. Evaluating confidence maps remains a fundamental challenge due to the absence of objective ground truth. We have designed our evaluation protocol to align with best practices in the field, referencing prior methodologies such as Karamalis et al. (2012) and Yesilkaynak et al. (2024).
The role of user annotations in our approach was also a key point of discussion. Reviewer #1 pointed out that labeling the method as “user-centered” could imply user interaction during inference. This is not the case, as it would compromise real-time usability, and we will state this more clearly in the manuscript. This labeling was chosen to highlight the ability of domain experts to define reliability through annotations, in contrast to existing methods that lack such control. Reviewer #2 is right about the subjectivity of the annotations. The definition of confidence in ultrasound imaging is complex; this work therefore aims specifically to take this into account by combining the subjective, anatomy- and process-specific definition of confidence with one based on the pure physics of ultrasound imaging. While interpretation may vary with clinical context, our method integrates user-defined reliability with consistent physical cues derived from ultrasound physics. This combination takes advantage of machine learning to allow for both flexibility and robustness across use cases.
We also thank Reviewer #3 for the suggestion to consider a broader spectrum of ultrasound artifacts. While our method already accommodates diverse artifact types through user-driven annotation, we agree that a structured evaluation of such extensions would be valuable and will add this discussion into the paper.
The runtime metrics already reported in the manuscript (>360 fps) confirm the real-time suitability of our approach for clinical use. We will make this information more prominent in the final revision of the manuscript.
In response to the request for clarification regarding segmentation with sparse annotations, we acknowledge that although our task differs from traditional segmentation, it shares methodological similarities. We will expand on this relationship and add citations to related segmentation work to better position our contributions within the broader literature.
Once again, we thank the reviewers for their detailed feedback and insightful suggestions, which we are confident will help improve the clarity, rigor, and impact of the final manuscript. This paper presents the first approach to combine sparse expert-driven weak supervision with physics-inspired modeling through a probabilistic graphical model, providing both interpretability and robustness. We are hopeful that our contribution, offering a modern, principled alternative to simplistic confidence aggregation methods, will help advance the state of the art in this important yet underexplored area of research.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The reviews remain varied following the rebuttal phase. While the experimental validation is still somewhat limited and underdeveloped, the topic itself and the proposed solution are of clear relevance and interest to the community. Therefore, I recommend acceptance.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A