Abstract

In free-hand ultrasound imaging, sonographers rely on expertise to mentally integrate partial 2D views into 3D anatomical shapes. Shape reconstruction can assist clinicians in this process. Central to this task is the choice of shape representation, as it determines how accurately and efficiently the structure can be visualized, analyzed, and interpreted. Implicit representations, such as SDF and occupancy function, offer a powerful alternative to traditional voxel- or mesh-based methods by modeling continuous, smooth surfaces with compact storage, avoiding explicit discretization. Recent studies demonstrate that SDF can be effectively optimized using annotations derived from segmented B-mode ultrasound images. Yet, these approaches hinge on precise annotations, overlooking the rich acoustic information embedded in B-mode intensity. Moreover, implicit representation approaches struggle with ultrasound’s view-dependent nature and acoustic shadowing artifacts, which impair reconstruction. To address the problems resulting from occlusions and annotation dependency, we propose an occupancy-based representation and introduce the Ultrasound Occupancy Network (UltrON), which leverages acoustic features to improve geometric consistency in a weakly-supervised optimization regime. We show that these features can be obtained from B-mode images without additional annotation cost. Moreover, we propose a novel loss function that compensates for view-dependency in the B-mode images and facilitates occupancy optimization from multi-view ultrasound. By incorporating acoustic properties, UltrON generalizes to shapes of the same anatomy. We show that UltrON mitigates the limitations of occlusions and sparse labeling and paves the way for more accurate 3D reconstruction. Code and dataset are available at https://github.com/magdalena-wysocki/ultron.
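To make the description above more concrete, the following is a minimal, hypothetical sketch (not the authors' released implementation) of an occupancy network that consumes spatial coordinates together with per-point acoustic features. The feature dimensionality, layer widths, and the assumption that the features are sampled from a pre-trained, self-supervised acoustic field (e.g. Ultra-NeRF) are illustrative assumptions.

```python
# Hypothetical sketch: occupancy conditioned on acoustic features, not the paper's exact code.
import torch
import torch.nn as nn

class AcousticOccupancyNet(nn.Module):
    def __init__(self, coord_dim: int = 3, acoustic_dim: int = 3, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(coord_dim + acoustic_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords: torch.Tensor, acoustic: torch.Tensor) -> torch.Tensor:
        # coords: (N, 3) query points; acoustic: (N, acoustic_dim) features such as
        # attenuation, reflection, and scattering sampled from a self-supervised acoustic field.
        x = torch.cat([coords, acoustic], dim=-1)
        return torch.sigmoid(self.mlp(x))  # occupancy probability in [0, 1]
```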

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2979_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/magdalena-wysocki/ultron

Link to the Dataset(s)

https://github.com/magdalena-wysocki/ultron

BibTex

@InProceedings{WysMag_UltrON_MICCAI2025,
        author = { Wysocki, Magdalena and Duelmer, Felix and Navab, Nassir and Azampour, Mohammad Farid},
        title = { { UltrON: Ultrasound Occupancy Networks } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        page = {611 -- 620}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a framework named UltrON, which takes the combination of acoustic features and spatial coordinates as input and outputs the occupancy probability. This design improves the quality of reconstruction from B-mode images and also reduces the need for annotation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The attenuation-compensated optimization incorporates the visibility term into the training objective. This design helps guide the intensity to distribute near the zero-level set, which not only enhances reconstruction quality but also reduces the need for precise annotation.
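    A rough sketch of how such a visibility term could enter the training objective is given below; the transmittance computation and the binary cross-entropy term are assumptions for illustration, not the paper's exact loss.

```python
# Illustrative sketch (assumed formulation): attenuation-derived transmittance
# down-weights samples lying in acoustic shadow, so occluded points contribute
# little to the occupancy objective.
import torch
import torch.nn.functional as F

def attenuation_compensated_loss(occ_pred, occ_label, attenuation, deltas):
    """
    occ_pred:    (R, S) predicted occupancy along R rays, S samples each
    occ_label:   (R, S) weak occupancy labels from sparse annotations
    attenuation: (R, S) per-sample attenuation from the acoustic field
    deltas:      (R, S) spacing between consecutive samples along each ray
    """
    # Exclusive cumulative attenuation: how much the beam has been attenuated
    # before reaching each sample.
    cum = torch.cumsum(attenuation * deltas, dim=-1)
    cum_excl = torch.cat([torch.zeros_like(cum[..., :1]), cum[..., :-1]], dim=-1)
    transmittance = torch.exp(-cum_excl)  # ~1 near the probe, -> 0 behind occluders
    bce = F.binary_cross_entropy(occ_pred, occ_label, reduction="none")
    # Visible samples dominate the loss; shadowed samples are compensated away.
    return (transmittance * bce).sum() / (transmittance.sum() + 1e-8)
```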

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. In general, NeRF-based methods are rather slow for training. We hope the authors report the training time for a better evaluation of algorithm efficiency.
    2. The evaluation is only performed on lumbar spine vertebrae. More evaluations on other biological structures are needed to support the effectiveness of the proposed method.
    3. The reconstruction results are not as smooth as those of RoCoSDF. Have the authors tried to solve this problem?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the major strength.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The main contribution of this paper is the development of UltrON (Ultrasound Occupancy Networks), a novel implicit neural representation technique that integrates acoustic features derived from B-mode ultrasound images to improve 3D shape reconstruction. The authors specifically address three key challenges in ultrasound-based 3D reconstruction:

    1. They reduce the supervision required for learning implicit neural representations by 90% by leveraging acoustic information embedded in B-mode intensities rather than relying solely on manual annotations.
    2. They introduce an attenuation-compensated loss function that addresses the problem of partial observations caused by occlusions and acoustic shadowing in multiview ultrasound imaging.
    3. They demonstrate that by incorporating acoustic features into the occupancy function, their method generalizes effectively to different volumes of the same anatomical structure with minimal fine-tuning (requiring only 1% supervision).

    The authors validate their approach using vertebra phantoms and show a 26% improvement in surface reconstruction quality (measured by Chamfer Distance) compared to state-of-the-art methods like RoCoSDF, despite using 90% fewer annotations.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel integration of acoustic features into occupancy networks: The authors propose an innovative extension to traditional occupancy networks by incorporating acoustic properties (attenuation, reflection, scattering) derived from B-mode images. This approach is particularly well-suited to ultrasound imaging, as it leverages the intrinsic relationship between tissue types and their acoustic signatures.
    2. Significant reduction in annotation requirements: The paper demonstrates that by utilizing the rich acoustic information embedded in B-mode intensities, UltrON requires 90% fewer manual annotations than previous methods. This represents a substantial practical advantage for clinical applications, where annotation is time-consuming and expensive.
    3. Attenuation-compensated loss function: The authors introduce a novel loss function that accounts for the view-dependent nature of ultrasound and compensates for acoustic shadowing through a transmittance function. This mathematical formulation effectively addresses a fundamental challenge in ultrasound imaging that previous approaches have struggled with.
    4. Strong generalization capabilities: The paper demonstrates that UltrON can generalize to new shapes of the same anatomical structure with minimal fine-tuning (1% supervision and only 100 iterations). This is particularly valuable in medical applications where patient-specific variations must be accommodated.
    5. Thorough experimental validation: The authors provide comprehensive quantitative evaluation using multiple metrics (CD, HD, MAD, RMSE) and qualitative visualizations that clearly demonstrate the advantages of their approach over existing methods, particularly in preserving topology and handling annotation errors.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Limited anatomical diversity: The study focuses exclusively on vertebrae models. While this provides a good test case, it raises questions about how well the method generalizes to other anatomical structures with different acoustic properties and geometries. Including at least one additional anatomical structure (e.g., liver, kidney) would have strengthened the paper’s claims about generalizability.
    2. Phantom-based evaluation: The experiments are conducted using ballistic gelatin phantoms with paper pulp. While these provide more realism than water bath setups, they still don’t fully capture the acoustic complexity of real human tissues. The authors acknowledge this limitation implicitly but don’t discuss how performance might vary in clinical settings with real tissues.
    3. Limited comparison with state-of-the-art: While the authors compare against RoCoSDF and coordinate-based ON, there are other recent methods in ultrasound reconstruction that could have been included for a more comprehensive comparison, such as UNSR or other learning-based approaches for 3D ultrasound reconstruction.
    4. Unclear computational requirements: The paper doesn’t discuss the computational complexity or inference time of the proposed method. Given that neural implicit representations can be computationally intensive, information about real-time performance capabilities would be valuable for assessing clinical applicability.
    5. Lack of ablation studies on acoustic features: While the authors perform an ablation study on the attenuation-compensated loss, they don’t analyze the relative contributions of the three acoustic properties (attenuation, reflection, scattering). Understanding which features contribute most to the performance gains would provide deeper insights into the method.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Consider adding more details about the potential clinical applications and how the improved reconstruction might benefit specific clinical workflows.
    2. The generalization capabilities are impressive but could be better contextualized through discussion of how this might translate to clinical practice where patient-specific variations are common.
    3. The method’s ability to handle annotation errors is a significant strength that could be further emphasized, as this is a common challenge in clinical data.
    4. A discussion of limitations and future work would enhance the paper, particularly addressing how the approach might be extended to handle more complex anatomical structures or in vivo imaging scenarios.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a promising approach to ultrasound-based 3D shape reconstruction through the integration of acoustic features into occupancy networks. This innovative methodology shows potential for addressing significant challenges in the field, particularly by reducing annotation requirements by 90% while still achieving better performance than existing methods. The attenuation-compensated loss function represents a thoughtful solution to the inherent problems of view-dependency and occlusions in ultrasound imaging. The mathematical formulation is sound and the results demonstrate its effectiveness. However, several limitations prevent me from giving a stronger recommendation:

    First, the experimental validation is limited to vertebrae phantoms. While the experiments show improvement over state-of-the-art methods, the lack of diversity in anatomical structures raises questions about how well the approach generalizes to other tissues with different acoustic properties. The authors need to address whether their findings would translate to more complex anatomical structures or in vivo scenarios.

    Second, the phantom-based evaluation, while more realistic than water bath setups, still doesn’t fully capture the acoustic complexity of actual human tissues. The authors should discuss the potential challenges of applying their method in clinical settings with heterogeneous tissues and varying acoustic conditions.

    Third, the comparison with existing methods could be more comprehensive. Including additional baseline methods would strengthen the paper’s claims about advancement over the state-of-the-art. Additionally, the lack of information about computational requirements and real-time performance capabilities makes it difficult to assess the method’s practicality for clinical applications. The authors should clarify whether the approach is computationally feasible for real-time or near-real-time use.

    The paper would benefit from ablation studies on the relative contributions of different acoustic features and a more detailed discussion of limitations and future work. The authors should also clarify their plans for releasing code and data to ensure reproducibility.

    In their rebuttal, I would like to see the authors address these concerns, particularly regarding generalizability to other anatomical structures, performance in more realistic imaging conditions, and computational efficiency. If they can satisfactorily address these issues, the paper’s strong technical contribution and potential clinical impact would justify acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents UltrON, a novel approach that integrates acoustic features from B-mode intensities into a representation of occupancy. It further introduces an attenuation-compensated loss function that tackles the problem of partial observations due to occlusions in multiview ultrasound. Acoustic features also enable generalization to some degree. The results are also promising.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper proposes UltrON, a novel approach that integrates acoustic features from Ultra-NeRF into an occupancy representation. The acoustic features could be more efficient than coordinates in describing tissue distribution.
    2. This paper introduces an attenuation-compensated loss function that tackles the problem of partial observations due to occlusions in multiview ultrasound.
    3. Acoustic features also enable generalization to the same anatomy across different volumes with limited supervision and fine-tuning.
    4. The results are promising, surpassing current SOTAs in visual quality and metrics.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are still some points that are not clear or not good enough, as listed below.

    1. I don’t really understand the 10%/100% annotations. Do you mean UltrON is more label-efficient (10%) thanks to acoustic features from Ultra-NeRF than coordinates (ON, 100%)? But I think you need to train Ultra-NeRF prior to your occupancy network training, according to your implementation details. So Ultra-NeRF’s acoustic features provide your occupancy network with some shape-specific prior knowledge. When you train Ultra-NeRF, do you use 100% or 10%? When you train the occupancy network, do you use 100% or 10%? There are four combinations. I guess you train Ultra-NeRF using 100% and you train the occupancy network using 10%. If so, Ultra-NeRF already captures the data distribution well, and of course your occupancy network could be trained with limited annotations. This is not clear; you must clarify.
    2. In the Fig. 4 caption, you mention “larger annotation errors”, but you didn’t show the annotation errors anywhere. This is confusing.
    3. In the Fig. 4 caption, you mention the generalization experimental setting “trained on L3 and fine-tuned on L2”; you should state this information in the main text rather than only in a figure or table caption. Otherwise, readers will be confused when reading the main text.
    4. Fig. 2 does not look good, especially the layout; I hope you can redraw it.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes UltrON, a new approach to reconstructing tissue occupancy from multiview ultrasound by incorporating acoustic features as input. It also proposes an attenuation-compensated loss to tackle occlusions in the observations. This paper has insight and novelty, but some points are unclear and confusing, especially the 10%/100% annotations, as listed in the weaknesses. I hope the authors can clarify and improve these points in the rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We would like to thank all the reviewers for their thorough review, positive reception of the submission, and constructive feedback. We appreciate the recognition of our novel integration of acoustic features into occupancy networks (R3, R4), the attenuation-compensated loss (R2, R4), and the strong generalization capabilities (R2, R3, R4). As noted, our method significantly reduces annotation requirements (R2, R3) (by up to 90%) and demonstrates thorough validation across multiple metrics, showing clear advantages in shape reconstruction. We are glad the reviewers found the results promising and surpassing current state-of-the-art (SOTA) in both visual quality and quantitative metrics. We provide the following clarifications to improve the clarity of our contributions.

Regarding evaluation using vertebra phantoms (R2, R3), we selected the vertebra as a representative example of a bony structure, as we found it to be the most challenging to reconstruct due to reverberation artifacts and occlusions. We used a phantom model for accurate evaluation against a known CAD model. While CT imaging can provide a reconstructed shape for in-vivo and ex-vivo data, it would not be as precise as a CAD reference due to segmentation errors. In future work, we will extend our experiments to include evaluations on both ex-vivo and in-vivo acquisitions of the spine (a high-density structure), as well as other anatomical shapes such as the aorta and thyroid, which are characterized by lower density. These evaluations will include comparisons to shapes obtained from CT imaging, with a discussion of the limitations of using CT as the reference.

Regarding the question of whether Ultra-NeRF uses 10% or 100% of annotations (R4): the training of Ultra-NeRF is entirely self-supervised and does not rely on any annotations. We use 100% of the B-mode data but zero annotations. In contrast, optimizing the reconstruction network requires annotations; therefore, we test with 10% and 5% of annotations to demonstrate that using acoustic features from self-supervised Ultra-NeRF significantly reduces the number of annotations needed for the reconstruction network to optimize the target shape. In these experiments, we show that even with a reduced number of annotations (10% of B-mode images annotated) we achieve reconstruction accuracy above that of the SOTA method trained on 100% of annotations.

While computational efficiency was not part of the scope of the current submission, we thank the reviewers (R2, R3) for suggesting that investigating and addressing the trade-off between computational efficiency and reconstruction quality would be a valuable extension of this work. One reason NeRF methods for RGB imaging are computationally expensive is that image rendering involves sampling dozens to hundreds of points along each camera ray intersecting the 3D scene. Vanilla NeRF samples points along all rays indiscriminately, including those passing through empty or low-density regions. This inefficiency contributes to the high computational cost of both training and rendering. In contrast, ultrasound imaging typically lacks such empty regions. As a result, Ultra-NeRF can employ equidistant sampling rather than random sampling. While this change accelerates the optimization process of the acoustic features, it also reduces their spatial resolution and does not prioritize regions that are difficult to reconstruct.
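As an illustration of the sampling difference described above (an assumption about typical implementations, not the authors' code), the two strategies can be contrasted as follows:

```python
# Sketch: stratified random sampling (vanilla NeRF) vs. equidistant sampling
# along a scanline (reasonable for ultrasound, where rays rarely cross empty space).
import torch

def stratified_samples(near: float, far: float, n: int) -> torch.Tensor:
    # One random offset per bin, as in vanilla NeRF.
    edges = torch.linspace(near, far, n + 1)[:-1]
    bin_width = (far - near) / n
    return edges + torch.rand(n) * bin_width

def equidistant_samples(near: float, far: float, n: int) -> torch.Tensor:
    # Fixed spacing along the scanline: faster to optimize, but with fixed spatial resolution.
    return torch.linspace(near, far, n)
```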

In conclusion, the proposed integration of acoustic features into occupancy networks, combined with the attenuation-compensated loss, enables strong generalization with minimal supervision. The method achieves up to a 90% reduction in annotation requirements and demonstrates consistent improvements over baselines across multiple quantitative metrics and visual quality benchmarks. These results highlight the potential of our approach for efficient and accurate shape reconstruction from ultrasound B-mode imaging while reducing annotation cost.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


