Abstract
Accurate registration between intraoperative 2D images and preoperative 3D anatomical structures is a prerequisite for image-guided minimally invasive surgery. Existing approaches for 2D/3D rigid registration, particularly those for X-ray to CT image registration, primarily rely on grayscale-based image similarity metrics. However, such metrics often fail to capture the optimal projection transformation due to their limited contextual information. To address this issue, we propose a novel and intuitive correspondence representation: the overlap of multiple corresponding Regions of Interest (ROIs). By introducing the differentiable Dice coefficient computed on the projection image, we establish a direct link between segmentation and registration within our weakly supervised 2D/3D registration framework. This framework comprises two stages—a learning-based preoperative stage and an optimization-based intraoperative stage—both of which leverage the ROI-based Dice score as a differentiable supervision signal. Additionally, we integrate automatic segmentation methods (e.g., UNet) and prompt-based methods (e.g., MedSAM) into the framework to investigate the impact of different segmentation approaches on registration performance. Furthermore, we validate the generalization ability of the proposed framework by integrating the ROI-based similarity with various similarity measures. Extensive experiments conducted on the DeepFluoro dataset yielded an mTRE of 0.67$\pm$1.34 mm, with rotational and translational errors of 0.2$\pm$0.5$^{\circ}$ and 1.6$\pm$2.9 mm, respectively, outperforming existing state-of-the-art methods. The code is available at https://github.com/CYXYZ/WSReg.
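To make the abstract's central mechanism concrete, the following is a minimal PyTorch sketch of a soft (differentiable) Dice score between an ROI mask on a rendered projection and the corresponding mask on the intraoperative X-ray. The function name and tensor conventions are illustrative assumptions, not taken from the authors' repository; the sketch only shows the soft-Dice form that makes the overlap signal differentiable, so gradients can flow back to the pose through a differentiable renderer.

    import torch

    def soft_dice(pred_mask: torch.Tensor, target_mask: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
        # Soft Dice coefficient between two 2D masks with values in [0, 1].
        # If pred_mask comes from a differentiably rendered projection (a DRR),
        # this score is differentiable with respect to the pose parameters.
        inter = (pred_mask * target_mask).sum()
        union = pred_mask.sum() + target_mask.sum()
        return (2.0 * inter + eps) / (union + eps)

    # Used as a registration loss, one would minimize:
    #   loss = 1.0 - soft_dice(drr_roi_mask, xray_roi_mask)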
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2022_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/CYXYZ/WSReg
Link to the Dataset(s)
https://github.com/rg2/DeepFluoroLabeling-IPCAI2020
BibTeX
@InProceedings{CuiYux_WeaklySupervised_MICCAI2025,
author = { Cui, Yuxin and Meng, Max Q.-H. and Min, Zhe},
title = { { Weakly-Supervised 2D/3D Image Registration via Differentiable X-ray Rendering and ROI Segmentation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15967},
month = {September},
pages = {653 -- 663}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper introduces a weakly-supervised 2D-3D registration approach for aligning intraoperative X-ray images with preoperative CT volumes. The method is designed to be more efficient and scalable than traditional registration approaches that rely heavily on precise anatomical landmarks or dense manual annotations.
The key contributions can be summarized as:
- The use of differentiable ROI-based supervision, where segmentation overlap (Dice score) is used as a proxy for spatial alignment. This allows pose estimation to benefit from semantic structures in the images without requiring exact pixel-level correspondences or rigid landmark annotations.
- Integration of automatic segmentation methods, specifically U-Net and the promptable MedSAM model, into the registration pipeline. This makes the approach flexible and adaptable to varying clinical needs and annotation styles.
- A comprehensive evaluation using the DeepFluoro dataset, demonstrating performance gains over established baselines (DiffDRR, DiffPose, PSSS), with a reported SRR of 87%, highlighting the effectiveness of the approach even in a weak supervision setting.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The idea of using segmentation-based supervision as a differentiable registration signal is practical. It avoids the need for large-scale annotated landmark datasets and instead leverages semantic consistency across projections.
- The two-stage design—separating preoperative learning and intraoperative optimization—is relevant for clinical translation. This structure supports real-time deployment while retaining the benefits of offline training, making the system potentially usable in real-world intraoperative workflows.
- The reported performance, particularly the 87% sub-millimeter registration success rate (SRR), is strong. This is especially impressive considering the model is trained with weak supervision and evaluated on challenging simulated-to-real tasks.
- The inclusion of both U-Net and MedSAM segmentation models adds robustness to the pipeline. This demonstrates that the method is agnostic to the type of segmentation and can accommodate future segmentation advancements.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The DeepFluoro dataset, while appropriate for method validation, is limited in scope. It is based on ex-vivo cadaveric data and consists of only six specimens. This small sample size, along with limited variation in anatomy, imaging devices, and acquisition protocols, restricts the generalizability of the conclusions.
- The paper does not sufficiently discuss the clinical implications of the reported accuracy metrics. While sub-millimeter registration accuracy is promising, it is unclear how these metrics translate to clinical relevance for specific interventions (e.g., spinal injections, hip navigation). A discussion comparing the performance to known clinical thresholds would be beneficial.
- Although the Dice score is an intuitive and widely-used metric for segmentation overlap, it is inherently limited. It does not effectively capture structural correspondence of internal or occluded anatomical features. Thus, its reliability as a proxy for registration quality—especially in the presence of anatomical deformities, pathologies, or surgical artifacts—remains uncertain.
- The methodology section, particularly the parts describing intraoperative registration and the multi-branch network architecture, is dense and at times difficult to follow. Figures 2 and 3, while comprehensive, lack a clear visual guide for the reader to follow the sequential steps. A more intuitive narrative, possibly with a simplified schematic or flowchart to guide the reader through the stages of the method, would greatly improve clarity.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper presents a technically novel and clinically relevant approach to weakly-supervised 2D-3D image registration by integrating differentiable segmentation overlap into both learning-based and optimization-based registration workflows. The contributions are well-timed and align with current trends in reducing reliance on densely annotated data while still achieving robust intraoperative performance. The method demonstrates impressive registration accuracy, especially given the weak supervision setup, and offers a flexible framework that supports different segmentation tools, such as U-Net and MedSAM. The results are benchmarked against competitive baselines and suggest strong potential for further development and clinical application. However, a notable weakness lies in the complexity and readability of the methodology section. While the technical depth is appreciated, the description of the registration pipeline—particularly the intraoperative phase and multi-branch architecture—is presented in a dense and highly technical manner, which may hinder understanding for both technical and clinical audiences. Figures 2 and 3, intended to aid comprehension, are visually overloaded and lack clear narrative flow or visual hierarchy. As a result, it is difficult for the reader to follow the sequence of operations or understand how different components interact over the course of the pipeline. This lack of clarity significantly detracts from the accessibility and reproducibility of the work, which is particularly important for a method aiming for real-world applicability. Simplifying the visual aids, providing step-by-step guidance, and clearly separating the training and inference workflows would significantly improve the paper.
Despite this, the core contributions and results are strong, and with moderate revision focused on restructuring and clarifying the methodology section, this paper would represent a valuable addition to the literature on weakly-supervised image registration and surgical navigation.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
My initial concerns regarding the clarity of the methodology and the quality of the accompanying illustrations remain unresolved. Both elements are essential for ensuring that the proposed methods can be clearly understood, accurately reproduced, and critically evaluated by peers. Unfortunately, despite the authors’ rebuttal, there has been little improvement in these areas. The explanations continue to lack the precision and accessibility expected for scientific transparency, which, in my view, significantly limits the reproducibility and evaluability of the work.
Review #2
- Please describe the contribution of the paper
This work proposes a learning-based 2D/3D registration framework based on the prior work, DiffDRR ref.[21]. The main contribution is the integration of the anatomical segmentation mask ROI as an additional loss target to the registration objective.
The performance of the proposed framework is validated on the public pelvis CT/X-ray cadaveric dataset. The study shows that segmentation-assisted registration generally improves registration accuracy compared with using only the conventional similarity metrics.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The idea of using a differentiable Dice score to integrate the segmentation-mask ROI as an additional loss target in differentiable 2D/3D registration is a novel contribution. Essentially, it uses a powerful medical image segmentation network to extract semantic information from the raw intensity images and then regularizes the existing similarity functions. It would be interesting to see the registration performance using only the segmentation loss.
This paper is very well written. The methodology is clear, the evaluation experiments are comprehensive, and the delta performance of the key contributions is highlighted with ablation studies.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The advantage of using MedSAM over U-Net-like architectures is not quite clear. Although MedSAM is a more powerful foundation model and shows superior segmentation performance, it adds an extra user interaction to provide the prompt and, as discussed in the paper, affects the accuracy of the projection-image ROI. From the reviewer's point of view, the comparison with MedSAM does not add much information to the main story, but it complicates the workflow.
Although the idea is interesting, the majority of the work and experiments build on prior existing work, DiffDRR ref.[21]. Thus, the breakthrough contribution is marginal.
It would be great if the registration runtime and memory usage could be discussed in the manuscript.
Fig. 4 is missing the caption/legend identifying the comparison baseline methods; please revise Fig. 4. The caption of Table 3 mixes up “with” and “without”.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The overall quality of this manuscript is excellent, and the idea of adding a segmentation loss is interesting. However, this work is heavily dependent on prior publications in terms of the registration framework, dataset, and evaluations, so the delta contribution is considered marginal.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
In the rebuttal, the authors justified this work’s contribution against the existing work in the literature and responded well to the requested additional information. The reviewer supports this paper’s acceptance.
Review #3
- Please describe the contribution of the paper
The manuscript proposes a segmentation-based method for 2D/3D registration of X-ray images. By training a pose-update network to predict an update to the pose of a 3D model, the method can be used to register a 3D CT scan to a 2D X-ray image. A focus is placed on an “ROI-based loss”: a Dice loss between the predicted masks of the original image and of the DRR rendered at the current pose. The method is evaluated in a cross-fold strategy consistent with prior work on the DeepFluoro dataset. It achieves a mean target registration error (mTRE) of 0.67±1.34 mm, which is a significant improvement over the state of the art.
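As a rough illustration of the pipeline this review describes, the intraoperative stage can be sketched as gradient-based pose refinement over a combined intensity and mask-overlap objective. The render and segment callables below stand in for a differentiable DRR renderer (e.g., DiffDRR) and a frozen ROI segmenter; all names, signatures, and hyperparameters are assumptions for illustration, not the authors' implementation.

    import torch

    def ncc(a, b, eps=1e-6):
        # Normalized cross-correlation between two images (higher = more similar).
        a = (a - a.mean()) / (a.std() + eps)
        b = (b - b.mean()) / (b.std() + eps)
        return (a * b).mean()

    def refine_pose(render, segment, volume, xray, pose_init,
                    steps=100, lr=1e-2, w_dice=1.0, eps=1e-6):
        # Gradient descent on a pose parameter vector. render(volume, pose) must be
        # differentiable so the Dice term back-propagates to the pose.
        pose = pose_init.clone().requires_grad_(True)
        opt = torch.optim.Adam([pose], lr=lr)
        xray_mask = segment(xray).detach()      # fixed target ROI mask
        for _ in range(steps):
            opt.zero_grad()
            drr = render(volume, pose)          # differentiable projection
            drr_mask = segment(drr)             # gradients flow through the DRR
            inter = (drr_mask * xray_mask).sum()
            dice = (2.0 * inter + eps) / (drr_mask.sum() + xray_mask.sum() + eps)
            loss = -ncc(drr, xray) + w_dice * (1.0 - dice)
            loss.backward()
            opt.step()
        return pose.detach()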
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- mTRE of 0.67±1.34 mm is impressive for the 2D/3D registration task.
- The addition of a DICE loss on the predicted mask is a novel contribution that improves performance without requiring significant additional data.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
No major scientific weaknesses, but the paper has significant issues in terms of clarity and positioning:
- Prior work is not as limited as suggested.
- Figures are non-linear and difficult to follow.
- A discussion on the similarity between the proposed method and prior work (especially [2,3]) is missing. This is basically differentiable-rendering-based 2D/3D registration using GradNCC (well-known) with an added DICE loss on the masks.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Regarding weakness (1), manual landmark annotation or large dataset collection is not required for prior work. [1] does both fully automatically, using a statistical shape model to propagate landmarks across hundreds of CTs and then simulate 200k X-rays.
Regarding (2), Fig 2 and 3 are so tangled as to be unreadable. Linearize and streamline the figures. In many cases, it is not clear what is being passed between each block.
Mathematical details are presented in a very dense manner with inline equations that should be in display mode.
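As an illustration of the display-mode presentation meant here, the combined objective discussed throughout the reviews might be written as follows. This is a hedged reconstruction of the general form, not the paper's exact notation:

\[
\hat{\theta} \;=\; \arg\min_{\theta}\; -\,\mathrm{NCC}\!\left(I,\, P_{\theta}(V)\right) \;+\; \lambda \left(1 - \mathrm{Dice}\!\left(S(I),\, S\!\left(P_{\theta}(V)\right)\right)\right),
\]

where $\theta$ is the six-degree-of-freedom pose, $V$ the CT volume, $I$ the intraoperative X-ray, $P_{\theta}$ the differentiable projection (DRR) operator, and $S$ the segmentation network.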
Minor comments:
- “ROI” is a strange term to use here. Usually, an ROI is a box, as in an R-CNN, but here it is a mask. I suggest using “mask” instead.
- “Still focus on grayscale-based similarity metrics” should be more precise. All metrics are based on the grayscale X-ray image. I presume this means non learning-based metrics, such as NCC.
- List translation then rotation error in that order, for consistency with the literature.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Overall, the manuscript makes a solid contribution to the field of 2D/3D registration by incorporating mask-based loss functions into differentiable-rendering-based registration methods. The biggest thing I would like to see in the final version is a thoughtful discussion of the similarities and differences between this work and prior work, especially [2,3], and some conjecture on why the proposed method works so well. Are the learned features within masks somehow encoding spatial information, which can propagate through the gradients with respect to DICE? Perhaps in a journal extension, additional experiments could be performed to explore this, but for MICCAI the paper is solid, aside from the issues mentioned with the clarity of figures and discussion of prior work, which has some inaccuracies that should be corrected.
References:
[1] Killeen, B.D., et al.: Pelphix: Surgical phase recognition from X-ray images in percutaneous pelvic fixation. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, pp. 133–143. Springer (2023). doi:10.1007/978-3-031-43996-4_13
[2] Gao, C., Liu, X., Gu, W., et al.: Generalizing spatial transformers to projective geometry with applications to 2D/3D registration. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Part III, pp. 329–339. Springer (2020)
[3] Gopalakrishnan, V., Dey, N., Golland, P.: Intraoperative 2D/3D image registration via differentiable X-ray rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11662–11672 (2024)
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
While I agree that the paper would benefit from editing for clarity, I would respectfully push back on some of the other weaknesses raised.
Respectfully, R1 is holding this paper to MIC standards for review, which are not appropriate for the application. This is a CAI paper, implementing 2D/3D registration for X-ray image-guided surgery, and should be evaluated in that context. The DeepFluoro dataset, while not large, is the largest publicly available dataset for this task, and has been widely used in the literature.
Likewise, I am satisfied with the motivation. Achieving sub-millimeter registration accuracy in 2D/3D registration is a significant achievement.
The authors report mTRE as their final metric, not DICE, which is used as an intermediate loss function.
I agree with R2: the advantage of MedSAM is not totally clear, but it does give some context to the performance achieved. I think it’s reasonable to include in the paper.
I also agree that the method is very similar to DiffDRR, but I think that the addition of segmentation-based loss functions is a significant enough contribution to warrant publication. It’s a simple trick that could be widely useful, given the results.
Overall, I think the paper is a solid contribution and should be accepted with the promised changes.
Author Feedback
We thank the reviewers for their valuable feedback. Please find the clarifications and responses as follows.
R1: About the DeepFluoro dataset. A: We appreciate this point and will discuss the dataset/generalization limitation in the manuscript as suggested. Few public orthopedic datasets provide accurate pose annotations (for training or evaluation); hence, recent studies [12,15,16] also rely only on small cadaveric or phantom datasets for 2D/3D registration. However, our core contribution is compatible with any rigid 2D/3D registration pipeline and has been well validated.
R1: Lack of clinical relevance of the accuracy threshold. A: As indicated in [a], registration is considered successful if mTRE ≤ 1 mm, with a maximum tolerance of 2–3 mm. Some orthopedic procedures, e.g., femoral osteoplasty, require submillimeter accuracy for robotic guidance [24]. This will be discussed in the manuscript. [a] Use of image registration and fusion algorithms and techniques in radiotherapy: Report of the AAPM Radiation Therapy Committee Task Group No. 132.
R1: About Dice’s limitations as a registration metric. A: Our study focuses on rigid 2D/3D registration, where there is no significant anatomical deformation. Surgical artifacts or pathology affect both the segmentation and the intensity images, so they are not adverse factors for the Dice metric alone. We retain intensity similarity metrics but aim to enhance accuracy by including segmentation information (cf. Sec. 2.1).
R1: Dense methodology. A: We will revise the methodology section as suggested (detailed descriptions & math) without altering the core content.
R1/R4: About Figs. 2 & 3. A: We will simplify Figs. 2 & 3 by removing unnecessary elements and crossing lines and by re-arranging the algorithm’s sequential steps (i.e., stages) from left to right for clarity (R1 & R4).
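For reference, the success criterion cited in the rebuttal (registration successful if mTRE ≤ 1 mm) can be computed as in this small sketch. The 1 mm threshold follows the rebuttal's description; the function names and landmark conventions are illustrative assumptions:

    import torch

    def mtre(pred_pts, gt_pts):
        # Mean target registration error (mm) over corresponding 3D landmarks,
        # each tensor of shape (N, 3).
        return (pred_pts - gt_pts).norm(dim=-1).mean()

    def success_rate(mtre_per_case, threshold_mm=1.0):
        # Fraction of cases whose mTRE is at or below the threshold (SRR).
        vals = torch.as_tensor(mtre_per_case, dtype=torch.float32)
        return (vals <= threshold_mm).float().mean().item()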
R2: Complexity introduced by MedSAM needs justification. A: MedSAM initially showed better segmentation than U-Net, motivating its inclusion. It does not, however, yield better registration accuracy, although both variants outperform the version without segmentation guidance. This shows that our framework is not dependent on a specific segmentation model. More importantly, it supports our core hypothesis: semantic consistency, not segmentation accuracy, drives performance. Hence, we retain MedSAM in our study.
R2: Registration time and memory usage. A: Intraoperatively, the GPU memory per registration is 949.27 MB. All tests ran on an RTX 4090 GPU, with an average time of 18.23 ± 1.71 seconds. Time varies with hardware: [15] reports 12.22 ± 2.16 s on a 4090, but 2.2 ± 1.2 s using six 2080 Ti GPUs. We will add this to the manuscript.
R2: About Fig. 4 / Tab. 3. A: We will add legends/captions for the methods in Fig. 4 (R2) and correct the “+”/“-” notation in Tab. 3.
R2/R4: Contributions against [15,18,21]. A: [15,18] ([3] and [2] in R4) also use two-stage projection-based strategies. [18] first applied inverse differentiable propagation to projection geometry; [15] further investigated SE(3)-based driving strategies. Our key novelty lies in differentiable segmentation-guided registration, which to our knowledge has not been explored in 2D/3D registration. Classical methods with fixed segmentations cannot directly apply geometric transformations in 2D/3D registration due to the dimensional mismatch, and the non-differentiability of projection-image masks hinders gradient-based optimization in learning-based approaches. [21] is a fast DRR synthesis method serving a different purpose; we use it as one of many replaceable DRR generators. Thus, we believe our contributions do not overlap directly with theirs.
R4: About the prior work’s limitations. A: We will clarify the limitations of prior landmark-based registration methods, taking into account Ref. [1] and related work that adopts automatic landmark annotation/detection methods.
R4: About the math equations. A: The key inline equations will be converted to display mode, without significant changes to the paper.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The authors addressed most of the important concerns and the work can be accepted.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
Two reviewers who are very knowledgeable about the field recommend acceptance due to its CAI contribution and the importance of the problem. The observations of the reviewer are mostly valid and should be addressed.