Abstract
Multi-modal medical image registration integrates complementary information from various modalities to deliver comprehensive visual insights for disease diagnosis, treatment planning, surgical navigation, etc. However, current methods often suffer from artifacts, computational overhead, or insufficient handling of modality-specific interference. Moreover, they still rely on specialized modules, such as generative trans-modal units, additional encoders, or handcrafted modality-invariant operators, without fully exploiting the inherent potential of registration features. To address these drawbacks in multimodal medical image registration, we propose a novel registration framework. First, a plug-and-play architecture is proposed to directly process multi-scale heterogeneous features, with active guidance applied only during the deformation field generation stage. Second, we introduce a multi-view feature reorganization module that dynamically optimizes feature distributions via adaptive relation computation and global calibration. Finally, an in-network modality removal module is introduced that leverages multi-scale adaptive convolutions to explicitly eliminate modality-specific interference. Extensive experiments on the BraTS2018 and Learn2Reg2021 datasets confirm that our proposed method achieves state-of-the-art performance on multiple multimodal medical image registration metrics. (https://github.com/St-Antonio/DGMIR)
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1691_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/St-Antonio/DGMIR
Link to the Dataset(s)
N/A
BibTex
@InProceedings{LeGao_DGMIR_MICCAI2025,
author = { Le, Gao and Shu, Yucheng and Qiao, Lihong and Yang, Lijian and Xiao, Bin and Li, Weisheng and Gao, Xinbo},
title = { { DGMIR: Dual-Guided Multimodal Medical Image Registration based on Multi-view Augmentation and On-site Modality Removal } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
pages = {155--165}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a multi-modal deformable image registration method. The method focuses on manipulating the CNN-extracted features of the two different modalities within the deformation decoding process, via attenuation (“feature reorganization”) and adaptive mean filtering (“modality on-site removal”) of the features. The feature-processing modules are embedded in a multi-resolution deformation modeling scheme. The method is evaluated on brain T1-T2 registration and abdominal MR-CT registration using public challenge datasets, showing some improvements over baselines.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The analysis and reasoning on the challenge in multi-modal image registration due to the different visibility of semantic information is well written.
- The experiments include multiple datasets and anatomical regions. A good amount of experimental detail is provided, a good range of baselines is included, and a detailed ablation study is also provided.
- The paper is well written in general, with very clear wording and easy-to-understand figures.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The two main feature processing components in the deformation decoding, namely the Multi-view Feature Reorganization Guidance (MFRG) and Modality On-site Removal Guidance (MORG), are designed with somewhat arbitrary or questionable reasoning. The paper claims certain benefits from the proposed components. But it is unclear how the modules will learn to realize them.
- For MFRG, the module seems to be akin to attention-gating using the correlation between global max pooling and average pooling. It is unclear to me how this global attenuation of the multi-modal features can address the problem of “huge gaps in feature distributions between channels” and enhance the representation. Even though, mechanism-wise, the operations can attenuate the features in each channel (by scalar scaling?), there is no guarantee that the image-similarity-driven training addresses the distribution difference.
- For MORG, it is unclear what the authors mean by “uniformly distributed modality features”, which the paper regards as a major problem of multi-modal registration. The proposed module essentially learns a set of adaptive mean filters whose output is removed from the features, which means high-frequency content is retained over low-frequency content. This carries the inherent assumption that low-frequency content is modality-specific and high-frequency content is modality-invariant. It is unclear to me that this assumption is valid and, consequently, how the proposed design benefits multi-modal registration.
- Although the paper includes extensive experiments with two different datasets, some of the detailed settings seem to significantly limit the effectiveness of the evaluation for me.
- The authors only provided the values of the hyper-parameters in the loss functions for the proposed method but not for the baseline methods. The regularization parameter in particular can significantly affect the relative performance between methods, since it often controls the trade-off between Dice and the negative Jacobian. From Table 1, we can see a few methods that are marginally worse in Dice but have a better negative Jacobian (e.g., CorrMLP). This makes the performance not directly comparable. To me, the benefits of the proposed method can only be shown when the negative Jacobian is in a very comparable range vs. the baselines.
- The size of the MR-CT dataset (Learn2Reg 2021) is very small, with only 2 scans for testing. This means the results can be highly subject to chance. To alleviate this issue, the unpaired dataset could be utilized for training, and even for testing, for a theoretical evaluation of the proposed method via inter-subject registration.
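The mean-filter reading of MORG above can be illustrated directly. The sketch below is our interpretation only, not the paper's actual module: a fixed 3x3 box filter stands in for the learned adaptive kernels, and the function name is ours. It shows why subtracting a local mean acts as a high-pass filter, which is exactly the assumption being questioned.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def highpass_via_mean_subtraction(feat, size=3):
    """Subtract a local mean (box) filter from a feature map.

    Removing the low-frequency (local-average) component retains only the
    high-frequency residual -- the implicit assumption under debate is that
    this residual is modality-invariant while the removed part is not.
    """
    low = uniform_filter(feat, size=size, mode="reflect")
    return feat - low

# A constant (purely low-frequency) map is removed almost entirely...
flat = np.full((8, 8), 5.0)
print(np.abs(highpass_via_mean_subtraction(flat)).max())  # ~0

# ...while a sharp edge (high-frequency structure) survives.
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0
print(np.abs(highpass_via_mean_subtraction(edge)).max() > 0.0)
```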
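For context on the folding metric discussed above, the fraction of voxels with a negative Jacobian determinant can be estimated from a displacement field as in this generic 2D sketch (our illustration, not the paper's evaluation code):

```python
import numpy as np

def negative_jacobian_fraction(disp):
    """Fraction of pixels where det(J) < 0 for a 2D displacement field.

    disp has shape (2, H, W); the deformation is phi(x) = x + disp(x),
    so J = I + d(disp)/dx, and det(J) < 0 marks folded (non-invertible)
    pixels -- the quantity reported as the negative-Jacobian metric.
    """
    dudy, dudx = np.gradient(disp[0])  # axis 0 = y, axis 1 = x
    dvdy, dvdx = np.gradient(disp[1])
    det = (1.0 + dudx) * (1.0 + dvdy) - dudy * dvdx
    return float((det < 0).mean())

# Identity deformation (zero displacement) has no folding.
print(negative_jacobian_fraction(np.zeros((2, 16, 16))))  # -> 0.0
```

A smoother field (stronger regularization) generally lowers this fraction at some cost in Dice, which is the trade-off the comparability concern is about.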
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Fig. 1: consider a different notation for the composition of deformations instead of “warp”. While the operations can be similar, the functional meaning differs between warping images/features and composing deformation fields.
- The baselines in this paper only include representation-learning methods that focus on feature manipulation. Since the authors argued against other multi-modal registration approaches such as image translation, disentanglement, and cross-attention, it would be good to include one or two baselines in those directions.
- The losses of the baselines should be detailed. For example, which similarity loss is used for the baselines?
- The BraTS2018 dataset only contains segmentations of tumour structures, which have very different visibility on T1 and T2. Evaluating with the tumour segmentation alone also does not evaluate the rest of the brain structures. Since the original T1 and T2 images are pre-aligned and the elastic deformations are synthesised, one can evaluate registration of the whole brain with intensity-based metrics by applying the same synthetic deformation to the T1 image and recovering it with the T1-T2 registration.
- Standard deviations should be provided in the quantitative results where statistically meaningful (e.g., BraTS).
- The authors discussed the reason that LNCC also works for T1-T2 registration. The argument given is that the features after “modality removal” contain structural features resembling a unimodal distribution. This is not a valid argument to me, because LNCC is applied to the image intensities, not the features, according to the loss-function details provided in the paper. LNCC has been found effective for T1-T2 registration in a few papers in the field. The likely reason is that T1-T2 intensities can be approximately linearly correlated in a local window. However, this is unlikely to be true for all multi-modal registration.
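To make the intensity-versus-feature point concrete, here is a generic LNCC sketch (our illustration; the window size and normalization are assumptions, not the paper's exact loss). It operates on raw intensities in a sliding window and is invariant to local linear intensity changes, which is the property invoked for T1-T2:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lncc(a, b, size=9, eps=1e-8):
    """Local normalized cross-correlation between two images.

    Computed directly on intensities within each local window, so a local
    linear relationship b ~ k*a + c (k > 0) yields a correlation of 1
    regardless of the global intensity mapping between modalities.
    """
    mu_a = uniform_filter(a, size)
    mu_b = uniform_filter(b, size)
    cov = uniform_filter(a * b, size) - mu_a * mu_b
    var_a = uniform_filter(a * a, size) - mu_a ** 2
    var_b = uniform_filter(b * b, size) - mu_b ** 2
    return cov / np.sqrt(np.clip(var_a * var_b, eps, None))

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32))
# A locally linear intensity remapping leaves LNCC at ~1 everywhere.
print(np.round(lncc(img, 2.0 * img + 3.0).mean(), 3))  # -> 1.0
```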
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the paper is well-presented, the design of the feature processing components for decoding/deformation modelling appears arbitrary with unconvincing reasoning and assumptions. Empirical results do not make them more convincing due to the major issues with the datasets and the comparability of the hyper-parameter tuning process.
I do not believe these major issues can be adequately addressed during the rebuttal phase. Therefore I recommend rejection.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I appreciate the efforts and interesting comments from the authors in their rebuttal.
The authors used the ablation studies to justify the reasoning behind both the MFRG and MORG modules. However, the ablation studies only indicate that these modules improve performance over the baseline. This is not direct evidence that the specific modules work in the way the authors claim, i.e., distribution-difference reduction and “modality removal”. Ablation studies like this can also be affected by factors such as the training procedure and differences in network capacity, which make the results noisier.
However, the additional experiments that the authors mentioned in the rebuttal are actually very insightful and important to me: 1) the multi-modal feature distribution difference measured by the Wasserstein-2 distance actually shows some link between distribution difference and registration performance; 2) the frequency decomposition associated with modality-specific reconstruction. These are very interesting results for feature engineering in multi-modal registration research. The inclusion of these results is critical to provide at least intuitive justification for the feature-manipulation designs.
I therefore recommend acceptance of the paper, on the condition that the authors scale back the direct causal claims on how the feature-engineering is benefiting the registration in writing, and include the insightful discovery results to provide intuition for model design.
Review #2
- Please describe the contribution of the paper
This paper proposes a new framework for multi-modal registration. The key contributions are a Multi-view Feature Reorganization module to adjust the feature distribution and a Modality On-site Removal module to remove modality-specific features.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A novel way to extract modality-invariant features.
- The paper is well written and structured.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The authors did not clarify how the modality-specific parameters mα and mβ are used and learned. (Are these two parameters vectors or matrices? And how is the adjustment performed?)
- Only small contrast variations are considered in the experiments, where only T1w-to-T2w registration is performed.
- The ablation studies show limited justification for the detailed design for MFRG and MORG. (e.g. for max pool and avg pool in MFRG, different type of convolutions in MORG).
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Ablation studies on the detailed designs of MFRG and MORG would strengthen the justification for their architecture, as there are many components within the two modules.
- Statistical tests and standard deviations should be provided.
- There seems to be a small error in Fig. 3 (Right), where ‘Wi’ should likely be ‘Mi’?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a novel way to extract modality-invariant features but lacks justification for its specific design. The experiments are confined to small contrast variations (T1w-T2w) instead of genuinely different imaging modalities (e.g., CT, ultrasound, MR).
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I thank the authors for their feedback.
The authors addressed my concerns and I am happy to accept the paper.
Review #3
- Please describe the contribution of the paper
- Presents a flexible plug-and-play architecture that directly processes multi-scale heterogeneous features.
- Introduces the MFRG module, which dynamically optimizes feature distributions. It combines global max pooling and average pooling to comprehensively characterize feature distribution, and uses a global calibration factor to guide holistic feature expression.
- Proposes the MORG module to explicitly perform modality removal within the registration backbone.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Introduces the MFRG module, which dynamically optimizes feature distributions.
- Proposes the MORG module to explicitly perform modality removal within the registration backbone.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- MORG may not be the first attempt to explicitly perform modality removal. For example, “Enhancing medical image registration via appearance adjustment networks” adjusts the modality via an appearance transformation map; both use addition or subtraction to transform the modality.
- Is the DSC score the same as L_dice? Evaluation metrics and loss functions should not be the same.
- The structural design of the MFRG module is not explained very clearly, especially the column-wise and row-wise sums. Is there a source for such a design?
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- In the ablation study, Baseline + MORG’s performance is close to Baseline + MFRG + MORG. Does this indicate poor performance of the MFRG module?
- Is the loss function applied at each stage (i = 1, 2, 3, 4)?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The article’s experiments are complete and the process is clear. However, the innovativeness is limited: prior articles have attempted multimodal registration using a similar approach (similar to MORG).
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
On behalf of my co-authors, I thank the reviewers for their constructive comments. However, there are some issues and misunderstandings that we would like to clarify:
About MFRG [R1 (Reviewer 1), R3]: Thank you for raising this question. Although the basic operating logic of MFRG can indeed be viewed as an attention-gating mechanism, we tailored it specifically for cross-modal registration. We believe that the mean and maximum, two key statistics within each channel, offer different views to guide feature modulation. Therefore, cross-modal channel interaction is performed by calculating the association matrix between the two views. Inspired by 10.1109/CVPR.2019.00314 (CVPR, 2019) and 10.1016/j.neunet.2024.106314 (Neural Networks, 2024), we sum over the rows and columns of the association matrix, which “efficiently captures second-order correlations between the channels”. In addition, we add a global learnable scalar factor to help extract complementary information from both views, providing a more discriminative source for channel modulation. This mechanism does help mitigate the feature gap between different modalities, as confirmed by our ablation studies. In fact, in preliminary tests we computed the 2-Wasserstein distance between the features of the two modalities to reflect their distributional difference, and we observed that the distance was effectively reduced after this module. Due to space and policy constraints, these results were omitted from the manuscript and rebuttal. However, we look forward to releasing all related materials after the double-blind review.
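For reference, the 2-Wasserstein distance mentioned here can be estimated for flattened, equal-size feature samples via quantile matching in 1D. The sketch below is our illustration of that measurement idea, not the authors' actual code; the function name and the 1D flattening are assumptions.

```python
import numpy as np

def w2_1d(x, y):
    """Empirical 1D 2-Wasserstein distance between equal-size samples.

    For 1D distributions, W2^2 is the mean squared difference between
    matched quantiles, i.e., between the sorted samples.
    """
    x, y = np.sort(np.ravel(x)), np.sort(np.ravel(y))
    assert x.size == y.size, "equal-size samples assumed"
    return float(np.sqrt(np.mean((x - y) ** 2)))

a = np.random.default_rng(0).normal(size=1000)
# A pure shift by c moves every quantile by c, so W2 equals c.
print(round(w2_1d(a, a + 0.5), 6))  # -> 0.5
```

A shrinking gap between the per-modality feature samples before and after a module would then support the distribution-alignment claim.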
About MORG [R1, R2, R3]: R1: We strongly agree with your insightful view, which aligns with our preliminary tests. By decomposing the image into high- and low-frequency components and reconstructing it with only the high-frequency part, we found that modality-independent features can be effectively preserved. We also found that modality-specific features exhibit smaller gradient variations and a relatively uniform distribution. Based on these observations, we propose MORG, which is simple yet highly effective, leveraging adaptive mean convolution across multiple scales. In the manuscript, we validated the effectiveness of MORG and the adaptive mean convolution via the ablation study, demonstrating the rationality of the module. R2: mα and mβ are two learnable parameters that guide global feature extraction via a Hadamard product with the features. R3: Thank you for the reminder. Our intention was to emphasize that this work is the first to explicitly perform modality elimination in the flow decoding process, without relying on any auxiliary network modules. We will clarify this point in a future revision.
About the loss function [R3]: In fact, using L_dice as an optimization objective has become standard practice in many recent works (e.g., TransMorph, GroupMorph). More importantly, to ensure fairness, we applied the same losses to all methods in our experiments. We hope this clarification addresses your concern.
About the experiments [R1, R2]: Thanks for pointing this out. For fairness, all compared methods were run using their original source code without any modifications, and all hyperparameter settings were kept unchanged across all tests. Therefore, we did not intentionally manipulate the trade-off you mentioned in any way. More importantly, this trade-off phenomenon is very interesting; indeed, most existing works, including this one, have not explored it in depth. It suggests a new research direction, and we plan to investigate it in future work. For R2, we also provided experiments on CT-MRI registration in the manuscript, although we reported only quantitative results due to space limitations.
Thanks again for the valuable comments. We hope our clarifications resolve your concerns, and we believe this work will benefit the community and inspire further research in multi-modality medical image registration.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
All three reviewers are positive about accepting this work after the rebuttal. Following these ratings, I think this work can be published at MICCAI 2025.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A