Abstract

In clinical practice, imaging modalities with functional characteristics, such as positron emission tomography (PET) and fractional anisotropy (FA), are often aligned with a structural reference (e.g., MRI, CT) for accurate interpretation or group analysis, necessitating multi-modal deformable image registration (DIR). However, due to the extreme heterogeneity of these modalities compared to standard structural scans, conventional unsupervised DIR methods struggle to learn reliable spatial mappings and often distort images. We find that the similarity metrics guiding these models fail to capture alignment between highly disparate modalities. To address this, we propose M2M-Reg (Multi-to-Mono Registration), a novel framework that trains multi-modal DIR models using only mono-modal similarity while preserving the established architectural paradigm for seamless integration into existing models. We also introduce GradCyCon, a regularizer that leverages M2M-Reg’s cyclic training scheme to promote diffeomorphism. Furthermore, our framework naturally extends to a semi-supervised setting, integrating pre-aligned pairs alongside unaligned ones without requiring ground-truth transformations or segmentation masks. Experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset demonstrate that M2M-Reg achieves up to 2× higher Dice similarity coefficient (DSC) than prior methods for PET-MRI and FA-MRI registration, highlighting its effectiveness in handling highly heterogeneous multi-modal DIR. Our code is available at https://github.com/MICV-yonsei/M2M-Reg.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0172_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/MICV-yonsei/M2M-Reg

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ChoKyo_MonoModalizing_MICCAI2025,
        author = { Choo, Kyobin and Han, Hyunkyung and Kim, Jinyeong and Yoon, Chanyong and Hwang, Seong Jae},
        title = { { Mono-Modalizing Extremely Heterogeneous Multi-Modal Medical Image Registration } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {435 -- 445}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    I think the main contribution of this paper is, first, that it provides a multi-modal registration framework; within this framework, the authors propose a consistency-regularization technique called GradCyCon. In the experiment section, the authors show improved performance for PET-MRI and FA-MRI registration.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I think the main novelty lies in the formulation. For example, the authors propose a mono-modal cycle similarity loss in Equation 2, and a novel gradient cycle consistency regularizer (GradCyCon) in Equation 3.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    I think the main weakness is that the novelty of the proposed framework is low: the structure of the proposed M2M-Reg looks similar to CycleGAN [1], and in particular the design of the mono-modal cycle similarity loss resembles the cycle-consistency loss in CycleGAN.

    Other weaknesses include that Figure 2, which gives the overview of the M2M-Reg framework, is very confusing to me; I cannot figure out the input and the final output of each step, and thus cannot follow the whole pipeline. In addition, only a few methods are compared in the experiment section, namely TransMorph, CorrMLP, and GradICON.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors leading me to reject this paper are: first, the novelty is limited; second, the writing, figures, and tables are confusing to me; and third, the experiments conducted in this paper are limited.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents a novel approach for solving the multimodal deformable image registration problem by introducing a cyclic structure to compute similarity between images from different modalities. This effectively reduces the problem to a mono-modal one, allowing the use of standard similarity measures such as the correlation coefficient.

    A second contribution is the introduction of a regularization term applied to this cyclic structure, extending the previously proposed GradICON regularization [29] to better constrain the deformation field.
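
    For reference, GradICON [29] penalizes the deviation of the Jacobian of the forward-backward composition from the identity; the cyclic form below is only my guess at how the paper extends it to the four transformations of the cycle, written in LaTeX notation:

        \mathcal{L}_{\mathrm{GradICON}} = \left\| \nabla\!\left( \Phi_{AB} \circ \Phi_{BA} \right) - I \right\|_F^{2}

        \mathcal{L}_{\mathrm{GradCyCon}} \approx \left\| \nabla\!\left( \Phi_{T' \to S} \circ \Phi_{S' \to T'} \circ \Phi_{T \to S'} \circ \Phi_{S \to T} \right) - I \right\|_F^{2}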

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a novel formulation for multimodal image registration by reducing it to a mono-modal problem using a cyclic structure. This contribution appears original and is well-motivated.

    The cyclic regularization introduced extends the previously proposed GradICON regularization [29], adapting it to handle the additional transformations introduced by the cyclic architecture.

    Experimental results support the effectiveness of the proposed similarity formulation (M2M), particularly in the context of PET/MRI registration, where standard similarity metrics often struggle. While the added value of the cyclic regularization is less consistently demonstrated, some cases do show improved performance with it.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The presentation lacks clarity, particularly regarding the concept of the “bridge pair” of images. The paper states that an additional pair (S′,T′) is “sampled,” but does not explain what this sampling entails. It is unclear whether these images are related to the original pair (S,T) or are entirely independent. The notation suggests a relationship, which adds to the confusion.

    Claims about the generalizability of the proposed method to other multimodal scenarios should be softened, as the evaluation is limited to brain imaging data, specifically PET/MRI.

    The ablation study is insufficient. The current similarity formulation includes four components (two forward passes: S→S′, T→T′; and two backward passes: S′→S, T′→T). It is unclear whether all four are necessary. An analysis of the impact of each component—e.g., testing only forward passes—would provide better insight into the formulation.

    The idea of using surrogate or intermediate images for registration resembles symmetric registration strategies, such as those proposed by Beg and Khan (2007), which also regularized transformation fields via intermediate mappings. A discussion connecting the proposed method to this prior line of work may be valuable.

    Beg, Mirza Faisal, and Ali Khan. “Symmetric data attachment terms for large deformation image registration.” IEEE transactions on medical imaging 26.9 (2007): 1179-1189.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The title “Mono-Modalizing Extremely Heterogeneous Multi-Modal Medical Image Registration” is somewhat confusing. The term mono-modalizing may suggest approaches such as extracting structural representations or modality-independent descriptors (e.g., Wachinger & Navab, 2012; Heinrich et al., 2012), which is not directly what the paper proposes. A more precise title could better reflect the cyclic similarity framework for brain image analysis introduced in this work.

    Wachinger, Christian, and Nassir Navab. “Entropy and Laplacian images: Structural representations for multi-modal registration.” Medical image analysis 16.1 (2012): 1-17.

    Heinrich, Mattias P., et al. “MIND: Modality independent neighbourhood descriptor for multi-modal deformable registration.” Medical image analysis 16.7 (2012): 1423-1435.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel and well-motivated approach to multimodal image registration by reducing it to a mono-modal problem via a cyclic similarity formulation. This idea is original and promising, especially for challenging cases like PET/MRI. However, the paper suffers from presentation issues, particularly a lack of clarity around key components such as the “bridge pair”. The cyclic regularization, while conceptually sound, is an extension of the previous GradICON work. Overall, the contribution is interesting but requires clearer exposition to reach its full potential.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a solution for multi-modal registration in the context of very distant domains, in particular PET-MR or FA-MR. In such cases, they argue that established multi-modal similarity measures are not robust enough to such large domain gaps. Hence, they propose to leverage an additional “bridge pair” (S’, T’) so that, by performing three extra registrations (T->S’, S’->T’, T’->S) and composing deformations, the deep registration network can be supervised solely via mono-modal similarity measures. In addition, the authors propose GradCyCon, a regularization term building on GradICON that exploits the introduced circular registration (S->T->S’->T’->S). Finally, the authors propose a way to leverage already-registered pairs of images as the bridge pair, thereby formulating a semi-supervised framework. The performance of the proposed methods is evaluated on the ADNI dataset. Each contribution is shown to yield improved alignment with more regular deformations, i.e., improved registration.
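
    For concreteness, here is a rough sketch of how I picture such a cycle being trained. This is my own reconstruction from the description above, not the authors' code: the network interface net(moving, fixed), the displacement-field warping convention, and plain Pearson correlation as the mono-modal metric are all placeholder assumptions.

# Sketch only: cyclic mono-modal similarity over a bridge pair (S', T').
# Assumes net(moving, fixed) returns a dense displacement field of shape
# (B, 3, D, H, W) in voxel units, channels ordered (x, y, z).
import torch
import torch.nn.functional as F

def make_grid(shape, device):
    # Identity sampling grid, shape (1, D, H, W, 3), last dim (x, y, z).
    d, h, w = shape
    zz, yy, xx = torch.meshgrid(
        torch.arange(d, device=device),
        torch.arange(h, device=device),
        torch.arange(w, device=device),
        indexing="ij",
    )
    return torch.stack((xx, yy, zz), dim=-1)[None].float()

def warp(img, disp):
    # Backward-warp img (B, 1, D, H, W) by displacement disp (B, 3, D, H, W).
    b, _, d, h, w = img.shape
    grid = make_grid((d, h, w), img.device) + disp.permute(0, 2, 3, 4, 1)
    size = torch.tensor([w, h, d], device=img.device).float()
    grid = 2.0 * grid / (size - 1.0) - 1.0  # normalize to [-1, 1] for grid_sample
    return F.grid_sample(img, grid, align_corners=True)

def dissim(a, b):
    # Negative Pearson correlation as a simple mono-modal dissimilarity.
    a, b = a.flatten(1), b.flatten(1)
    a = a - a.mean(dim=1, keepdim=True)
    b = b - b.mean(dim=1, keepdim=True)
    return -((a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1) + 1e-6))

def cycle_similarity_loss(net, S, T, S2, T2):
    # Four edges of the cycle S -> T -> S' -> T' -> S (S2, T2 denote S', T').
    d_s_t   = net(S,  T)    # S  registered to T
    d_t_s2  = net(T,  S2)   # T  registered to S'
    d_s2_t2 = net(S2, T2)   # S' registered to T'
    d_t2_s  = net(T2, S)    # T' registered back to S
    # Chain two warps so every comparison is between images of one modality
    # (two "forward" and two "backward" comparisons; each edge is used twice).
    S_in_S2 = warp(warp(S,  d_s_t),   d_t_s2)    # S  -> T  -> S'
    T_in_T2 = warp(warp(T,  d_t_s2),  d_s2_t2)   # T  -> S' -> T'
    S2_in_S = warp(warp(S2, d_s2_t2), d_t2_s)    # S' -> T' -> S
    T2_in_T = warp(warp(T2, d_t2_s),  d_s_t)     # T' -> S  -> T
    loss = (dissim(S_in_S2, S2).mean() + dissim(T_in_T2, T2).mean()
            + dissim(S2_in_S, S).mean() + dissim(T2_in_T, T).mean())
    # The four fields could also feed a GradICON-style penalty on the full
    # cycle composition (the GradCyCon idea), omitted here for brevity.
    return loss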

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and easy to follow. This paper tackles a challenging and clinically relevant problem. I found the proposed solution simple, original, and elegant. M2M and GCC are introduced into different recent deep registration models and shown to be beneficial on two brain registration cases where standard approaches fail. The semi-supervised setting is a nice “bonus” to the paper. Overall, I found the paper very interesting and convincing.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper is overall very strong and I can only mention here minor weaknesses.

    I found that the paper lacks a bit of clarity regarding the relationship between the deep registration networks in Sections 2.2 and 2.3. Is the same network performing the registration from domain S to domain T and from domain T to domain S, or are these two different uni-directional networks jointly optimized?

    The proposed M2M trick and GCC are also relevant for traditional optimization-based registration methods. Even though I am aware that this would probably require a fair amount of work, I would have found it very interesting to incorporate this approach into NiftyReg or ANTs, or even just a simple FFD model re-implemented in Python.

    I believe that the paper would benefit from a few comments on the limitations of the approach. For instance, as the bridge pair has to come from another patient, I assume this approach can only be relevant for parts of the anatomy that can be well registered across different patients. Healthy brains meet this criterion, but the presence of tumors, or the abdominal cavity, would probably be more problematic due to the challenging edges T->S’ and T’->S. Also, could this approach also improve smaller domain gaps (e.g., MR->CT)?

    Finally, while I found the proposed semi-supervised approach interesting, it would be nice to give the reader a bit more intuition regarding why it helps the model. It was not entirely clear to me.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (6) Strong Accept — must be accepted due to excellence

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I found this paper very interesting as it proposes a simple, yet original and effective solution to a clinically relevant problem.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors clarified the confusing points mentioned in my review in their response. I believe that the mentioned modifications will further improve the submission. I reaffirm my belief that this paper is valuable for the MICCAI community and the medical image registration community in general.




Author Feedback

[R1]

How are bridge pairs sampled?

In multimodal DIR, an unaligned cross-modal pair (S, T) is drawn at each iteration. Similarly, another unaligned pair (S’, T’) is independently drawn from the training set as the bridge pair. The notations S and S’ simply indicate that the two images share the same modality. We’ll clarify this further.

Need all 4 mappings? (e.g., using 2 only)

Thank you for highlighting this point. We tested such strategies before, but they underperformed or failed to converge. Due to the rebuttal guidelines we can’t show results, but we explain why such options lead to unstable gradients. 1) Two forward passes (S→T→S′, T→S′→T′): the edge contributions to the loss are imbalanced; T→S′ is used twice, while T′→S is never used. 2) One forward + one backward pass: there are two options. (S→T→S′, T′→S→T) uses edges unevenly, as in (1); (S→T→S′, S′→T′→S) evaluates only one modality (S) per iteration. In contrast, M2M-Reg ensures balanced updates and utilizes all four edges (already computed for GradCyCon), promoting stable training (Sec. 2.2).

Relation to symmetric strategies?

The mentioned method, Consistent Midpoint Cost (CMC), compares a single matching (S→M←T) at an implicit midpoint (M) of the flow. In contrast, M2M-Reg jointly optimizes multiple asymmetric matchings (S→T, T→S’) via an explicit image (T). Still, presenting M2M-Reg as matching via a surrogate image helps convey the intuition. We’ll add a discussion on CMC.

Scope beyond brain imaging? “Mono-modalizing” unclear?

We’ll limit claims to brain imaging and revise the title, e.g., “Mono-modal Cycle Similarity for …”

[R3]

Fig. 2 (input/output) is confusing?

The network takes a source-target pair and outputs a deformation—a well-established standard in this field (Sec. 2.1). Based on this, Fig. 2 shows a conceptual training overview with 4 input images and mappings via the network, not a sequential pipeline. We’ll add clarification.

Similar to CycleGAN?

M2M-Reg’s core idea is to avoid direct cross-modal comparison (S→T) and instead learn from mono-modal comparisons via an extra mapping (S→T→S′). This can be achieved without a cyclic structure, but we designed a new cyclic framework for stable training and diffeomorphism (Sec. 2.2), fundamentally different from CycleGAN. If our similarity were implemented in a CycleGAN-like way, we’d apply a looped mapping (e.g., S→T→S) to S and compare it with itself—causing the model to collapse to identity mappings (Sec. 2.2). Instead, we use separate images S′ and T′ to compare S↔S′ and T↔T′, avoiding self-comparison and collapse—making M2M-Reg clearly distinct from CycleGAN.

Too few baselines?

We boost all models built on major SOTA architectures (ViT, CNN, MLP) and regularizers (Sec. 3). Suggestions for expected additional baselines would be appreciated.

[R4]

Separate networks for S→T, T→S?

Like prior works, e.g., MultiGradICON, we use a single bi-directional network that handles both efficiently. We’ll clarify this.

Applying to traditional methods?

Thanks for the suggestion. While not included in this DL-focused work, we agree it’s a promising direction for instance optimization and plan to explore it in future work.

1) Anatomical variations? 2) smaller domain gap?

1) We agree this is an edge case. It remains an open challenge for existing inter-subject registration methods as well. We’ll add it as a limitation. 2) Some improvement is likely, as similarity metrics may not fully capture registration quality, though the gain may be modest since MR↔CT is already handled to some extent.

Semi-sup.: more intuition?

M2M-Reg learns via indirect comparison through two mappings. In the semi-supervised setting, the bridge mapping (S′→T′) is fixed as the identity, enabling a direct comparison (e.g., T→T′) with just one mapping (e.g., T→S′), which simplifies optimization. Since the cross-modal mapping T→S′ can be evaluated using its same-modality counterpart T′, the model can learn with a reference for cross-modal correspondence. We’ll add this to Sec. 2.4.
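
To make this concrete, a rough sketch follows (placeholder code, not our exact implementation; warp() and dissim() are generic helpers like those sketched under Review #3 above):

def semi_supervised_similarity(net, S, T, S2, T2, warp, dissim):
    # (S2, T2) are pre-aligned, so the bridge edge S' -> T' is the identity.
    d_t_s2 = net(T, S2)            # single cross-modal mapping T -> S'
    T_in_S2 = warp(T, d_t_s2)      # T resampled onto the shared S'/T' grid
    # Because S' and T' share a grid, the warped T can be compared directly
    # with T' (same modality) after just one mapping.
    return dissim(T_in_S2, T2).mean()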




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviews for this paper are quite varied and contradictory. One of the reviewers (#3) seems to have completely misunderstood the paper, and does not offer constructive criticism (contrary to what they say, this paper has nothing to do with CycleGAN). The two other reviewers agree that the proposed formulation is novel, useful and effective. The main criticism of R#1 lies in the clarity of one particular point of the presentation. I personally found this point quite clear, and I am sure that it can be made even clearer during the poster/oral presentation. I found the proposed formulation extremely smart and will definitely give it a try. I am in strong favor of acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Despite some minor clarity issues, the paper introduces a methodologically novel, clinically relevant, and well-validated approach. The rebuttal further confirms that the authors are capable of addressing these concerns in the final version.


