Abstract

The multi-scan imaging procedure, involving both multi-modal and multi-timepoint scans, captures temporal changes and complementary cross-modality information, playing a key role in clinical diagnosis. Multi-scan image restoration (IR), which leverages high-quality reference scans to aid in restoring degraded current scans, holds significant potential for reducing the cost of the multi-scan procedure. However, misalignment between scans, arising from changes in patient physiology or posture, impairs the ability of networks to exploit cross-scan correlations and degrades restoration performance. To this end, we propose a plug-and-play Bridge-Based Module for Misalignment Estimation and Elimination (BME2), which adopts a coarse-to-fine strategy to estimate cross-scan misalignment. Specifically, a lightweight misalignment estimation (ME) network first predicts the initial deformation fields, which are then iteratively refined via a latent Schrödinger bridge-based model to obtain the final estimation. Notably, BME2 can be added to arbitrary backbones and introduces only mild computational costs. Validated on brain MRI and abdominal CT datasets, BME2 consistently enhances four baselines, achieving average PSNR gains of 0.54 and 0.65 dB on brain and abdominal data, respectively.
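
To make the idea of "eliminating" misalignment concrete, the following is a minimal sketch of how a dense deformation field can be applied to a reference image by bilinear warping. It is an illustration only: the function name `warp_bilinear`, the `(dy, dx)` channel order, and the use of NumPy on 2D arrays are assumptions, and the abstract does not state how BME2 applies its estimated fields.

```python
import numpy as np


def warp_bilinear(image, flow):
    """Warp a 2D image with a dense displacement field (illustrative sketch).

    image: (H, W) array.
    flow:  (2, H, W) array of (dy, dx) displacements in pixels (assumed layout).
    Output pixel (y, x) samples the input at (y + dy, x + dx), clamped to the
    image border, using bilinear interpolation.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling coordinates, clamped so that indexing stays inside the image.
    sy = np.clip(ys + flow[0], 0, h - 1)
    sx = np.clip(xs + flow[1], 0, w - 1)
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = sy - y0
    wx = sx - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom
```

A zero field returns the image unchanged, and a constant field of `dx = 1` shifts the content one pixel to the left, which is a quick sanity check for the sign convention.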

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1012_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/ChenWenxuan2021/BME2

Link to the Dataset(s)

https://brain-development.org/ixi-dataset/

BibTex

@InProceedings{CheWen_BME2_MICCAI2025,
        author = { Chen, Wenxuan and Jiang, Caiwen and Song, Xiaolei and Shen, Dinggang},
        title = { { BME2: A Plug-and-Play Bridge-Based Module for Misalignment Estimation and Elimination in Multi-Scan Image Restoration } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15972},
        month = {September},
        pages = {64--73}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors introduce a ‘Bridge-Based Module for Misalignment Estimation and Elimination’ for aligning multiple scans of the same subject. This would allow, e.g., one high-quality scan of a patient to be acquired while subsequent scans are performed with a lower dose and lower image quality. The module could then match the two scans and increase the image quality of the second scan. The authors first introduce a misalignment estimation network that estimates the deformation fields from one scan to the other. A Schrödinger-bridge-based model is then employed to refine the estimated deformation fields.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method of using the Schrödinger bridge to improve the previously estimated deformation field is new and an interesting approach. The paper is well structured and written. In general, the results underline the usefulness of the approach. However, I am a bit concerned about the datasets used (see below).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    A more detailed explanation of the performed steps would help to increase the understanding of the paper. Especially the explanation of the Schrödinger bridge would benefit from more details: What exactly is the hyperparameter sigma?

    It is not clear to me where the denoising step is integrated into the Schrödinger bridge and how exactly this is done. Please explain this step in more detail.

    The method is applied to T2/T1 MR images and to CT images of the arterial/venous phase. T2 images are downsampled and disturbed, and images from the venous phase are degraded by downsampling the sinogram. I feel that these datasets are not very well suited to the task, as the biological information in all images remains the same. However, multiple scans of one patient are often acquired to assess, e.g., therapy success. This means that, e.g., a tumor is expected to change its shape and size from one image to another. Patients also often lose weight between scans, which of course affects the image. I would therefore recommend using another dataset that contains images from genuinely different time points, in which a change in, e.g., tumor size can be observed. See, e.g., https://www.cancerimagingarchive.net/collection/rhuh-gbm/

    Moreover, the method is compared with SwinIR and Restormer. Were both networks translated to 3D before applying them? Otherwise the comparison would be somewhat unfair. Restormer in particular is straightforward to translate to 3D (see e.g. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0318992).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel and interesting. However, I miss some details to fully understand and reproduce the approach. Also the datasets used could be extended to a third dataset showing biological changes between the images.

    I would also recommend that the authors calculate SSIM and PSNR on selected image regions only (regions containing important information, such as brain tissue), as these metrics tend to be over-optimistic for images with a large amount of background, such as brain MR scans.
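
The region-restricted evaluation suggested above can be sketched as follows. This is a minimal NumPy illustration, not the paper's evaluation code; the function name `psnr`, the boolean `mask` argument, and the default `data_range=1.0` are assumptions.

```python
import numpy as np


def psnr(pred, target, mask=None, data_range=1.0):
    """PSNR in dB, optionally restricted to the pixels where mask is True.

    mask is a hypothetical foreground mask (e.g. brain tissue); how such a
    mask would be obtained is outside the scope of this sketch.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    sq_err = (pred - target) ** 2
    if mask is not None:
        mask = np.asarray(mask, dtype=bool)
        if not mask.any():
            raise ValueError("mask selects no pixels")
        sq_err = sq_err[mask]
    mse = sq_err.mean()
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)
```

If all of the error sits inside a foreground region that covers 1/16 of the image, the full-image PSNR exceeds the masked PSNR by 10·log10(16) ≈ 12 dB, which illustrates how a large, perfectly restored background can inflate the score.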

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors provide a novel contribution to the multi-view registration problem. Especially in the case of a reference image and distorted follow-up images, registration is key to comparing pre/post or temporal differences. Here, the authors provide an end-to-end drop-in replacement for registration models based on the Schrödinger bridge formulation. The authors show that their method outperforms naive implementations (quantitatively using PSNR and SSIM, and qualitatively).

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I like the idea behind the paper, applying the Schrödinger bridge to the multi-view problem. The authors not only use multiple datasets (namely IXI and Abdominal CT), but also different registration backbones, namely MINet, McMRSR, SANet and DANCE. The authors provide an ablation study showcasing the importance of the additional loss terms (cross-correlation and smoothness) and the effect of removing the intermediate Schrödinger bridge. Only when both elements are present does the method outperform the baseline implementation.

    The Schrödinger bridge is a nice concept that has been around in the ML domain lately and shows nice appeal, esp. in Diffusion-based approaches. It is, however, barely used in an image registration context, therefore I like to see it here and think it is nice to see around.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    In general, the paper is nicely structured and built up logically. However, the introduction does not explain why the Schrödinger bridge is a solution to the problem. I do see related works, but the application of the Schrödinger bridge does not come with a good argument.

    In addition, Fig. 1 is very hard to comprehend, as many abbreviations and symbols are shown without a clear description. I had to search the text, re-reading and cross-referencing, until I had found all elements. This can be greatly improved, as a reader looking at Fig. 1 alone may not understand what is going on.

    Unfortunately, the delta between the baselines and the presented method is relatively small: SSIM differences of ~0.001-0.004 are very common, as are PSNR differences of ~0.4-0.5. I do not see any statistics (e.g. SD or other error bounds). I assume that Table 1 shows the mean across the test dataset of 100 volumes or 16 stacks, respectively, but this has not been clearly stated either.
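
The kind of statistics asked for here could be reported along the following lines. This is a minimal sketch that assumes per-volume PSNR (or SSIM) scores are available for the baseline and for the baseline with the module on the same test cases; the function name `paired_summary` and the returned keys are hypothetical.

```python
import numpy as np


def paired_summary(baseline, proposed):
    """Mean, SD, and paired t statistic for per-case metric values.

    baseline, proposed: 1D sequences of equal length, one score per test
    case, in the same case order (assumed inputs).
    """
    baseline = np.asarray(baseline, dtype=float)
    proposed = np.asarray(proposed, dtype=float)
    if baseline.ndim != 1 or baseline.shape != proposed.shape or baseline.size < 2:
        raise ValueError("need two 1D arrays of equal length >= 2")
    diff = proposed - baseline
    sd_diff = diff.std(ddof=1)  # sample SD of the per-case gains
    # Paired t statistic; undefined when every case has the same gain.
    t_stat = diff.mean() / (sd_diff / np.sqrt(diff.size)) if sd_diff > 0 else float("nan")
    return {
        "baseline_mean": baseline.mean(),
        "baseline_sd": baseline.std(ddof=1),
        "proposed_mean": proposed.mean(),
        "proposed_sd": proposed.std(ddof=1),
        "mean_gain": diff.mean(),
        "sd_gain": sd_diff,
        "t_stat": t_stat,
    }
```

A p-value for the paired comparison can be obtained from `scipy.stats.ttest_rel`, or from `scipy.stats.wilcoxon` if normality of the per-case gains is doubtful.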

    The ablation study is crucial for the paper to showcase the importance of the loss terms, as well as the Schrödinger bridge itself. However, I am missing this analysis for the IXI dataset and for other methods. At least, the authors should state the results are comparable across methods and datasets. Again, without SDs, especially SSIM seems to be insignificant in Table 2.

    The efficacy study in Table 3 seems to be OK, but not largely relevant to this study. I would have loved to see the effect of the implementation choices: how many levels, an ablation of elements in the Schrödinger bridge U-Net, and so on. Also, the setting of tau (tau = 8 => 1/8 sampling step) seems to be arbitrarily chosen and no motivation has been given, similar to the loss coefficients both being 0.1.
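
For readers unfamiliar with the two auxiliary terms, the sketch below shows one common way to combine a cross-correlation term and a smoothness term with coefficients of 0.1 each. The exact formulations used in the paper are not given in this review, so the global normalized cross-correlation, the first-order smoothness penalty, and all function names are assumptions.

```python
import numpy as np


def ncc_loss(img_a, img_b, eps=1e-8):
    """1 - global normalized cross-correlation (assumed form of the CC term)."""
    a = img_a - img_a.mean()
    b = img_b - img_b.mean()
    ncc = (a * b).sum() / (np.sqrt((a ** 2).sum() * (b ** 2).sum()) + eps)
    return 1.0 - ncc


def smoothness_loss(flow):
    """Mean squared finite difference of a (2, H, W) deformation field."""
    dy = np.diff(flow, axis=1)
    dx = np.diff(flow, axis=2)
    return (dy ** 2).mean() + (dx ** 2).mean()


def total_loss(restoration_loss, warped_ref, target, flow, w_cc=0.1, w_smooth=0.1):
    """Restoration loss plus the two weighted auxiliary terms (weights as quoted above)."""
    return (restoration_loss
            + w_cc * ncc_loss(warped_ref, target)
            + w_smooth * smoothness_loss(flow))
```

A sensitivity study of the kind requested would sweep `w_cc` and `w_smooth` over a small grid and report the resulting PSNR and SSIM.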

    In addition, I am missing referencing some closer related work of Schrödinger bridge w.r.t. image registration, even though these studies are not doing “the same”, but maybe go into similar directions, e.g. https://arxiv.org/abs/2501.14171 and https://link.springer.com/chapter/10.1007/978-3-031-66958-3_21

    This reviewer is not affiliated with any of the studies mentioned above and only wishes to point the authors in a specific direction; the authors are not expected to cite any of the given literature.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    In Figure 2, I do not know what the lower inset shows. It ranges from 0-0.3, but there is no description. Please indicate and also reconsider the color scheme. The “jet” colormap is very unfortunate for scientific purposes.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the idea of the paper is nice and novel, and the community would benefit from probabilistic methods in image registration. However, a couple of points, such as the missing statistical power, prevent me from being fully convinced by the authors’ presentation. I think the authors could make a stronger point by testing their model and its dependence on hyperparameters more thoroughly rather than looking at the FLOPs.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contribution of this paper is the proposed BME² module, which addresses the problem of misalignment between reference and degraded scans—a key factor contributing to artifacts and suboptimal performance in multi-scan image restoration.

    The BME² module adopts a coarse-to-fine strategy and can be integrated into arbitrary multi-scan image restoration backbones with mild additional computational cost. The first component is a lightweight U-shaped misalignment estimation network that predicts an initial (coarse) deformation field, making it well-suited for multi-scale backbone architectures. The second component performs iterative refinement using a latent Schrödinger bridge-based model, which overcomes the limitations of applying diffusion models directly for misalignment estimation—namely, high computational overhead and hallucinated deformation fields.

    In summary, the BME² module effectively handles misalignment between degraded and reference scans, is easily embedded into various backbone networks, and the authors demonstrate consistent performance improvements when the proposed module is embedded into different backbone architectures.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A novel approach to deformation field prediction. The authors introduce the concept of the Schrödinger bridge to predict deformation fields, addressing two key limitations of diffusion model (DM)-based methods: high computational cost and the generation of hallucinated deformation fields. This appears to be the first application of the Schrödinger bridge framework for deformation field prediction, offering a novel and promising direction in the field.

    2. Comprehensive evaluation. The paper presents a thorough evaluation to support both the effectiveness and the computational efficiency of the proposed modules. Experimental results demonstrate consistent performance improvements across four recent backbone networks on two datasets, covering both rigid and non-rigid misalignment scenarios, which highlights the method’s versatility. Furthermore, the ablation study confirms the contribution of the Schrödinger bridge-based refinement, and the visualizations effectively illustrate the progressive improvement of the deformation fields from coarse to fine.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Unclear motivation for using diffusion model (DM)-based methods. In the introduction, the authors state that “advancements in diffusion models (DMs) provide new insights into overcoming the limitations of current multi-scan IR methods,” and mention that several studies have successfully applied DMs to predict misalignment fields. However, the advantages of DM-based approaches over other categories of methods are not clearly explained. While the reader might be able to find justifications in the cited references, it would be more helpful if the authors explicitly discussed the benefits of using DMs in this context. For instance, do DMs yield more accurate deformation field predictions? Do they offer computational advantages? Or are they particularly effective in handling large or complex misalignments?

    2. Unclear rationale behind the design of the misalignment estimation (ME) network. In Section 2.1, the authors provide a detailed description of the ME network architecture, but the reasoning behind specific design choices remains unclear. In particular, it is not evident why the upsampled deformation field from the previous level needs to be concatenated with the features at the current level. For comparison, the baseline method DANCE generates deformation fields independently at each level without such interaction. Are there specific advantages or motivations for this architectural decision? If this design is inspired by or similar to prior work, it would be helpful to include relevant citations to clarify this point.

    3. Limited qualitative results on the brain dataset. While the authors present convincing qualitative results for the abdominal CT dataset, the qualitative evaluation on the brain dataset is limited. The brain dataset involves T1-weighted (T1w) and T2-weighted (T2w) images with distinctly different contrasts, unlike the abdominal CT dataset, which appears to have consistent contrast across scans (as shown in Figure 2). It would therefore be beneficial to include more qualitative results that demonstrate the effectiveness of the proposed method in multi-contrast settings. Although I understand that this may be due to page limitations and that additional results cannot be provided during the rebuttal phase, I encourage the authors to include more balanced qualitative evaluations in future revisions or follow-up publications.

    4. Ambiguity regarding single-scale vs. multi-scale compatibility of the BME module. At the beginning of Section 2, the authors mention that “both single-scale and multi-scale backbones are applicable, and herein we mainly introduce the more complicated case with multi-scale features.” However, since the proposed ME network is U-shaped, a single-scale setting might lose some benefits of this architecture, such as integrating hierarchical features. Upon reviewing the architectures of MINet and SANet, both of which are single-scale but multi-stage, the experimental results show that the BME² module performs well when integrated into these networks. To avoid confusion, it would be helpful for the authors to clearly specify whether each baseline network is single-scale or multi-scale. This would clarify how the experiments support the claim that the proposed module is applicable to both types of architectures.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Please describe the limitations of the proposed module.
    2. Please ensure that the running title is displayed correctly. Currently, it appears as “Title Suppressed Due to Excessive Length.”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main factors that led me to recommend a weak accept are as follows:

    1. The proposed method offers sufficient novelty, appearing to be the first work to apply the Schrödinger bridge framework to deformation field prediction. Although the motivation for using latent diffusion model-based methods could be more clearly articulated, the experimental results demonstrate the effectiveness of the approach.
    2. The evaluation is comprehensive and supports the paper’s core claims.
    3. Overall, the writing, figures, and organization are mostly clear and well-structured, although the qualitative results are somewhat unbalanced.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We would like to thank the reviewers for their constructive comments. Herein, we respond to the main concerns and will revise the paper accordingly in the camera-ready version.

[Common Concerns]

  1. Motivation for using the Schrödinger bridge: Diffusion models (DMs) and their derivatives, e.g., diffusion bridges, have gained recognition for their exceptional fitting capabilities. Existing studies have successfully employed DMs for multi-scan IR tasks to synthesize high-quality outputs. This inspired our question: can we use DMs to synthesize a refined deformation field? Based on this idea, we further explore two main aspects: (1) adopting a Schrödinger bridge, rather than a traditional DDPM starting from white noise, to reduce the number of iterations, and (2) the detailed design of each sampling step. In the camera-ready version, we will revise the “Introduction” and incorporate relevant citations to better explain our rationale for selecting diffusion models.

  2. About the datasets: Our experiments use a brain MRI dataset and an abdominal CT dataset, representing rigid and non-rigid misalignment scenarios, respectively. We thank Reviewer 1 for the valuable suggestion to include long-term temporal variations, such as changes in tumor position and size across different time points, rather than only consecutive scans from single sessions. We will acknowledge this in the limitations and would like to explore such scenarios in future work.

[Other Issues]

Q1 (for Review 1): Are SwinIR and Restormer translated to 3D? A1: They are not. In fact, all experiments are conducted on 2D images.

Q2 (for Review 2): Figure 1 is too complicated and unclear. A2: We will carefully modify Figure 1 to make it easier to understand.

Q3 (for Review 2): Why is the hyper-parameter \tau set to 1/8, i.e., eight iteration steps? A3: Our preliminary experiments suggest that 8 iteration steps offer the best balance between performance and efficiency. Further increasing the number of iteration steps (e.g., from 8 to 16) yields PSNR gains of less than 0.05 dB on both datasets.
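
To illustrate what a step size of \tau = 1/8 means in practice, the following is a generic Brownian-bridge refinement loop that moves from t = 1 to t = 0 in `n_steps` equal steps. It is a sketch under stated assumptions, not the authors' sampler: `predict_x0` stands in for a trained network that estimates the clean deformation field, `sigma` is a generic noise scale, and the update rule is the standard Brownian-bridge posterior rather than anything confirmed by the rebuttal.

```python
import numpy as np


def bridge_refine(x_init, predict_x0, n_steps=8, sigma=0.0, rng=None):
    """Refine an initial estimate with n_steps bridge updates (tau = 1 / n_steps).

    x_init:     initial (coarse) estimate, treated as the state at t = 1.
    predict_x0: hypothetical callable (x, t) -> estimate of the state at t = 0.
    sigma:      noise scale of the bridge; 0 gives a deterministic update.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x_init, dtype=float)
    for k in range(n_steps, 0, -1):
        t, s = k / n_steps, (k - 1) / n_steps  # step from time t down to s = t - tau
        x0_hat = predict_x0(x, t)
        # Brownian-bridge posterior of x_s given x_t and the predicted endpoint.
        mean = (s / t) * x + (1.0 - s / t) * x0_hat
        std = sigma * np.sqrt(s * (t - s) / t)
        x = mean + std * rng.standard_normal(x.shape)
    return x
```

Doubling `n_steps` from 8 to 16 doubles the number of `predict_x0` calls, which is the cost side of the trade-off described in A3.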

Q4 (for Review 3): Rationale behind the design of the ME network. A4: The design of the ME network closely follows CrossNet (Zheng et al., CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping, in ECCV 2018). Most modules and architectural details, e.g., the concatenation of the upsampled deformation field from the previous layer, are similar to those in CrossNet.

Q5 (for Review 3): Single-scale vs. multi-scale compatibility of the BME module. A5: The single-scale and multi-scale settings share the same network architecture; however, in the single-scale setting the intermediate-layer deformation-field predictions are not used as outputs, and only the shallowest-layer prediction is used. Therefore, the single-scale setting does not lose the benefit of hierarchical features.

We will release the link to the source code in our camera-ready manuscript.
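
The coarse-to-fine behaviour described in A4 and A5 can be sketched as below. This is a schematic NumPy illustration, not the released code: `predict_flow` is a hypothetical per-level predictor, the features are assumed to double in resolution from one level to the next, the coarsest level is assumed to receive a zero field, and the "shallowest-layer" prediction is taken here to be the finest-resolution one.

```python
import numpy as np


def upsample2x(flow):
    """Nearest-neighbour 2x upsampling of a (2, H, W) field.

    Displacements are doubled because they are measured in pixels.
    """
    return 2.0 * flow.repeat(2, axis=1).repeat(2, axis=2)


def coarse_to_fine(features, predict_flow, multi_scale=True):
    """Predict deformation fields level by level, coarsest first.

    features:     list of (C, H, W) arrays ordered coarsest to finest.
    predict_flow: hypothetical callable (inputs, level) -> (2, H, W) field,
                  where inputs is the level's features concatenated with the
                  upsampled field from the previous level.
    multi_scale:  True returns one field per level; False returns only the
                  finest field, as in the single-scale setting of A5.
    """
    flows = []
    flow = np.zeros((2,) + features[0].shape[1:])
    for level, feat in enumerate(features):
        if level > 0:
            flow = upsample2x(flow)
        inputs = np.concatenate([feat, flow], axis=0)
        flow = predict_flow(inputs, level)
        flows.append(flow)
    return flows if multi_scale else flows[-1]
```

In both settings every level is still computed, so the finest prediction depends on all coarser ones; the `multi_scale` flag only changes which predictions are exposed to the backbone.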




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The paper presents a clearly new method that received positive reviews. All three reviewers gave it a “weak accept” recommendation, acknowledging the work’s technical merits while identifying several areas needing improvement before final publication. In my own reading, I also appreciated the work and I think it is a clear ‘provisional accept’. I still strongly recommend that the authors address the reviewers’ comments fully in the final version.


