Abstract

Deformable image registration aims to precisely align medical images from different modalities or times. Traditional deep learning methods, while effective, often lack interpretability, real-time observability and adjustment capacity during registration inference. Denoising diffusion models present an alternative by reformulating registration as iterative image denoising. However, existing diffusion registration approaches do not fully harness these capabilities, neglecting the critical sampling phase that enables continuous observability during inference. Hence, we introduce DiffuseReg, an innovative diffusion-based method that denoises deformation fields instead of images for improved transparency. We also propose a novel denoising network built upon Swin Transformer, which better integrates moving and fixed images with the diffusion time step throughout the denoising process. Furthermore, we enhance control over the denoising registration process with a novel similarity consistency regularization. Experiments on the ACDC dataset demonstrate that DiffuseReg outperforms existing diffusion registration methods by 1.32% in Dice score. The sampling process in DiffuseReg enables real-time output observability and adjustment unmatched by previous deep models. The code is available at https://github.com/KUJOYUTA/DiffuseReg

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1193_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/KUJOYUTA/DiffuseReg

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zhu_DiffuseReg_MICCAI2024,
        author = { Zhuo, Yongtai and Shen, Yiqing},
        title = { { DiffuseReg: Denoising Diffusion Model for Obtaining Deformation Fields in Unsupervised Deformable Image Registration } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes DiffuseReg, a novel registration method that utilizes DDPMs (Denoising Diffusion Probabilistic Models) to iteratively denoise the deformation field. The authors suggest that the progressive denoising process can be used to increase the transparency/interpretability of registration models. The method is evaluated on 3D cardiac MRI registration (ACDC dataset: 150 image pairs), finding that it outperforms two other diffusion-based registration models (DiffuseMorph and FSDR) by 1.32% in Dice score.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-organised, readable, and the contribution is well-stated. The idea to combine diffusion models and deformable image registration is interesting since they may potentially complement each other. Compared to existing methods, this method denoises the deformation field rather than the images. The implementation seems elegant, i.e., building on the Swin transformers (for its long-range dependencies and multi-scale hierarchical structure) and conditioning the noisy features on the fixed/moving image features.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are three main weaknesses that I would like to address.

    1. Iterative denoising process for increased transparency. I am hesitant about whether the iterative denoising process benefits model transparency or interpretability (compared to having a single prediction). One reason for this is that the differences in the produced outputs at different time points are minimal (Fig. 3). How could these results benefit transparency? Even if some progression of the result is seen, does it represent how the images are deformed, i.e., moving, progressively? Or could it provide a “false sense” of interpretability? Could the authors design an experiment to study what the progression actually represents?

    2. Low speed. The authors mention that registration with their method expectedly takes longer due to the sampling method (333.2 seconds). This is substantially slower than other DL methods (sub-second) and similar to the speed of iterative registration methods. Considering that most applications are time-sensitive (image-guided interventions, surgery, radiotherapy), in what applications do the authors believe their approach is valuable?

    3. Insufficient comparison to other methods. The method is compared to diffusion-based registration methods only. What about comparing to the best achievable, i.e., iterative registration and non-diffusion models like the state-of-the-art cLapIRN model? This is crucial to place the results into a bigger context. Alternatively, the findings could be compared to other reported findings on this dataset or type of data in the discussion.

    In addition, recent methods apply test-time optimization, either direct (updating the dense deformation field directly) or neural optimization (updating the network's trainable parameters) [1-2]. These methods provide results that are iteratively progressed at test time, similar to the iterative denoising process described in this paper. It would be interesting to see how these results compare to one another.

    [1] Zhu, Wentao, et al. “Test-time training for deformable multi-scale image registration.” 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021.

    [2] Mok, Tony CW, et al. “Deformable medical image registration under distribution shifts with neural instance optimization.” International Workshop on Machine Learning in Medical Imaging. Cham: Springer Nature Switzerland, 2023.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Important details are missing (Swin transformer patch size, image size and resolution, pre-processing steps)

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The pre-trained Volume Tweening Network (VTN) provided the initialized DVFs. I suggest adding the result of the initialized DVFs to Table 1 or 2. This may help the reader understand the contribution of DiffuseReg.
    2. What did the authors mean by ‘Users can observe registrations in real-time’? The denoising takes 333.2 seconds (Table 1), which is not considered real-time.
    3. Was there a validation set used and why (not)?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors propose an iterative denoising process for increased transparency, yet I believe their results do not demonstrate increased transparency. In addition, the evaluation could be improved, e.g., by comparing the results at least to iterative registration and to a deep learning model not based on diffusion.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I appreciate the explanation of the authors’ view of transparency, which helped me to understand the contribution of this paper. Also, I can accept that the registration speed can be optimized in the next steps, e.g., using the steps proposed by the authors in their rebuttal.



Review #2

  • Please describe the contribution of the paper

    The paper introduces DiffuseReg, a novel framework for image registration using diffusion models. Compared to standard learning-based algorithms that directly compute the deformation field, DiffuseReg iteratively denoises an initial guess (noisy initialisation). This differs from other diffusion-based methods for image registration, which focus on iterative denoising of the input images instead of the deformation field.
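The iterative denoising of a deformation field described above can be illustrated with a minimal sketch. It assumes standard DDPM ancestral sampling with a linear beta schedule; the function names, the schedule values, and `toy_predict_noise` (a stand-in for the paper's Swin-based conditional network) are hypothetical and are not taken from the DiffuseReg code.

```python
import numpy as np

def make_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule (a common DDPM default, assumed here)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def sample_deformation_field(predict_noise, moving, fixed,
                             num_steps=50, seed=0, on_step=None):
    """Reverse diffusion on a 3-channel displacement field.

    predict_noise(field, moving, fixed, t) is any callable that returns
    an estimate of the noise contained in `field` at step t.
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Start from pure Gaussian noise, one channel per spatial axis.
    field = rng.standard_normal((3,) + moving.shape)
    for t in range(num_steps - 1, -1, -1):
        eps = predict_noise(field, moving, fixed, t)
        # Posterior mean of the standard DDPM reverse step.
        mean = (field - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(field.shape) if t > 0 else 0.0
        field = mean + np.sqrt(betas[t]) * noise
        if on_step is not None:
            on_step(t, field)  # hook for observing intermediate fields
    return field

# Toy usage: the stand-in predictor assumes the clean field is all zeros,
# so the noise estimate is the current field rescaled by the schedule.
NUM_STEPS = 50
_, _, ALPHA_BARS = make_schedule(NUM_STEPS)

def toy_predict_noise(field, moving, fixed, t):
    return field / np.sqrt(1.0 - ALPHA_BARS[t])

moving = np.zeros((4, 8, 8))
fixed = np.zeros((4, 8, 8))
steps_seen = []
final_field = sample_deformation_field(
    toy_predict_noise, moving, fixed, NUM_STEPS,
    on_step=lambda t, f: steps_seen.append(t),
)
```

The `on_step` hook is where an intermediate field could be inspected or sampling stopped; how DiffuseReg exposes this in practice is not specified in the reviews.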

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Novel idea: they reformulate the DDPM framework to apply to deformation fields instead of images.

    (2) They provide strong evaluation of the method, with ablation studies for multiple modelling choices.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The comparison with diffusion-based methods reporting only Dice metrics and excluding other learning-based algorithms seems rather limited.

    The computation time is long compared to other learning-based algorithms (both those reported here and those not reported).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Main comments:

    (1) The initialisation of the deformation field needs to be clarified: on the one hand, in the methods section, it seems that the deformation field is initialised using Gaussian noise. On the other hand, in the experiments section, it seems that it is initialised with an initial guess from a standard learning-based registration algorithm (at least during training). Moreover, if that is also the case for testing, is this counted when computing the execution time?

    (2) In registration, not only the accuracy is important but also the smoothness (e.g., the percentage of voxels with a negative Jacobian determinant). Reporting metrics other than Dice seems relevant when comparing to other methods.

    (3) It is not clear what the authors mean by “image space” and “deformation space”. Aren’t they the same? Deformation fields should lie in the same space as the fixed image, shouldn’t they?

    Minor comments: (1) The value of alpha_t is undefined. (2) In the abstract, the authors claim that the improvement is 1.32%, but it seems that it is 1.32 points (not %).
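The smoothness metrics raised in comment (2) can be computed from a displacement field as in the following sketch. It assumes a dense displacement in voxel units with shape (3, D, H, W), finite-difference gradients, and the convention that a voxel counts as folded when its Jacobian determinant is non-positive; the paper's exact definition may differ, and the function names are hypothetical.

```python
import numpy as np

def jacobian_determinant(displacement):
    """Determinant of the Jacobian of phi(x) = x + u(x).

    displacement: array of shape (3, D, H, W), in voxel units.
    Returns an array of shape (D, H, W).
    """
    # grads[i, j] = d u_i / d x_j, estimated with finite differences.
    grads = np.stack(
        [np.stack(np.gradient(displacement[i]), axis=0) for i in range(3)],
        axis=0,
    )  # shape (3, 3, D, H, W)
    # Add the identity because phi is the identity plus the displacement.
    jac = grads + np.eye(3)[:, :, None, None, None]
    jac = np.moveaxis(jac, (0, 1), (-2, -1))  # shape (D, H, W, 3, 3)
    return np.linalg.det(jac)

def folding_metrics(displacement):
    """Return (percentage of non-positive determinants, std of determinant)."""
    det = jacobian_determinant(displacement)
    njd_percent = 100.0 * float(np.mean(det <= 0))
    sdj = float(np.std(det))
    return njd_percent, sdj

# Identity transform: no folding, determinant is 1 everywhere.
zero = np.zeros((3, 4, 6, 6))
identity_njd, identity_sdj = folding_metrics(zero)

# Displacement u_0 = -2 * x_0 flips the first axis, so every voxel folds.
folded = zero.copy()
folded[0] = -2.0 * np.arange(4)[:, None, None]
folded_njd, folded_sdj = folding_metrics(folded)
```

Lower values of both quantities indicate a smoother, more plausible deformation.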

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The formulation of diffusion models for denoising deformation fields seems novel to the best of my knowledge. It is an interesting application for the MICCAI community. Nonetheless, the comparison against other methods is somewhat limited, unlike the strong and extensive evaluation of their own method and design choices.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    My opinion is to accept the paper, even if it’s just on the line between accept and reject. While I agree that the work can be improved, I think that the manuscript will be of interest to many people in the community.



Review #3

  • Please describe the contribution of the paper

    The main contribution of the paper is the proposed network DiffuseReg, which generates the deformation field using progressive denoising, unlike prior works that manipulate noise in image space.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and the figures clearly explain the core idea of the proposed network.

    Good experiments with sufficient ablation studies that show the performance of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One of the main weaknesses of the proposed method is that the inference time is too long (>330 s vs. 0.2 s in prior work).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As mentioned above, the inference time of the proposed method is over 330 s (>5.5 min), whereas other learning-based methods take less than 1 s at the inference stage. One of the main motivations for using learning-based methods as opposed to traditional CV techniques is that once the model is trained, inference is efficient (e.g., see VoxelMorph in Ref 3). This is one of the major drawbacks of the proposed method.

    The paper mentions that the final results may not surpass earlier intermediate states obtained during sampling (Fig. 3) and argues that this allows the user to observe the process in real time to curate ideal outputs. While this might be true, this “feature” also adds a burden on the user to “curate ideal output”, not to mention that the inference time is very long. We may want to train and fine-tune a model in a way that lets it automatically run registration on a large number of images at the inference stage.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While there are some concerns about the proposed solution, the paper has made good contributions (novel method, good experiments)

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I’ve checked the other reviews and the feedback from the authors and will keep my original rating of weak accept.




Author Feedback

  1. Initialization of deformation field (R1): The initialization of the deformation field during the training stage follows the standard DDPM settings. In the inference stage, the initialization of the deformation field is pure Gaussian noise.

  2. Comparison of smoothness (R1): Besides higher Dice, our method also achieves better smoothness than the compared methods. In terms of the percentage of negative Jacobian determinants (NJD, the lower the better), our method achieves 1.74%, whereas FSDR and DM achieve 1.80% and 1.78%. In terms of the standard deviation of the Jacobian determinant (SDJ, the lower the better), our method achieves 0.51, FSDR 0.54, and DM 0.53.

  3. Difference between image vs. deformation space (R1) For 3D MR images, image space is three-dimensional, describing the values of the voxels. Deformation space is five-dimensional, describing the sampling direction (3D) of each voxel in 3D space.

  4. Long inference time (R3&4): Our method is compatible with efficient diffusion sampling methods, e.g., DDIM, which can reduce the inference time from 6.5 min to 0.5 min without compromising performance. For further acceleration, we can adopt a distillation-guided method to reduce the sampling time to 3 s without any performance decline, making it competitive with end-to-end DL methods (0.20 s). Importantly, rather than focusing on inference speed, our major scope is to give end users (e.g., clinicians, radiologists) the opportunity to control the registration process by adding prior information that guides sampling via a classifier-guided diffusion scheme (e.g., controlling the smoothness of the deformation field, the general direction of deformation, or the target organ for registration), which previous end-to-end DL models cannot do. Therefore, once the generated deformation field is relatively stable, inference can be stopped manually to save further time. As shown in Fig. 3, inference can be terminated after only 400 sampling steps (T=1600).

  5. More compared methods (R1&4): For a fair comparison, we focus on comparing with diffusion-based or diffusion-related registration methods, i.e., FSDR and DM. Both claim to be diffusion-based methods. Moreover, FSDR and DM have already been compared with non-diffusion-based DL registration methods, i.e., VM, VM-diff, and CycleMorph, and show significant improvements. Since our model surpasses FSDR and DM, it can also outperform those DL methods. We also did not compare with iterative registration methods, since the sampling stage in diffusion-based methods differs from the iteration of deformation fields. Specifically, the sampling process is a gradual denoising of a deformation field, and it does not involve deforming the moving image during the process. In contrast, iterative methods deform the moving image at each iteration and then use the warped moving image for the next iteration. In other words, iterative methods need to deform the moving image multiple times and aggregate multiple deformation fields; hence the two approaches are complementary.

  6. Transparency (R4): We follow previous work [1] in defining the transparency of our method as its ability to accept external control or intervention from users during inference (as elaborated in our fourth response). Non-diffusion-based end-to-end DL methods, such as VM and CycleMorph, directly output a deformation field and lack the kind of process that diffusion-based models have, where external control can be applied. Therefore, traditional DL methods lack transparency. [1] The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue 16(3). 2018

  7. Implementation details (R4): The Swin Transformer patch size is 2, the image size is 32×128×128, and the preprocessing steps are center crop and z-score normalization.
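The two preprocessing steps named in item 7 can be sketched as follows. This assumes the stated image size means a (32, 128, 128) volume, that the crop is symmetric about the volume centre, and that z-score statistics are computed over the whole cropped volume; none of these details are confirmed by the rebuttal, and the function names are hypothetical.

```python
import numpy as np

def center_crop(volume, target_shape):
    """Crop a 3D volume symmetrically around its centre."""
    starts = [(s - t) // 2 for s, t in zip(volume.shape, target_shape)]
    if any(st < 0 for st in starts):
        raise ValueError("target_shape must not exceed the volume shape")
    slices = tuple(slice(st, st + t) for st, t in zip(starts, target_shape))
    return volume[slices]

def zscore(volume, eps=1e-8):
    """Normalise intensities to zero mean and unit standard deviation."""
    return (volume - volume.mean()) / (volume.std() + eps)

# Toy usage on a random volume slightly larger than the assumed target size.
rng = np.random.default_rng(0)
raw = rng.normal(loc=100.0, scale=20.0, size=(40, 140, 140))
prepared = zscore(center_crop(raw, (32, 128, 128)))
```

Padding for volumes smaller than the target and any resampling to a common voxel spacing are left out, since the rebuttal does not describe them.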




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers recommended weak accept after the authors’ rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal process has helped alleviate concerns of the reviewers, and now all reviewers agree on acceptance.

    I encourage the authors to use the feedback to improve the CR with the comments from the reviewers, and in their presentation at the conference.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



