Abstract

Deformable image registration is fundamental to many medical imaging applications. Registration is an inherently ambiguous task that often admits many viable solutions. While neural network-based registration techniques enable fast and accurate registration, the majority of existing approaches are not able to estimate uncertainty. Here, we present PULPo, a method for probabilistic deformable registration capable of uncertainty quantification. PULPo probabilistically models the distribution of deformation fields at different hierarchical levels and combines them using Laplacian pyramids. This allows our method to model global as well as local aspects of the deformation field. We evaluate our method on two widely used neuroimaging datasets and find that it achieves high registration performance as well as substantially better calibrated uncertainty quantification compared to the current state of the art.
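
For intuition only, here is a minimal sketch of the coarse-to-fine combination idea, assuming each pyramid level predicts a residual displacement field (in voxel units) at its own resolution; the array names, the trilinear upsampling, and the simple additive combination are illustrative assumptions, not the paper's exact probabilistic formulation.

    # Illustrative sketch only: coarse-to-fine combination of per-level residual
    # displacement fields, Laplacian-pyramid style. Not the PULPo implementation.
    import numpy as np
    from scipy.ndimage import zoom

    def upsample_disp(disp, factor=2):
        """Upsample a (3, D, H, W) displacement field and rescale its magnitudes."""
        up = np.stack([zoom(disp[c], factor, order=1) for c in range(3)])
        return up * factor  # displacements expressed in voxels grow with resolution

    def compose_pyramid(residuals):
        """residuals: list of (3, D_l, H_l, W_l) fields, coarsest first."""
        disp = residuals[0]
        for res in residuals[1:]:
            disp = upsample_disp(disp) + res  # add finer-scale detail at each level
        return disp

    # toy 3-level pyramid for a 32^3 volume
    levels = [np.zeros((3, s, s, s)) for s in (8, 16, 32)]
    print(compose_pyramid(levels).shape)  # (3, 32, 32, 32)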

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1433_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1433_supp.pdf

Link to the Code Repository

https://github.com/leonardsiegert/PULPo

Link to the Dataset(s)

https://sites.wustl.edu/oasisbrains/home/oasis-1/
https://www.med.upenn.edu/cbica/brats-reg-challenge/

BibTex

@InProceedings{Sie_PULPo_MICCAI2024,
        author = { Siegert, Leonard and Fischer, Paul and Heinrich, Mattias P. and Baumgartner, Christian F.},
        title = { { PULPo: Probabilistic Unsupervised Laplacian Pyramid Registration } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a pyramid deep learning structure for probabilistic unsupervised registration. The results show that the proposed method produces uncertainty that has higher correlation with image intensity variance than VoxelMorph.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Pyramid structure in probabilistic unsupervised registration.

    2. Comparison with VoxelMorph in terms of registration performance and uncertainty estimation.

    3. Non-hierarchical ablation of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Insufficient literature review.

    2. Flaws in prior modeling.

    3. Flaws in network design.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors propose a pyramid deep learning structure for probabilistic unsupervised registration. The results show that the proposed method produces uncertainty that has higher correlation with image intensity variance than VoxelMorph. However, there are some fundamental flaws in the method and network design.

    Major comments:

    1. In the introduction, “Dalca et al. [8,9] is the only work exploring uncertainty quantification in this context”. This is not true. The authors should conduct a more thorough literature search; there is a substantial body of work on uncertainty quantification in deep learning based registration.
    2. In Eq. 2, the covariance matrix in the Gaussian prior p(z) is assumed to be an identity matrix, which neglects the fact that the deformation field should be spatially smooth. The authors can refer to Dalca et al. [8,9] in the manuscript references, where the covariance matrix is modeled as a graph Laplacian matrix to encourage spatial smoothness of the deformation field.
    3. In Fig. 2, if z is already a deformation field, as the authors define in the method section, why does z need to go through another network to obtain a velocity field and then, again, a new deformation field? This makes the uncertainty modeling in Eq. 1 to Eq. 3 useless.
    4. During training, how do the authors downscale the images and calculate the loss at different resolutions? This is quite important: if the images are not properly downsampled, aliasing may occur and affect the loss calculation (a generic anti-aliasing sketch is included after this list).
    5. Based on the description in the manuscript, it is not clear how NCC_{VX} and NCC_{LM} are calculated.
    6. In Table 1, although the results show that the proposed method produces uncertainty that has higher correlation with image intensity variance, the registration performance drops. The authors do not clearly explain how this uncertainty measurement is useful in practice, such that it justifies the compromise in registration performance.
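
    Regarding major comment 4, a generic way to avoid aliasing is to low-pass filter before subsampling; the sketch below is a standard recipe, not code from the reviewed paper, and the Gaussian sigma is an assumed rule of thumb rather than a value from the manuscript.

        # Generic anti-aliased downsampling sketch (not from the reviewed paper):
        # smooth with a Gaussian before subsampling by a factor of 2, so that a
        # multi-resolution loss is computed on alias-free images.
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def downsample_antialiased(volume, factor=2, sigma=None):
            if sigma is None:
                sigma = factor / 2.0  # heuristic: suppress content above the new Nyquist rate
            smoothed = gaussian_filter(volume, sigma=sigma)
            return smoothed[::factor, ::factor, ::factor]

        vol = np.random.rand(64, 64, 64).astype(np.float32)
        print(downsample_antialiased(vol).shape)  # (32, 32, 32)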

    Minor comments:

    1. In the second equation of Eq. 2, is a term q(z_{L-1} | m, f) missing from the probability factorization?
    2. In Eq. 3, “min” should be “argmin”. The authors should also specify which parameters the function is optimized over. Moreover, on the left-hand side, in KL(p || p), should the first p be q?
    3. In Eq. 4, what are the values of \gamma and w_l?
    4. Fonts in Fig. 3 could be larger.
    5. In Supplementary Eq. 1, closing parentheses and brackets are missing on the left-hand side of the first line.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are some fundamental flaws in the method and network design.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors provide a satisfactory explanation about the latent variables z.



Review #2

  • Please describe the contribution of the paper

    The authors have proposed a multi-scale probabilistic approach to address the problem of deformable image registration, with the capability to estimate uncertainty.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper, as mentioned in the abstract, is the multi-scale probabilistic approach for unsupervised registration. The authors later show how this approach can help measure uncertainty, and the high-uncertainty regions match the non-common parts of the moving and fixed images. Furthermore, the paper is easy to follow and well written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses are as follows:

    The authors claim that Dalca et al. is the only work exploring uncertainty quantification in this context; however, these works are very related or exactly on this topic:

    • Grzech, Daniel, et al. “Uncertainty quantification in non-rigid image registration via stochastic gradient Markov chain Monte Carlo.” Machine Learning for Biomedical Imaging 1.UNSURE2020 special issue (2021): 1-25.

    • Smolders, A., Lomax, T., Weber, D., Albertini, F., 2022. Deformable image registration uncertainty quantification using deep learning for dose accumulation in adaptive proton therapy, in: Biomedical Image Registration: 10th International Workshop, WBIR 2022, Munich, Germany, July 10–12, 2022, Proceedings, Springer. pp. 57–66.

    • Luo, J., Sedghi, A., Popuri, K., Cobzas, D., Zhang, M., Preiswerk, F., Toews, M., Golby, A., Sugiyama, M., Wells, W.M., et al., 2019. On the applicability of registration uncertainty, in: MICCAI19, Springer. pp. 410–419.

    • Xu, Z., Luo, J., Lu, D., Yan, J., Frisken, S., Jagadeesan, J., Wells III, W.M., Li, X., Zheng, Y., Tong, R.K.Y., 2022. Double-uncertainty guided spatial and temporal consistency regularization weighting for learning-based abdominal registration, in: MICCAI22, Springer. pp. 14–24.

    • Chen, J., Frey, E.C., He, Y., Segars, W.P., Li, Y., Du, Y., 2022b. Transmorph: Transformer for unsupervised medical image registration. Medical Image Analysis 82, 102615.

    Furthermore, regarding the results in Table 1:

    In the specific case of registration of OASIS images (please check the Learn2Reg challenge), the leaderboard for DSC is now at 91%. Therefore, a score of 77% is not good at all.

    Additionally, you have a large number of voxels with a negative Jacobian determinant compared to the baseline, so the Displacement Deformation Field (DDF) is not even smooth (see the folding-metric sketch at the end of this section).

    Also, what are the standard deviations for the reported values? Is the improvement in NCC_LM or NCC_VX over the NH version statistically significant?

    Regarding the results reported in Figure 3, what is voxel variance, and what is the range of the uncertainty values that are shown in the figure?

    And finally, what do the authors mean by global uncertainty, and how is that modeled?
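
    For reference, the folding measure mentioned above (fraction of voxels with a non-positive Jacobian determinant of the deformation) is commonly computed roughly as follows; this is a generic finite-difference sketch, and the (3, D, H, W) displacement layout is an assumption, not code from the paper.

        # Generic sketch of the folding metric: fraction of voxels whose local
        # Jacobian determinant of phi(x) = x + u(x) is non-positive.
        import numpy as np

        def fraction_nonpositive_jacobian(disp):
            """disp: displacement field u of shape (3, D, H, W) in voxel units."""
            # grads[i, j] = d u_i / d x_j, shape (3, 3, D, H, W)
            grads = np.stack([np.stack(np.gradient(disp[c]), axis=0) for c in range(3)])
            jac = grads + np.eye(3)[:, :, None, None, None]  # Jacobian of phi is I + grad(u)
            det = (jac[0, 0] * (jac[1, 1] * jac[2, 2] - jac[1, 2] * jac[2, 1])
                   - jac[0, 1] * (jac[1, 0] * jac[2, 2] - jac[1, 2] * jac[2, 0])
                   + jac[0, 2] * (jac[1, 0] * jac[2, 1] - jac[1, 1] * jac[2, 0]))
            return float(np.mean(det <= 0))

        disp = 0.1 * np.random.randn(3, 32, 32, 32)  # small random field: little folding
        print(fraction_nonpositive_jacobian(disp))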

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The architecture is not difficult to reproduce and, if all the hyperparameters are known, it should be possible to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    There are a couple of points to work on. The first is the clinical motivation behind this uncertainty estimation; it is also necessary to differentiate whether the estimated uncertainty is epistemic or aleatoric. The next point concerns the results. As shown, the baseline outperforms the proposed method in terms of accuracy metrics, and the baseline itself has since been beaten by many methods. Considering the results, and the fact that the novelty in the network architecture is not groundbreaking, the authors need to work on improving their methodology. In addition, it is necessary to see the performance of the model when the uncertainty is low, meaning there is no non-corresponding tissue between the two images, e.g. when testing the model on images of two healthy subjects. In such a scenario, the model would be expected to outperform the baseline.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is similar to the constructive comments section. Firstly, the paper should clarify the clinical relevance of the uncertainty estimation. It is important to specify whether the uncertainty measured is epistemic or aleatoric. Regarding the results, the baseline model has shown better accuracy, and several subsequent methods have already surpassed this baseline. This suggests that the novelty of the network architecture is limited and that further improvements in the methodology are necessary. Additionally, it would be useful to evaluate the model’s performance in scenarios where the uncertainty is low, such as when comparing images from two healthy subjects; in these cases, the model would be expected to perform better than the baseline. And finally, the authors did not conduct a thorough search of existing methods that address a very similar problem.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper tackles uncertainty quantification in registration algorithms using a probabilistic unsupervised learning-based approach. The authors show competitive registration performance while presenting really nice calibration and quantification of the uncertainty at the image level.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Really nicely written, self-contained paper with clear goals, contribution and methodology. Experiments are well designed and explained.

    (2) Novel formulation for uncertainty quantification in 3D medical imaging registration.

    (3) Nice derivation of the principle of Laplacian pyramids to combine the deformation fields from different resolutions (i.e. each resolution provides incremental displacements) using neural networks.

    (4) Strong qualitative and quantitative evaluation against the only competitor in the field.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors mention a method developed by Krebs et al. [13] that could be used for uncertainty quantification, even though its authors did not use it to this end. Due to the shortage of competitors (and the novelty of the formulation), a comparison with similar methods such as the one mentioned would be good.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I only have two minor comments: (1) the gamma value on page 5 is unset (maybe gamma/sigma_I = 0.25?); (2) a reference to the supplement in the main text would be great.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty of the formulation that also shows really nice results for the task of uncertainty quantification, an underexplored but important task in medical image registration.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    I’m keeping the same opinion. I agree with some additional comments of the other reviewers; nonetheless the authors have properly justified their decisions and/or will modify part of the text accordingly.




Author Feedback

We thank the reviewers for their valuable feedback. We are pleased that they found the paper well-written [R3, R4], our multi-scale probabilistic registration approach novel [R3, R4], and the qualitative and quantitative evaluation strong [R3].

We note that R1’s recommendation may stem from misunderstandings of our method, which we will clarify below and in the paper. We further stress that the two main contributions of our work are: 1) demonstrating that Diffeomorphic VoxelMorph (DIF-VM), likely the most widely used learning-based registration technique, produces highly uncalibrated and pathological uncertainty estimates; and 2) introducing a novel method (PULPo) that provides significantly better uncertainty quantification while maintaining similar registration performance.

The main contribution lies in the uncertainty quantification, not the registration performance. We are convinced that improved uncertainty quantification for DL-based registration is a crucial research avenue.

[R1] Flaws in prior modeling and network design: The latent variables z are not velocity fields as stated in the review. They are transformed by several network layers to obtain a velocity field, as shown in Fig. 2 and described in Sec. 2.1. Using a diagonal covariance matrix for the latents is a common simplification in VAE-based techniques. We did explore a non-diagonal covariance matrix as in DIF-VM, but it led to worse registration and calibration performance. Instead, we regularize our deformation fields using an external diffusion regularizer.

[R1] Unclear calculation of NCC_{VX} and NCC_{LM}: The calculation is described in the second paragraph of Sec. 3.2. We do not calculate the NCC of the uncertainty with the image intensities, which would not make sense. We calculate the NCC between the errors and the uncertainties. We will clarify a potentially ambiguous sentence at the end of Sec. 3.2.
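
As a rough illustration of the error-versus-uncertainty correlation described above: the snippet below correlates a voxel-wise error map with a voxel-wise uncertainty map using a Pearson-style normalized cross-correlation over the flattened volumes; the input names and this exact formulation are assumptions, not the paper's protocol.

    # Rough sketch: correlate a registration error map with a predicted uncertainty
    # map. Hypothetical inputs; a Pearson-style NCC over all voxels is assumed.
    import numpy as np

    def ncc(a, b, eps=1e-8):
        a = (a - a.mean()) / (a.std() + eps)
        b = (b - b.mean()) / (b.std() + eps)
        return float(np.mean(a * b))

    error_map = np.abs(np.random.randn(32, 32, 32))
    uncertainty_map = error_map + 0.5 * np.random.rand(32, 32, 32)  # partially correlated
    print(ncc(error_map, uncertainty_map))  # well-calibrated uncertainty -> high NCC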

[R1, R4] Incomplete related work section: We acknowledge our related work section was incomplete and thank R4 for suggesting additional papers. We will discuss these works in the final version of the paper if accepted. Many of these papers build on DIF-VM, underscoring the need to address its flaws. Interestingly, Grzech et al. confirm our findings that the uncertainty magnitudes of DIF-VM are very small and underline “the need for further research into calibration of uncertainty estimates for image registration methods based on deep learning”.

[R1, R4] Performance vs uncertainty, OASIS results worse than Learn2Reg leaderboard: Our registration results are not directly comparable to the L2R leaderboard due to different data splits. This work primarily explores our hierarchical approach for uncertainty quantification, where we show substantial improvements. Optimizing for performance will be the focus of future work. However, our results show comparable performance to an established baseline technique, DIF-VM.

[R3] Experimental comparison to Krebs et al.: We agree that more comprehensive empirical validation is important and will explore this in future work.

[R4] Standard deviation of NCC_{VX} and NCC_{LM}: The standard deviations of our NCC scores will be reported in the final version.

[R4] Uncertainty ranges for Fig. 3: The uncertainty ranges (min, max) for DIF-VM / PULPo are var(f): (0, 3.975e−6) / (0, 0.042) and var(ϕ): (0, 3.371e−5) / (0, 1.407). The low max values for DIF-VM underscore the pathological nature of its uncertainty estimates. We will add a legend to this figure in the final version.

[R4] Performance of the model when uncertainty is low: Our experiments on OASIS and BraTS show that our technique provides better-calibrated uncertainty scores in both low and high uncertainty settings. An evaluation on completely healthy images with lower uncertainty is an interesting addition for future work.

[R4] Epistemic vs aleatoric uncertainty: Our method estimates aleatoric uncertainty. We will clarify this in the text.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper received widely different reviews, but I think it has enough support post-rebuttal that it should be accepted. There is insight here that can be presented at the conference. Unfortunately, the more negative reviewer did not come back with a post-rebuttal opinion.

    I strongly encourage the authors to build on the thorough feedback from the reviewers, which is hard to get nowadays. This is good feedback to strengthen the CR version of the paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    R1 changed their rating from Reject to Weak Accept after rebuttal. I must say that my opinion on this paper is high and in line with that of R3: I find the solution elegant and very worthy of discussion. I believe that the major issues raised by R1 were due to a misunderstanding of some of the points; namely, that the posterior’s variable does not represent the displacement field but some “latent” (although not in the VAE sense) representation of it, which is why its prior distribution does not assume smoothness. The remaining issue with the paper is that it ignored recent uncertainty papers (importantly, Grzech et al.), but the proposed method has enough novelty and elegance to be accepted nonetheless.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



