Abstract

Deep image registration has demonstrated exceptional accuracy and fast inference. Recent advances adopt either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner. However, due to their cascaded nature and repeated composition/warping operations on feature maps, these methods substantially increase memory usage during training and testing. Moreover, such approaches lack explicit constraints on the learning process of small deformations at different scales, and thus lack explainability. In this study, we introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales, utilizing the wavelet coefficients derived from the original input image pair. By exploiting the properties of the wavelet transform, these estimated coefficients enable seamless reconstruction of a full-resolution displacement/velocity field via our devised inverse discrete wavelet transform (IDWT) layer. This approach avoids the complexities of cascaded networks and composition operations, making our WiNet an explainable and efficient competitor to other coarse-to-fine methods. Extensive experimental results on two 3D datasets show that our WiNet is accurate and GPU efficient. Code is available at \url{https://github.com/x-xc/WiNet}.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2384_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2384_supp.pdf

Link to the Code Repository

https://github.com/x-xc/WiNet

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Che_WiNet_MICCAI2024,
        author = { Cheng, Xinxing and Jia, Xi and Lu, Wenqi and Li, Qiufu and Shen, Linlin and Krull, Alexander and Duan, Jinming},
        title = { { WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work proposes a model-driven multi-scale registration network that embeds the discrete wavelet transform (DWT) as prior knowledge.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work introduces a differentiable DWT layer into an image registration network. The authors claim that the method achieves accuracy comparable to related works while using less memory.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Limited novelty: paper [13] introduced the discrete Fourier transform (DFT) into an image registration network, yet the metrics of this work (Dice, HD, GPU training memory, etc.) do not significantly exceed those in [13], and a thorough comparison with [13] is lacking. Please also describe the motivation for using wavelet transforms instead of Fourier transforms.

    [13] Jia, X., Bartlett, J., Chen, W., Song, S., Zhang, T., Cheng, X., Lu, W., Qiu, Z., Duan, J.: Fourier-Net: Fast image registration with band-limited deformation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 1015–1023 (2023)

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Major: In related work, [13] is most closely related to this work, please conduct a comprehensive comparison experiment with [13] and clearly state the advantages and differences of this work compared to [13].

    1. Currently, this work only provides experimental results for C=32. At C=32, performance metrics such as Dice and HD are almost the same as in [13], and the GPU training memory is only slightly lower than in [13]; the current experimental results therefore do not show a significant advantage. Please provide curves showing how Dice and HD change under different GPU training memory configurations to demonstrate that this work can effectively reduce memory usage.
    2. The introduction mentions that [13] only used the low-frequency information of the DFT, whereas this paper uses both high- and low-frequency information of the DWT.
       2.1 Is DWT more advantageous than DFT in extracting high-frequency information? Please describe the motivation for using DWT instead of DFT, and support it with experiments.
       2.2 Why introduce high-frequency information? Compared to introducing low-frequency information only, does introducing both high and low frequencies significantly increase accuracy? Please demonstrate this with ablation experiments.

    Minor: In Figure 1, the forward propagation flow is unclear. Are there any skip connections between the encoder and decoder?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of this work is limited, and the experiments have not demonstrated that this work has a significant advantage over related work [13] in terms of accuracy or GPU training memory.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The experimental metrics are not significantly improved compared to [13]. Please also elaborate in the paper on the motivation for using DWT instead of DFT.



Review #2

  • Please describe the contribution of the paper

    The paper proposes inserting wavelet transforms and inverse wavelet transforms in a standard pyramidal registration U-Net.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written, and compares to a good choice of state of the art methods. The method has significantly less memory consumption than some of the compared methods.

    The paper compares performance on brain and cardiac registration, which is a strong choice of problems and shows the generality of the method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main reason for the reduced memory consumption compared to VoxelMorph is that VoxelMorph has a 64-channel and numerous 16-channel feature maps at full resolution, and the proposed approach wisely chooses not to do this; but this does not seem closely connected to the wavelet layers.

    An ablation directly replacing the wavelet layers with standard learnable stride 2 3x3x3 convolutions or transpose convolutions (comparing memory consumption and performance) is missing, which would have made this clear (or possibly would have proved me wrong and that the wavelet feature is actually brilliant).

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    This paper would especially benefit from releasing code, as sections 2.1 and 2.2 are necessarily dense and examining the code would allow the reader to verify that their understanding of the paper is correct.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It is not obvious to me why it is important that the DWT layer is implemented differentiably: since it is applied directly to the images, I cannot see any weights upstream of it for gradients to propagate to. This would be clarified by released code. It is of course crucial that the iDWT layer be differentiable, and this contribution is appreciated.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am not sold that the wavelet transform is the future of 3-D U-Nets, but the presentation is solid and the approach appears novel.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors introduce WiNet, a wavelet-based registration model designed to enhance explainability and efficiency relative to cascade or pyramid-based registration methodologies. They evaluate the registration performance using DICE, HD, and Jacobian determinant metrics on two datasets, surpassing 13 tested baselines. Additionally, they illustrate the GPU efficiency of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Facilitation of multi-frequency multi-scale Registration: The integration of Discrete Wavelet Transform (DWT) and an incremental module enables the model to account for multi-frequency and multi-scale deformation estimations.
    • Memory efficiency: The incorporation of a parameter-free DWT layer enhances the model’s training efficiency (compared with cascade or pyramid-based registration methods).
    • Comprehensive validation: The authors rigorously validate the proposed method on two datasets and compare it against 13 methods from the existing literature.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Absence of discussion on explainability: While the authors assert that the proposed model enhances explainability, they do not provide experimental evidence to support this claim.
    • Lack of clarity: Parameter “C” is not adequately explained, and the rationale behind making a parameter-free layer “differentiable” is not articulated.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The incorporation of wavelet transform into deep learning registration presents a novel concept to me, and the experimental validation appears comprehensive, encompassing two large-scale datasets and comparison with 13 existing methods.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all the reviewers for their valuable feedback. Below we address their concerns.

R1-Q1: The improvement over [13] is not significant and WiNet lacks a thorough comparison to [13].

We compared our WiNet with [13] on two 3D datasets, IXI and 3D-CMR. As shown in Table 1 and Fig. 3, our WiNet-diff outperforms [13] on IXI by 0.4% in DICE while using 9% less memory. On 3D-CMR, our method also outperforms [13] by 0.5% in DICE and 0.22mm in HD with 4% less memory. Additionally, our WiNet has 66% fewer parameters than [13] (1,418,573 vs 4,198,352). These results demonstrate that the improvements brought by our WiNet are significant.

R1-Q2.1: Why is DWT more advantageous over DFT in extracting high frequencies?

The DFT provides a global, full-resolution analysis in the frequency domain, making it straightforward to extract low-frequency signals using central patches or crops, as shown in [13]. However, learning high frequencies with DFT is challenging because, unlike low frequencies that can be centralized in the frequency spectrum, high frequencies are dispersed and not confined to a rectangular area. In contrast, the DWT naturally decomposes images into both low and high frequencies across different sub-bands. These sub-bands have lower spatial resolution and retain the overall structural information. Consequently, the decomposed DWT coefficients can be seamlessly integrated into a network without any loss of frequency information.
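To make the sub-band decomposition concrete, here is a minimal NumPy sketch of a one-level 3D Haar DWT and its inverse. This is only an illustration of the general property described above (eight half-resolution sub-bands, lossless reconstruction), not the authors' implementation; the function names are ours.

```python
import numpy as np

def haar_dwt3d(x):
    """One-level 3D Haar DWT: decompose x into 8 sub-bands at half resolution.
    Band order: [LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH] (L/H per axis)."""
    def split(a, axis):
        even = a.take(np.arange(0, a.shape[axis], 2), axis)
        odd = a.take(np.arange(1, a.shape[axis], 2), axis)
        # Orthonormal Haar: pairwise average (low) and difference (high).
        return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)
    bands = [x]
    for ax in range(3):
        bands = [b for band in bands for b in split(band, ax)]
    return bands

def haar_idwt3d(bands):
    """Inverse of haar_dwt3d: merge the 8 sub-bands back to full resolution."""
    def merge(lo, hi, axis):
        out = np.empty([2 * s if i == axis else s
                        for i, s in enumerate(lo.shape)])
        ev = [slice(None)] * 3; ev[axis] = slice(0, None, 2)
        od = [slice(None)] * 3; od[axis] = slice(1, None, 2)
        out[tuple(ev)] = (lo + hi) / np.sqrt(2)
        out[tuple(od)] = (lo - hi) / np.sqrt(2)
        return out
    for ax in reversed(range(3)):
        bands = [merge(bands[i], bands[i + 1], ax)
                 for i in range(0, len(bands), 2)]
    return bands[0]
```

Each of the eight sub-bands has half the spatial resolution per axis, and the IDWT reconstructs the input exactly: no frequency information is lost, which is the property the rebuttal relies on.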

R1-Q2.2: Why should high-frequency information be introduced?

Displacement fields are not necessarily smooth, therefore, low-frequency signals alone may not be able to fully and accurately reconstruct the entire displacement field. Our WiNet theoretically overcomes this problem by preserving both low-frequency and high-frequency information of the displacement field.
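The point can be illustrated numerically: for a 1D "displacement" with a sharp jump, reconstructing from the low-frequency Haar sub-band alone smears the discontinuity, whereas using both sub-bands is exact. This is a toy example of ours, not taken from the paper.

```python
import numpy as np

# A 1D "displacement" with a sharp jump, i.e. a non-smooth field.
u = np.array([0., 0., 0., 1., 1., 1., 1., 1.])

# One-level 1D Haar DWT: low- and high-frequency sub-bands.
lo = (u[0::2] + u[1::2]) / np.sqrt(2)
hi = (u[0::2] - u[1::2]) / np.sqrt(2)

def haar_idwt1d(lo, hi):
    """Inverse one-level 1D Haar DWT."""
    out = np.empty(2 * lo.size)
    out[0::2] = (lo + hi) / np.sqrt(2)
    out[1::2] = (lo - hi) / np.sqrt(2)
    return out

full = haar_idwt1d(lo, hi)                     # both bands: exact
low_only = haar_idwt1d(lo, np.zeros_like(hi))  # low band only: jump smeared
```

Here `full` matches `u` exactly, while `low_only` replaces the jump by its pairwise average, so discarding the high-frequency band loses precisely the sharp part of the field.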

R1-Q3: Figure 1. Our network has skip connections.

R3-Q1: Reproducibility. We will make our code and trained models public, as stated in the abstract.

R3-Q2: Replacing DWT with standard learnable stride convolutions.

Compared to the DWT layer, 1) a conv layer with a stride of 2 leads to loss of image information, resulting in less accurate registration performance, and 2) the conv layer introduces additional parameters and increases GPU memory usage.

R3-Q3: Why is the DWT layer differentiable?

Both DWT and IDWT layers are differentiable. Although it is feasible to apply the DWT as a pre-processing step, a better way is to build a general end-to-end framework without any pre-/post-processing. Additionally, popular wavelet libraries, such as PyWavelets, only support the CPU. Our DWT layer supports 3D GPU computation, which significantly increases speed and provides a general tool for the research community.
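For intuition on why such a layer is differentiable: a one-level Haar DWT is simply a stride-2 correlation with fixed orthonormal kernels, i.e. a fixed linear map, so any autograd framework can backpropagate through it. The 1D NumPy sketch below is our own illustration under our own naming, not the authors' 3D GPU implementation.

```python
import numpy as np

# Haar analysis filters (fixed, orthonormal -- no learnable parameters).
f_lo = np.array([1., 1.]) / np.sqrt(2)
f_hi = np.array([1., -1.]) / np.sqrt(2)

def haar_dwt_as_conv(x):
    """One-level Haar DWT written as a stride-2 correlation with fixed kernels.
    Because this is a fixed linear map, autograd frameworks (e.g. PyTorch)
    can propagate gradients through it, which is what makes in-network
    (I)DWT layers usable in end-to-end training."""
    corr = lambda f: np.array([np.dot(x[i:i + 2], f)
                               for i in range(0, x.size - 1, 2)])
    return corr(f_lo), corr(f_hi)

x = np.arange(8, dtype=float)
lo, hi = haar_dwt_as_conv(x)
```

On the linear ramp `x`, the low band is the scaled pairwise sums and the high band is constant (every adjacent pair differs by the same amount), matching the direct averaging/differencing form of the Haar transform.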

R4-Q1: Explainability.

Black-box methods like [4,6] take the images as input and predict displacements directly without modeling the intermediate mechanism. In contrast, displacement fields in WiNet are incrementally composed, as shown in Fig. 2. Conv-0 learns the eight 1/8-scale sub-bands, which can be composed into the 1/4-scale low-frequency coefficients. Sequentially, RB-1 uses the 1/8-scale high frequencies to incrementally learn high frequencies at the 1/4 scale, while RB-2 uses the 1/4-scale high frequencies to incrementally learn high frequencies at the 1/2 scale. Therefore, the entire learning process is explainable, and the way the deformation is composed is also explainable due to the properties of the DWT & IDWT.

R4-Q2: Explain parameter “C” and why making a parameter-free layer “differentiable”?

“C” controls the model size and is a trade-off between performance and efficiency. Our WiNet uses a U-Net backbone similar to that in [13, 25]. When C=32, our WiNet has a configuration similar to the recommended architecture in [13, 25]. As noted in our response to R3-Q3, the IDWT layer must be differentiable since it requires backpropagation during training. Although the first DWT layer does not strictly need to be differentiable, we make it so to provide a general 3D GPU DWT tool.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers recommended WA/A after the authors’ rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers all agree on acceptance, post-rebuttal. There is lingering concern about novelty, and I advise the authors to articulate the novelty in the CR and in their poster/presentation – what is the key insight over [13] that your paper provides?

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


