Abstract

Current deep learning approaches to medical image registration usually face the challenges of distribution shift and data collection, hindering real-world deployment. In contrast, universal medical image registration aims to perform registration on a wide range of clinically relevant tasks simultaneously, and thus has tremendous potential for clinical applications. In this paper, we present the first attempt to achieve universal 3D medical image registration in sequential learning scenarios by proposing a continual learning method. Specifically, we utilize meta-learning with experience replay to mitigate the problem of catastrophic forgetting. To promote the generalizability of meta-continual learning, we further propose sharpness-aware meta-continual learning (SAMCL). We validate the effectiveness of our method on four datasets in a continual learning setup, including brain MR, abdomen CT, lung CT, and abdomen MR-CT image pairs. Results have shown the potential of SAMCL in realizing universal image registration, performing better than or on par with vanilla sequential or centralized multi-task training strategies. The source code will be available from https://github.com/xzluo97/Continual-Reg.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0150_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0150_supp.pdf

Link to the Code Repository

https://github.com/xzluo97/Continual-Reg

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wan_Toward_MICCAI2024,
        author = { Wang, Bomin and Luo, Xinzhe and Zhuang, Xiahai},
        title = { { Toward Universal Medical Image Registration via Sharpness-Aware Meta-Continual Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposed to combine Meta Experience Replay (MER) with Sharpness-Aware Minimization (SAM) to address the problem of catastrophic forgetting in training a universal medical image registration network.
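    For context, a SAM update first ascends to the worst-case parameters within a ρ-ball, then descends using the gradient evaluated there. A minimal NumPy sketch of one such step (the quadratic toy loss is purely illustrative, not the paper's registration loss):

    ```python
    import numpy as np

    def sam_step(w, grad_fn, lr=0.1, rho=0.05):
        """One sharpness-aware minimization (SAM) step:
        1) ascend to the worst-case neighbor w + rho * g / ||g||;
        2) descend using the gradient evaluated at that perturbed point."""
        g = grad_fn(w)
        eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction
        return w - lr * grad_fn(w + eps)             # descent with perturbed gradient

    # Toy example: L(w) = 0.5 * ||w||^2, so grad L(w) = w.
    w = np.array([1.0, -2.0])
    for _ in range(100):
        w = sam_step(w, grad_fn=lambda v: v)
    # w now hovers near the flat minimum at the origin
    ```

    Because the descent gradient is taken at the perturbed point, minima surrounded by steep walls are penalized, which is the "plateau of good solutions" intuition the method relies on.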

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method addresses the problem of continual learning when training a universal image registration network with multiple datasets from different domains.
    2. Experiments are done on the OASIS brain T1 MRI, Abdomen inter-patient CT-CT, Lung intra-patient CT-CT, and Abdomen intra-patient MR-CT datasets.
    3. Multiple continual learning methods are compared.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The use of notation is confusing. In Alg.1, what is the definition of j? Is it the index within the batch?
    2. In Alg. 1, it seems like it only draws one sample from the memory buffer to be trained alongside the s current samples (current task). In the original MER paper, the batches include k-1 random samples from the buffer and one sample from the current task. I wonder whether one memory sample is enough to prevent forgetting of all the previous tasks.
    3. The backward transfer (BWT) in the evaluation metrics should be “backward interference”. Referring to the MER paper, the transfer-interference trade-off is the generalization-forgetting trade-off.
    4. Lack of explanation of the results. The numbers in Table 1 are hard to understand at first sight. What is multi-task learning? Is it that in each iteration, pairs from all tasks are concatenated as a batch of data? It is good that other previous continual learning methods are compared, but sadly no details of them are introduced, making the rows of these methods in the table less informative.
    5. The proposed method does not stand out compared to multi-task training. Besides, the metric of the smoothness of the deformation field is missing.
    6. For Abdomen MR-CT, LNCC is used as the similarity loss. I wonder whether LNCC is suitable for registration across modalities.

    MER paper: Riemer, M., Cases, I., Ajemian, R., Liu, M., Rish, I., Tu, Y., Tesauro, G.: Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference. arXiv preprint arXiv:1810.11910, 2018.
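    Regarding point 3, backward transfer/interference is typically computed from a matrix R where R[i, j] is the performance on task j right after training on task i; a minimal sketch with hypothetical Dice scores (the numbers are illustrative, not from the paper):

    ```python
    import numpy as np

    def backward_transfer(R):
        """R[i, j]: test performance on task j right after finishing training
        on task i. BWT averages, over all but the last task, the change between
        the final performance and the performance right after that task was
        learned. Negative values indicate forgetting (backward interference)."""
        T = R.shape[0]
        return float(np.mean([R[T - 1, j] - R[j, j] for j in range(T - 1)]))

    # Hypothetical Dice scores for 3 tasks learned sequentially.
    R = np.array([[0.80, 0.00, 0.00],
                  [0.70, 0.75, 0.00],
                  [0.65, 0.72, 0.78]])
    bwt = backward_transfer(R)  # (0.65 - 0.80 + 0.72 - 0.75) / 2, i.e. about -0.09
    ```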
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    All datasets are open-source and public. Pseudo-code is provided, and implementations of sharpness-aware minimization (SAM) can easily be found on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The following points correspond to the previously mentioned weaknesses.

    1. Make clearer definition of the notations used in the paper.
    2. Elaborate more about the MER in the training procedure.
    3. Clarify when and how the Backward Interference metric is computed, especially regarding the proposed method SAMCL. How do you measure the forgetting of a task caused by training on other tasks?
    4. and 5. Include a metric of the smoothness of the deformation field (e.g., the ratio of non-positive Jacobian determinants, or the standard deviation of the log Jacobian determinant).
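    Both suggested smoothness metrics derive from the per-voxel Jacobian determinant of the deformation; a minimal NumPy sketch (assuming a dense displacement field of shape (3, D, H, W), not the authors' exact implementation):

    ```python
    import numpy as np

    def jacobian_determinant(disp):
        """Per-voxel Jacobian determinant of phi(x) = x + disp(x),
        where disp has shape (3, D, H, W)."""
        # grads[i, j] = d disp_i / d x_j, each of shape (D, H, W)
        grads = np.stack([np.stack(np.gradient(disp[i], axis=(0, 1, 2)))
                          for i in range(3)])
        jac = grads + np.eye(3)[:, :, None, None, None]  # add identity (d x_i / d x_j)
        jac = np.moveaxis(jac, (0, 1), (-2, -1))         # -> (D, H, W, 3, 3)
        return np.linalg.det(jac)

    disp = np.zeros((3, 8, 8, 8))                        # identity transform
    det = jacobian_determinant(disp)
    folding_ratio = float(np.mean(det <= 0))             # fraction of non-positive JacDet
    sd_log_jac = float(np.std(np.log(np.clip(det, 1e-9, None))))
    # Both are 0 for the identity; folding_ratio > 0 flags non-diffeomorphic warps.
    ```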
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method brings in the interesting topic of continual learning when training a universal registration network. However, the paper does not manage to get the reader to a sound conclusion that the proposed SAMCL solves the forgetting problem. Therefore, my decision is ‘reject’. But I will highly recommend the author to reformulate the paper in a more proper way and present to the registration community in the future.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Thanks for the author’s replies. I think the author’s answers to Q1 and Q3 contradict each other a bit. The benefits of CL over MTL lie in the avoidance of centralizing data from different centers. However, the method tries to achieve universal 3D registration and, indeed, uses a centralized dataset with multiple public datasets. Overall, I agree that SAMCL is a good start in universal 3D registration with Continual Learning setup. Therefore, I raise my score from 2 to 3.



Review #2

  • Please describe the contribution of the paper

    The paper introduces a universal model for image registration tasks that simultaneously adapts to new, unseen data and avoids catastrophic forgetting. In addition to addressing a highly significant problem, the methodological innovation of sharpness-aware meta-continual learning (SAMCL) demonstrates effective results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The experimental setup is rigorous, using 4 public datasets from the Learn2Reg challenges encompassing different anatomies and image modalities (OASIS brain MRI, Abdomen CT, NLST lung CT, and Abdomen MR-CT).

    • Evaluation is performed with respect to single-task models, multi-task learning using a single model, sequential learning over all tasks, and 4 other continual learning (CL) approaches.

    • The proposed approach outperforms 3 of 4 benchmark CL approaches, which shows the benefit of continual learning.

    • The ablation study shows the effect of different memory sizes (to preserve knowledge) and that the inclusion of SAM benefits training.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The performance of the proposed approach is substantially lower than single-task and multi-task models in some cases while marginally better in others (Table 1), which limits the utility and impact of universal models in their current form.

    • Registration results are not substantially different from the benchmark MER CL approach (Table 1).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Public datasets are used for training and validation. The dataset and imaging information used in this study are well described. Code will be made available upon publication.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Sec. 2: While this is beyond the scope of this paper, in the future it would be interesting to test how the choice of backbone network affects model performance.

    Sec. 3: CT image intensities are clipped to a fixed range, but how is intensity normalization handled for MRI? Are these intensity values harmonized across MR and CT imaging?

    Sec. 3: The sentence “SAMCL performed comparable to independent training and centralized multi-task learning” may be a bit generous considering the results presented in Table 1, where the Independent and Multi-task methods substantially outperform the proposed approach across the population of the test set. You might consider relaxing this statement and instead emphasizing that the proposed approach begins to approach these other methods.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea for a universal registration is very exciting and represents significant novelty. However, overall enthusiasm is lessened due to the overall registration performance continuing to be below task-specific registration results. Overall, the idea is interesting, and represents early progress to address this task.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Thank you for your responses to reviewer critiques. I have raised my score to a 5 (Accept) in support of this work because it makes initial, impactful steps towards addressing a significant problem in biomedical image analysis - creating a universal model for registration. Even though not all reviewer questions can be addressed in full detail, given the space constraints of MICCAI submissions, I think the authors did a reasonable job to support their work through the experiments provided in the manuscript. Overall, this work would be interest to the MICCAI community.



Review #3

  • Please describe the contribution of the paper

    The paper addresses the goal of “Universal Medical Image Registration”, i.e. to create a model that can register different anatomies in different modalities. Towards that end, authors propose a continual learning approach, which is well suited to deal with the multi-center data issue, i.e. the difficulty to share data among different centers. The proposed method is based on the Sharpness-aware minimization concept [5], and it is extended to the continual learning framework. Experiments on 4 datasets show the advantages and limitations of the proposed approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the integration of sharpness-aware minimization into continual learning is very interesting, as it seeks not just a single local minimum, but rather a plateau of good solutions

    • the fact that different modalities and different anatomies are addressed together in a learning based framework is a big plus.

    • the ablation on the buffer size is very informative (Fig 3 left)

    • the paper is well written and easy to follow

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The impact of the order in which datasets are observed is not discussed. If the accuracy varies depending on the order in which datasets are seen, this would diminish the claims of “stability in learning”.

    • no classical registration algorithm (such as DEEDS as in [27]) is used for comparison. It would be good to report these values to have another baseline.

    • the experimental setting (train/val/test) is not very detailed. See detailed comments.

    • links to the federated learning literature should be established, as “Federated learning (FL) allows multiple institutions to collaboratively develop a machine learning algorithm without sharing their data.”

    • a focused discussion of the comparison to MER [21] would help to understand its key differences, since the results are relatively similar.

    • the supplementary material could provide richer qualitative results; only one image is reported in Fig 2.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Although the pseudo code of the algorithm is clearly described, reproducing the results without the code would be difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Experimental setting: how are the hyperparameters tuned? It is not clear from the text whether a validation set is used for parameter tuning, or whether the values were set empirically. For instance, are the values in Fig 3a and 3b from the test set? As the size of the buffer is a hyperparameter, these results should be reported on the validation set.

    • In Tab 1: what does multi-task stand for?

    • The paper mentions that no data augmentation was used for a fair comparison. It is not clear to the reader why using data augmentation (for all approaches) would hinder the comparison.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the methodology is a combination of existing methods (SAM applied to the CL context), the work merits publication, as it provides a rather simple solution to a difficult problem. Some details about the evaluation and baseline comparisons would be most welcome in a rebuttal. The fact that Dice scores are still very low (in [27], for instance) shows that the registration problem is far from being solved.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    After reading the other reviews and the rebuttal I vote for accepting the paper. It tackles an important problem - universal registration - and lays down a very good baseline for the community to improve on it. Although some major questions were not addressed in the rebuttal (e.g. order of datasets in training) it would be good to mention them in the final paper if it gets accepted.




Author Feedback

We sincerely thank all the reviewers for providing valuable comments and suggestions. Our response is as follows.

Q1: Regarding the performance comparison between SAMCL and multi-task learning. (R1, R5)

  • We present an initial step towards universal 3D registration in a continual learning (CL) setup, which is more realistic and challenging than multi-task learning (MTL) due to the difficulty of centralizing data from different centers and the problem of distribution shifts in real-world clinical scenarios. MTL realizes universal registration by centralizing all tasks into a single dataset and is free from the forgetting problem. Therefore, the performance of MTL can serve as the upper bound for CL methods in terms of average performance over all tasks, as is common in the CL research community (see the GPM [24] paper for reference). In Tab. 1, we show that SAMCL closely approached MTL on the first three tasks, with performance gaps of 0.046, 0.006, and 1.193 (TRE). Since the usage of SAMCL as a new training strategy is orthogonal to the choice of network backbones and training losses, the proposed SAMCL provides a potential solution for universal 3D registration in real-world medical environments, and could serve as a baseline for future works on this topic.

Q2: Regarding the difference between MER and SAMCL, and their performance comparison. (R1, R3, R5).

  • We would like to emphasize that universal registration methods should have good generalizability on unseen tasks. This motivates us to integrate sharpness-aware training with MER in SAMCL. While both SAMCL and MER obtained similar performance in mitigating forgetting, as shown in Fig. 3(b), SAMCL performed consistently better in terms of in- and out-of-distribution generalization compared to MER. In addition, SAMCL differs from MER in its sampling approach: SAMCL draws one sample from the buffer and more samples from the current task, whereas MER uses k-1 buffer samples (see Alg.1). Our empirical findings suggest that one buffer sample is sufficient to mitigate forgetting in SAMCL, while additional current data samples enhance performance on the current task.
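A hypothetical sketch of the buffer mechanics described above (reservoir sampling as commonly used in MER-style experience replay; names and capacity are illustrative, not the authors' exact implementation):

```python
import random

class ReplayBuffer:
    """Reservoir-sampling buffer: every item in the stream has an equal
    chance of being retained, regardless of which task it came from."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.seen = 0

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.items[idx] = item

    def sample(self, k=1):
        return random.sample(self.items, min(k, len(self.items)))

random.seed(0)
buf = ReplayBuffer(capacity=10)
for pair_id in range(1000):          # stream of image pairs across tasks
    buf.add(pair_id)
# SAMCL-style batch: one buffer sample plus the current-task samples.
batch = buf.sample(1) + [998, 999]
```

Reservoir sampling keeps the buffer an unbiased sample of everything seen so far, so even a single drawn item statistically covers all previous tasks over training.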

Q3: The paper does not manage to get the reader to a sound conclusion that the proposed SAMCL solves the forgetting problem. (R5)

  • The purpose of this paper is not to reach a sound conclusion that SAMCL solves the forgetting problem, but rather to provide an initial attempt to achieve universal 3D registration in a CL setup and to mitigate forgetting. As shown in Tab. 1, SAMCL significantly reduced forgetting compared to vanilla sequential learning, and performed better than or on par with existing CL methods.

Q4: Regarding the details on the experimental settings, e.g. preprocessing, hyperparameter tuning, and evaluation metrics (R1, R3, R5).

  • R1: All images were linearly normalized to [0,1] as network input (for CT images this was after intensity clipping).
  • R3: The meta-learning rate and the deformation regularization coefficient were chosen empirically, while the buffer size and the \rho parameter of SAM were set to produce the best performance on the validation set. Results in Fig. 3a were reported on the validation set, while those in Fig. 3b on the test set.
  • R3: Random data augmentation would result in different intensity distributions and appearances of the data retrieved from the memory buffer, which may lead to an unfair comparison among the memory-based CL methods. Therefore, no data augmentation was utilized in our experiments.
  • R5: In Alg. 1, j is the index of the data batch among the s batches in total (see the input part and line 6 of Alg. 1).
  • R5: Backward Transfer (BWT) is widely used in CL research and shares the same definition as Backward Interference in the MER paper. The detailed definition of BWT is given in Eq. (5).
  • R5: LNCC is a local contrast-invariant similarity metric, and it was found to work on the Learn2Reg MR-CT registration task by some challenge entries (e.g., the entry named PIMed in [9]).
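The local contrast invariance of LNCC can be illustrated numerically; a minimal 1-D sketch (a simplification of the 3-D windowed version used in registration, with an illustrative window size):

```python
import numpy as np

def lncc(a, b, win=9):
    """1-D local normalized cross-correlation: the squared local Pearson
    correlation between a and b, averaged over all sliding windows."""
    k = np.ones(win)
    def box(x):                      # sliding-window sums via convolution
        return np.convolve(x, k, mode="valid")
    cov   = box(a * b) - box(a) * box(b) / win
    var_a = box(a * a) - box(a) ** 2 / win
    var_b = box(b * b) - box(b) ** 2 / win
    return float(np.mean(cov ** 2 / (var_a * var_b + 1e-8)))

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 3.0 * x + 1.5                    # a locally affine intensity remapping of x
# lncc(x, x) and lncc(x, y) are both close to 1: LNCC ignores local affine
# contrast changes, which is why it can work across modalities whose
# intensities are locally (approximately) linearly related.
```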




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers raised final ratings after the authors’ rebuttal. The average score of the recommendation leans towards accept (A/A/WR).

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviews and rebuttal were quite involved, with a lot of feedback and adjusting of scores. This is great, it’s what we want to see in research.

    Overall, there is definitely enough support to present the ideas in this work at MICCAI. I recommend acceptance.

    The reviewers have lingering concerns and I encourage the authors to clarify in text as much as they can in the CR.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



