Abstract

The primary goal of continual learning (CL) in medical image segmentation is to solve the “catastrophic forgetting” problem, in which a model loses previously learned features when it is extended to new categories (class-level) or tasks (task-level). Because of privacy protection, labels for historical data are inaccessible. Prevalent continual learning methods primarily generate pseudo-labels for old datasets to force the model to retain learned features. However, incorrect pseudo-labels may corrupt the learned features and lead to a new problem: the better the model is trained on old tasks, the worse it performs on new ones. To avoid this problem, we propose a network that introduces a data-specific Mixture of Experts (MoE) structure to handle new tasks or categories, ensuring that the network parameters of previous tasks are unaffected or only minimally impacted. To further overcome the substantial memory cost of introducing additional structures, we propose a Low-Rank strategy that significantly reduces memory consumption. Notably, for task-level CL, we find that low-rank experts learned on previous tasks do not impair subsequent tasks and can even assist them. For class-level CL, we propose a gating function combined with language features that effectively enables the model to handle multi-organ segmentation across new and old classes. We validate our method on both class-level and task-level continual learning challenges. Extensive experiments on multiple datasets show that our model outperforms all other methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1160_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1160_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Che_LowRank_MICCAI2024,
        author = { Chen, Qian and Zhu, Lei and He, Hangzhou and Zhang, Xinliang and Zeng, Shuang and Ren, Qiushi and Lu, Yanye},
        title = { { Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper focuses on achieving continual learning in medical image segmentation. On the one hand, a data-specific Mixture of Experts (MoE) structure is proposed to address the catastrophic forgetting problem in this task. On the other hand, a Low-Rank strategy is proposed to address the computational cost and parameter overhead. The proposed method is validated through segmentation results for continual learning at both the class level and the task level.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The design of the framework is clear and reasonable. 2. The results look positive compared to other methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The novelty of the paper is limited. The proposed method uses the Mixture-of-Experts from paper [14] and the Low-Rank Adapter from paper [8]. The authors do not enumerate the differences between the proposed method and these prior works, nor what is unique about transferring them to continual learning tasks.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The method is clearly described, so the paper should be reproducible in terms of implementation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. The authors could survey and describe more related work on MoE methods in continual learning tasks.

    2. The authors summarize that research on continual learning for medical image segmentation usually relies on regularization-based and architecture-based methods. The proposed method leans toward the architecture-based category, yet only regularization-based methods are compared in the quantitative analysis. Please explain why architecture-based methods are not compared.

    3. There is no explanation of i in Eq. (1).

    4. More details about the offline classifier should be added. If the offline classifier already knows all tasks, will that limit the extensibility of continual learning?

    5. Some graphics in Fig. 3 lack explanation, such as the symbol for matrix multiplication. The “Emb” mentioned in the legend does not appear in the figure, and the gray rectangular box lacks a description. Modifications and additions are recommended to improve readability.

    6. The data split ratio for COVID-19 CT is not described. It should also be explained why BTCV is divided into training and test sets, while LiTS is divided into training, validation, and test sets.

    7. From Table 1 it is difficult to see the improvement brought by using MoE alone, without the low-rank component.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the strengths (5), weaknesses (6), and comments (10) for justification.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my concerns.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a novel class/task-incremental learning framework that uses a pretrained base model and low-rank adapters to continually extend the model to new tasks/datasets with a small memory cost and less forgetting.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper has a clear introduction and motivation for using low-rank MoE (LoRA) in continual segmentation that is both lightweight and cost-efficient.

    • Although LoRA has been widely used in parameter-efficient fine-tuning (PEFT), using it to avoid forgetting while keeping the parameter growth rate lightweight still carries some novelty in continual learning.

    • The paper evaluates the method on both CIL and TIL, and also uses both 2D and 3D transformer-based segmentation network structures, which shows the generalizability of the method.

    • The task-level evaluation is comprehensive, covering various continual-learning orders and trainable parameter counts.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The supplementary material exceeds the conference’s 2-page limit, so it was ignored during review to maintain fairness. Please consult the guidelines and adjust the supplementary material.

    • In the main method (Eq. 5), the LoRA parameters for a new task are trained on top of the frozen pretrained weights and all previously trained LoRA AB pairs. Why not simply train each LoRA directly on the pretrained W0? LoRA should be able to adapt the base model to a new task even without the previous LoRA AB pairs from older tasks, which are not necessarily related to the new one. An ablation study on whether to use the previous LoRA weights should be added to the experiments to better understand their effectiveness.

    • The design of the class-level experiment is too simple to demonstrate the effectiveness and robustness of the proposed method. Only two datasets are included, making it merely a 1-step continual segmentation; the dataset sequence is too short to evaluate the method’s forgetting rate and order sensitivity. Furthermore, the two datasets are from the same abdominal body part: when continually training on the new task LiTS, even though the previous BTCV organ labels are unavailable, all those abdominal organs actually appear in LiTS, which makes the continual segmentation unchallenging and old knowledge easy to retain. Citation [11] of the paper actually shows that continually learning new organs from different body parts is more challenging and prone to larger forgetting. Based on my understanding of LoRA’s ability, it should be able to adapt the model to new datasets even from different body parts or modalities, but I recommend that the authors follow the experimental setting of citation [11] and evaluate the proposed method on more complicated datasets, such as TotalSegmentatorV2 (117 whole-body organs), Flare22 (13 abdomen organs), and StructSeg (22 head-neck organs), to better demonstrate the generalizability and robustness of the method.

    • The paper includes citation [11], the latest leading method for continual multi-organ segmentation, which achieves high performance across over 100 organs, but the authors do not compare against this method in Table 2. The authors should include more datasets in the class-level evaluation and compare performance and parameter growth rate with [11] to demonstrate the advantage of the low-rank MoE in complicated continual settings.

    • There are no details about the scalable matching-based offline classifier used in task-level gating. Please explain how this module works.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper uses public datasets but does not provide source code. Considering the easy implementation of LoRA modules, the reproducibility should be acceptable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Please adjust the supplementary material to meet the conference’s 2-page limit.

    • The current 1-step class-level continual segmentation is not enough to evaluate the proposed method. Please add more comprehensive public datasets from different body parts to the class-level evaluation, e.g., TotalSegmentatorV2 (117 whole-body organs), Flare22 (13 abdomen organs), and StructSeg (22 head-neck organs), to show generalizability. With a longer dataset sequence, the authors should also try different learning orders to further probe order sensitivity.

    • Please compare with citation [11] in Table 2, as it is the latest leading method in class-level continual segmentation.

    • Please further explain the reason for including all the frozen previous LoRA parameters when learning a new LoRA on a new task, especially since LoRA should be able to adapt to the new task using only W0 without the old LoRAs. An ablation study with/without the previous LoRAs when learning a new task would also help.

    • Please further explain the scalable matching-based offline classifier used in task-level gating in the method section. As written, this module cannot be followed.

    • The method section is a little redundant, while the results section in the main paper is insufficient. Since adding LoRA to transformer layers is already widely used in PEFT and easy to understand, the authors could slightly shorten the method section, especially Eqs. 2–4, and save more space for the results, as continual learning requires a large amount of experiments to evaluate its effectiveness and robustness.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel continual segmentation framework with LoRA and shows its effectiveness in both TIL and CIL settings. However, some unexplained parts of the method and the naive design of the class-level continual segmentation setting reduce the quality of the experiments, placing the paper at the borderline. Therefore, I recommend this work as “weak accept”.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks for the authors’ response. My concern about sharing LoRA weights between different tasks has been resolved. However, the short dataset sequence in the class-incremental learning experiment remains a weakness. The overall quality of the paper is above the borderline, so I rate it as “weak accept”.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a Low-Rank MoE framework for continual learning to handle new tasks or categories. Each low-rank expert is assigned to a specific task/category.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper is clear and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) There are no compared continual learning methods in Table 1. It is confusing that the method proposed in [27] is designed for lesion segmentation, yet the authors regard it as the SOTA for the ACDC dataset.

    2) In Table 2, the backbones of the compared methods and of the proposed Low-Rank MoE are not consistent, resulting in an unfair comparison.

    3) The authors mention several times that details are available in the Appendix, which has not been uploaded.

    4) Transformers typically require substantial data for training, yet BTCV has only 24 training cases, which is unsuitable for training transformers. Please explain why convolutional networks were not employed for segmentation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) Please explain the reason for using transformers rather than convolutional networks.

    2) Please provide comparison results in Table 1 and re-implement the compared methods using the same backbone as this paper.

    3) Please conduct experiments discussing how the rank and hidden dimension of the low-rank layers influence performance.

    4) Please conduct an ablation study on the contribution of the gating strategy.

    5) Please provide additional results elucidating the effects of changing the sequence of continual training.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1) Limited novelty and insufficient experiments. 2) The comparison is lacking in Table 1 and unfair in Table 2. 3) The Appendix mentioned in the paper is absent.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have adequately addressed my concerns. Therefore, I recommend accepting this paper.




Author Feedback

We thank all the reviewers for their time and comments on our manuscript.

R1,R4

Q1: The absence of supplementary material. A1: We have revised the supplementary material to ensure compliance and will include it in the updated submission.

R1,R3

Q1: Specific details of the offline classifier. A1: The classifier uses an image feature extractor (e.g., CLIP) to extract features from k randomly selected images per category as exemplars. During inference, the category is determined by the highest cosine similarity between the test image and the exemplars of each category. Its high accuracy supports extension to further continual learning steps.
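
A minimal sketch of such a matching-based routing classifier, assuming a CLIP-style image encoder; the function names and the mean-pooled prototype per category are our illustrative choices, not the authors’ exact implementation:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def build_exemplar_bank(encoder, images_per_category, k=8):
    """For each category, encode k randomly chosen images and keep the
    L2-normalized mean feature as that category's exemplar prototype.
    (Hypothetical helper; the paper may store exemplars individually.)"""
    bank = {}
    for cat, images in images_per_category.items():
        idx = torch.randperm(len(images))[:k]
        feats = F.normalize(encoder(images[idx]), dim=-1)  # (k, d)
        bank[cat] = F.normalize(feats.mean(dim=0), dim=-1)
    return bank

@torch.no_grad()
def route(encoder, image, bank):
    """Assign a test image to the category whose prototype has the
    highest cosine similarity; this decision drives the task-level gate."""
    feat = F.normalize(encoder(image.unsqueeze(0)), dim=-1).squeeze(0)
    return max(bank, key=lambda cat: float(feat @ bank[cat]))
```

Because the encoder and exemplar bank are fixed, adding a new task only appends one more prototype, which is what would make such a classifier scalable.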

R1

Q1: More substantial experiments on the class-level task. A1: Due to time constraints and the unavailability of the StructSeg test set, we conducted preliminary evaluations on three datasets, TotalSegmentator->Flare22->BTCV, achieving Dice scores of 87.65, 83.22, and 81.3.

Q2: Why not compare with the method of [11] in Table 2? A2: [11] lacks available code, preventing a short-term comparison. We plan to reproduce [11] and include comparisons in the revision.

Q3: Consider LoRA using only W0. A3: Our design aims to transfer useful knowledge across datasets. As shown in Table 1, sharing LoRA weights does not degrade performance on distinct datasets and can improve it for similar ones. Incorporating LoRA weights from ACDC and ISIC increased accuracy by 2% on the COVID-19 CT dataset (compared with using only W0), demonstrating the benefit of reusing previous LoRA parameters.
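
A minimal sketch of this design, under our reading of Eq. 5 (per-task rank-r adapters summed on top of the frozen base weight; class and method names are illustrative, not from the paper):

```python
import torch.nn as nn

class LowRankMoELinear(nn.Module):
    """Frozen base weight W0 plus one low-rank (A, B) expert per task.
    Old experts stay frozen; only the newest one is trainable."""
    def __init__(self, d_in, d_out, rank=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():
            p.requires_grad_(False)            # frozen pretrained W0
        self.experts = nn.ModuleList()
        self.rank = rank

    def add_expert(self):
        # Freeze everything learned so far, then append a fresh expert.
        for p in self.experts.parameters():
            p.requires_grad_(False)
        down = nn.Linear(self.base.in_features, self.rank, bias=False)
        up = nn.Linear(self.rank, self.base.out_features, bias=False)
        nn.init.zeros_(up.weight)              # new expert starts as a no-op
        self.experts.append(nn.Sequential(down, up))

    def forward(self, x):
        # y = W0 x + sum_t B_t A_t x: earlier deltas are reused, not discarded.
        return self.base(x) + sum(expert(x) for expert in self.experts)
```

Under this reading, freezing the earlier (A, B) pairs is what keeps them harmless on unrelated data while letting related tasks benefit, consistent with the roughly 2% COVID-19 CT gain reported above.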

R3

Q1: Investigate more MoE methods on CL tasks and more details about the datasets. A1: We will add the following to the article: “In [1], the authors propose Lifelong-MoE, an extensible MoE that dynamically adds model capacity by adding experts with regularized pretraining.” [1] “Lifelong language pretraining with distribution-specialized experts.” The COVID-19 CT dataset consists of 100 images, with 50% for training and 50% for testing. The partitioning of the BTCV and LiTS datasets followed the guidelines specified by the original dataset authors.

Q2: Architecture-based methods to be compared. A2: Current architecture-based methods in CL focus mainly on classification tasks; it would be inappropriate to migrate them to a segmentation task.

Q3: A lack of explanation for i in Eq. (1). A3: The index i refers to pixels.

Q4: Experiments using only MoE without low rank. A4: The results (with MoE, without LoRA) on ACDC->ISIC->COVID-19 CT show improvements of 0.13%, 0.16%, and 0.55% over the single-task models listed in Table 1. These findings demonstrate MoE’s effectiveness and beneficial interactions across the datasets.

R4

Q1: The reason for using transformers rather than CNNs. A1: Leading medical image segmentation networks mainly utilize transformers, which guided our choice based on established practice. We have also developed a method to transfer ImageNet-pretrained weights to 3D medical images.

Q2: Provide comparison results in Table 1 and re-implement the compared methods using the same backbone in Table 2. A2: Table 1 includes methods using the same SETR backbone, such as LwF, ILT, and PLOP; our method outperforms the best of them by about 5% average Dice. As presented in Table 2, all methods employ the same Swin-UNETR.

Q3: Missing results on the rank value and on changing the CL sequence. A3: Rank 4 yields Dice scores of 90.05, 89.00, and 72.68 on the ACDC->ISIC->COVID-19 sequence; rank 16 improves these to 92.06, 90.68, and 74.02. Changing the dataset order to ISIC->ACDC->COVID-19 or COVID-19->ACDC->ISIC changes the Dice only slightly (about 0.1%), demonstrating robustness to ordering.

Q4: Conduct an ablation study of the gating strategy. A4: In the class-level task, we replaced the text-guided gating with a fully connected layer. Compared with the original LoRA MoE, the ablation performs slightly worse (about 2% average Dice), reflecting the effectiveness of text-guided gating under the CLIP model.
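
For concreteness, text-guided gating of this kind could look like the following sketch (assuming precomputed, frozen CLIP text embeddings per class; the similarity-softmax routing is our illustrative reading, not the paper’s exact formulation):

```python
import torch
import torch.nn.functional as F

def text_guided_gate(feats, class_text_embs, temperature=0.07):
    """Soft routing weights over class experts.

    feats:           (N, d) image features projected into CLIP space
    class_text_embs: (C, d) frozen CLIP text embeddings of class names
    returns:         (N, C) gating weights, one distribution per feature
    """
    feats = F.normalize(feats, dim=-1)
    class_text_embs = F.normalize(class_text_embs, dim=-1)
    logits = feats @ class_text_embs.t() / temperature
    return logits.softmax(dim=-1)

# The ablation described above would replace this language prior with a
# plain learned gate, e.g. nn.Linear(d, C), at a cost of ~2% average Dice.
```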




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers found that the work has a clear introduction and motivation for using low-rank MoE (LoRA) in continual segmentation, a reasonable framework, and positive results compared to other methods. The rebuttal successfully addresses most of the comments. This is interesting work and of interest to the MICCAI community; thus, I suggest acceptance.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All three reviewers agreed to accept this paper after the rebuttal.



