Abstract

Deep segmentation networks achieve high performance when trained on specific datasets. In clinical practice, however, it is often desirable that pretrained segmentation models can be dynamically extended to segment new organs without access to previous training datasets and without training from scratch. This would enable a much more efficient model development and deployment paradigm that accounts for patient privacy and data storage constraints. This clinically preferred process can be viewed as a continual semantic segmentation (CSS) problem. Previous CSS work either suffers from catastrophic forgetting or incurs unaffordable memory costs as the model expands. In this work, we propose a new continual whole-body organ segmentation model with lightweight low-rank adaptation (LoRA). We first train and freeze a pyramid vision transformer (PVT) base segmentation model on the initial task, then continually add lightweight trainable LoRA parameters to the frozen model for each new learning task. Through a holistic exploration of architecture modifications, we identify the three layers (i.e., the patch-embedding, multi-head attention, and feed-forward layers) that are most critical for adapting to new segmentation tasks while keeping the majority of the pre-trained parameters fixed. Our model continually segments new organs without catastrophic forgetting while maintaining a low parameter growth rate. Continually trained and tested on four datasets covering different body parts and a total of 121 organs, our model achieves high segmentation accuracy, closely approaching the PVT and nnUNet upper bounds, and significantly outperforms other regularization-based CSS methods. Compared with the leading architecture-based CSS method, our model has a substantially lower parameter growth rate (16.7% versus 96.7%) while achieving comparable performance.
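
As a rough illustration of the mechanism described above, the following PyTorch-style sketch freezes a pre-trained linear layer (standing in for, e.g., an FFN projection inside a PVT block) and grows a small trainable low-rank branch per task. This is not the authors' implementation; all names (TaskLoRALinear, add_task, the task IDs) are invented for this example.

    # Sketch only: frozen base layer + one LoRA branch per continual task.
    import torch
    import torch.nn as nn

    class TaskLoRALinear(nn.Module):
        def __init__(self, base: nn.Linear):
            super().__init__()
            self.base = base
            for p in self.base.parameters():        # pre-trained weights stay fixed
                p.requires_grad = False
            self.adapters = nn.ModuleDict()         # one low-rank branch per task

        def add_task(self, task_id: str, rank: int = 4):
            down = nn.Linear(self.base.in_features, rank, bias=False)
            up = nn.Linear(rank, self.base.out_features, bias=False)
            nn.init.zeros_(up.weight)               # new branch starts as a no-op
            self.adapters[task_id] = nn.Sequential(down, up)

        def forward(self, x, task_id: str):
            return self.base(x) + self.adapters[task_id](x)

    # Usage: wrap a frozen layer, then grow it with each new task; each task adds
    # only rank * (in_features + out_features) trainable parameters.
    layer = TaskLoRALinear(nn.Linear(256, 1024))
    layer.add_task("totalseg")                      # task 1
    layer.add_task("chest_organs")                  # task 2
    y = layer(torch.randn(2, 256), task_id="chest_organs")
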

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0900_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0900_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zhu_LowRank_MICCAI2024,
        author = { Zhu, Vince and Ji, Zhanghexuan and Guo, Dazhou and Wang, Puyang and Xia, Yingda and Lu, Le and Ye, Xianghua and Zhu, Wei and Jin, Dakai},
        title = { { Low-Rank Continual Pyramid Vision Transformer: Incrementally Segment Whole-Body Organs in CT with Light-Weighted Adaptation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents an efficient framework designed to integrate the segmentation of 121 organs and one tumor. To achieve this, the authors employ a sequential training approach using four datasets. At each stage, the previously trained backbone is frozen, and new low-rank-based modules are introduced to adapt the model to the current segmentation task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed fine-tuning method effectively enables the model to adapt to a target task by increasing the parameter count by approximately 5%.
    2. This paper is well-written and well-organized.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed method appears to diverge from typical continual learning approaches and aligns more closely with an efficient fine-tuning method. It utilizes a frozen, well-trained model to which multiple groups of LoRA modules are added, each corresponding to a specific task. Despite the four-stage sequential training process, each stage operates independently with respect to both the training data and learnable parameters. Consequently, I find it challenging to classify this method as continual-based since no learnable parameters are shared across tasks and the model does not leverage stream data for incremental learning.
    2. The utilization of LoRA modules for fine-tuning is a well-established practice, and its effectiveness is thoroughly documented in existing literature. Consequently, this paper does not seem to offer new insights into the use of LoRA modules for model adaptation.
    3. This paper employs three in-house datasets to demonstrate the effectiveness of the proposed method. However, the justification for using these specific datasets is not convincingly described.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Refer to weaknesses.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Given the ambiguity surrounding whether the proposed method qualifies as continual-based, its limited novelty, and the reliance on private datasets for evaluation, I recommend a ‘Weak Reject’ for this submission.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I appreciate the authors’ efforts in addressing the concerns raised. I agree that the proposed framework is an architecture-based continual learning (CL) method. However, I maintain that it should also be considered an EFT method, given the identical settings between the two topics. Moreover, it is important to recognize that while architecture-based CL methods typically avoid forgetting, they can be limited by issues of scalability and inter-task generalizability. Although the proposed method demonstrates a significant advantage over the SUN model in terms of the rate of parameter increase, additional comparisons with other EFT methods would strengthen the paper. Finally, I raised my score to a ‘Weak Accept’ based on the authors’ rebuttal.



Review #2

  • Please describe the contribution of the paper

    The authors propose the Low-Rank Continual Pyramid Vision Transformer, which uses LoRA to fine-tune a pre-trained model and thereby significantly reduces the number of additional parameters required. They propose to address the catastrophic forgetting problem with lightweight low-rank adaptation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is technically sound, and the paper is overall well-written and easy to follow. Ablation studies and comparisons with other state-of-the-art methods demonstrate the effectiveness of the method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Although the problem is formulated in a continual learning setting, the LoRA implementation in this paper does not show a significant connection to continual learning. The LoRA in this paper is the vanilla LoRA used in previous papers such as SAMed [1] or Surgical-DINO [2], where the task is adapting a foundation model to a single surgical scene. Is there any difference in how LoRA is used between this paper and the papers I mentioned? This is my main concern about this paper.
    2. In the ablation study, I think it would be beneficial to conduct an experiment where no LoRA is used, to demonstrate the importance of LoRA.

    [1] Zhang K, Liu D. Customized segment anything model for medical image segmentation. arXiv preprint arXiv:2304.13785, 2023. [2] Cui B, Islam M, Bai L, et al. Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery. International Journal of Computer Assisted Radiology and Surgery, 2024: 1-8.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    My main concern is that I do not see how your implementation of LoRA differs from the implementations in non-continual-learning methods, where LoRA was designed for more common adaptation problems. This makes me think the paper lacks some novelty. If you can address this concern, I would be willing to change my rating. Other aspects, such as the writing and experiments, seem reasonable to me.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    LoRA is already commonly used in many previous surgical applications. The LoRA in this paper does not seem specific to continual learning, and its usage is the same as in the previous papers I mentioned. I therefore think the paper lacks some novelty in the network design.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors' rebuttal addressed my concern about the relationship between CL and other common LoRA fine-tuning methods. Although the LoRA implementation itself is the same (applied in the FFN layers), it does have a relationship with CL. The experiments are also well designed. Therefore, I change my opinion to Weak Accept.



Review #3

  • Please describe the contribution of the paper

    This work proposes a novel continual learning approach for pyramid vision transformer models based on low-rank adaptation (LoRA). The idea is to extend a frozen pre-trained model with learnable LoRA parameters for each new task in a continual learning pipeline. They show that their model does not suffer from catastrophic forgetting, unlike regularization-based approaches, while needing far fewer additional parameters than another architecture-based approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • sufficient experiments, including ablation study and more detailed results in supplementary material
    • pre-trained their model on large scale dataset
    • promising results – low catastrophic forgetting
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • the proposed method basically trains a separate network for each task/dataset (in a very lightweight and parameter-efficient way); however, comparing to regularization approaches, which don't need task labels at test time and are therefore solving a slightly different problem, is not an entirely fair comparison
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • the baseline methods should be explained in more detail in the introduction/related work

    • Are the upper bound models for CHO, HNO and Eso Tumor also pre-trained on the TotalSegmentator dataset?

    • page 6: SUN is once referenced as [11] and once as [28] – probably a typo

    • In the proposed approach, does one need to know the dataset/task the input image comes from at test time?

    • It would be interesting to evaluate the robustness of the proposed model to domain shift. For example, if I want to segment organs that were learned from the CHO data but my test data comes from a different distribution (e.g., a different hospital or scanner), how well would the network perform? Since models trained on the TotalSegmentator data are usually quite robust, I wonder whether the proposed network loses that robustness when fine-tuning to specific tasks (CHO, HNO). This would be interesting to look into as future work.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and the proposed method is well validated in the experiments.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The reviewers answered my remaining questions.




Author Feedback

We thank all reviewers for their comments, especially for noting that our paper is well-written (R1, R3, R4), proposes a novel (R1) and technically sound (R4) method, demonstrates effectiveness with sufficient ablation and comparison experiments (R1, R3, R4), and achieves promising results with low catastrophic forgetting (R1).

Q1: Task label/ID at test time (R1). A: We would like to clarify that our method does not require the dataset/task ID of the input image at test time. Similar to SUN [11], we also have an output merging module, which uses an entropy-based ensemble to combine the predictions from all tasks (Model Inference in Methods).
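
(For illustration, the minimal sketch below shows one plausible way an entropy-based ensemble of task-specific predictions could work: each task head produces a softmax map, and each voxel is taken from the task whose prediction has the lowest entropy. The exact merging rule of the paper and of SUN [11] may differ, and all names here are hypothetical.)

    # Sketch only: pick, per voxel, the task whose softmax prediction is most confident.
    import torch

    def entropy_merge(task_probs):
        # task_probs: list of tensors, each (C_t, D, H, W), softmax over task-t organs
        entropies, labels = [], []
        for probs in task_probs:
            ent = -(probs * probs.clamp_min(1e-8).log()).sum(dim=0)   # (D, H, W)
            entropies.append(ent)
            labels.append(probs.argmax(dim=0))                        # within-task label
        winner = torch.stack(entropies).argmin(dim=0)                 # lowest-entropy task per voxel
        merged = torch.zeros_like(labels[0])
        for t, lab in enumerate(labels):
            merged[winner == t] = lab[winner == t]
        # A real system would still map each task's label indices into a global
        # organ label space (e.g., by adding per-task offsets).
        return winner, merged
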

Q2: Single dataset upper bound (R1). A: Yes, they use weights pre-trained on the TotalSegmentator dataset.

Q3: (R4, R3) New insights and differences between our implementation of LoRA and non-CL implementations of LoRA, such as SAMed and Surgical-DINO. A: SAMed and Surgical-DINO applied vanilla LoRA to the linear attention layers. In contrast, we apply LoRA to both the FFN and 3D Conv layers of the PVT network to solve the more challenging CL task (gaining the ability to segment new organs while preserving all previously learned organs), and we conduct a holistic exploration of the architectural modifications. We are the first to use LoRA for continual segmentation and to design an effective architecture-based CL framework, which minimizes forgetting, maintains a low parameter increase rate, and achieves high performance on all sequentially learned tasks.

The rationale for applying LoRA to the feed-forward network (FFN) is to provide extra feature aggregation capability for segmenting new, unseen organs. Further extending LoRA to the 3D Conv layers (Methods 2.3) helps handle the spacing variation across sequential medical datasets (the LoRA matrices in 3D Conv have different shapes from the original LoRA). The ablation studies of these two new LoRA components are shown in Table 2.
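
(One possible reading of the 3D Conv extension, sketched below under assumptions; the paper's exact factorization may differ. The idea shown is to keep the pre-trained Conv3d frozen and add a low-rank parallel branch made of a thin spatial convolution followed by a pointwise channel-mixing convolution. LoRAConv3d and all hyperparameters are invented for illustration.)

    # Sketch only: a low-rank parallel branch around a frozen 3D convolution.
    import torch
    import torch.nn as nn

    class LoRAConv3d(nn.Module):
        def __init__(self, base: nn.Conv3d, rank: int = 4, alpha: float = 4.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():        # pre-trained conv stays frozen
                p.requires_grad = False
            # "down" keeps the spatial kernel, "up" mixes channels pointwise
            # (assumes the base conv uses default dilation and groups)
            self.down = nn.Conv3d(base.in_channels, rank, base.kernel_size,
                                  stride=base.stride, padding=base.padding, bias=False)
            self.up = nn.Conv3d(rank, base.out_channels, 1, bias=False)
            nn.init.zeros_(self.up.weight)          # branch starts as a no-op
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + self.scale * self.up(self.down(x))

    # e.g. wrap a patch-embedding-style convolution of a frozen backbone
    patch_embed = nn.Conv3d(1, 64, kernel_size=3, stride=2, padding=1)
    adapted = LoRAConv3d(patch_embed, rank=4)
    out = adapted(torch.randn(1, 1, 32, 64, 64))    # -> (1, 64, 16, 32, 32)
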

Q4: Difference between the CL task and the EFT task (R3). A: EFT's goal is to adapt a pretrained model to a new task, even at the cost of forgetting previously learned knowledge. In contrast, CL sequentially expands the model to an arbitrary number of new tasks while avoiding forgetting on all old tasks, which is more challenging than EFT. R3 is concerned that our method is not CL-based because "no learnable parameters are shared across tasks and the model does not leverage stream data for CL". However, this is not the case. According to [S1], CL is defined as "incrementally learning new information from a non-stationary stream of data" and is not tied to shared parameters ([S1] van de Ven, G.M., et al. Three Types of Incremental Learning. Nat Mach Intell 2022). Ours is an architecture-based CL method: it shares the base PVT and incrementally adds new LoRA modules for new tasks. Our experiments strictly follow the data-stream setting used in other CL works (e.g., SUN [11], ICCV23).

Q5: Ablation results where no LoRA is used (R4). A: Without LoRA, the model degrades to vanilla fine-tuning when learning a new task, causing severe catastrophic forgetting on old tasks. This "fine-tuning" model performs even worse than regularization-based CL methods such as MiB [2] and PLOP [6]. If LoRA is applied only to the attention layers (as in SAMed and Surgical-DINO), the results are shown in the ablation Table 2 (referred to as "base"), indicating a Dice drop of more than 5% compared to our proposed framework.

Q6: In-house dataset (R3). A: We use the private chest organ dataset (CHO) because public chest organ datasets are scarce and contain very few labels, e.g., 4 organs and 60 CTs in SegTHOR. In contrast, our private CHO dataset has 16 organs and 153 CTs. Note that similar private data are used in other CL work, e.g., SUN [11] (ICCV23) also uses 3 in-house datasets. We appreciate this comment and will add more public datasets in the future.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers found the proposed method to be technically sound, with sufficient experiments, including an ablation study and more detailed results in the supplementary material. The rebuttal addresses most of the comments. I think the paper is in an acceptable state and would make an interesting addition to MICCAI.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All the reviewers agree to accept it, and I also believe that this is a good article

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



