Abstract
Class-incremental learning (CIL) in medical image-guided diagnosis requires models to retain diagnostic expertise on historical disease classes while adapting to newly emerging categories—a critical challenge for scalable clinical deployment. While pretrained foundation models (PFMs) have revolutionized CIL in the general domain by enabling generalized feature transfer, their potential remains underexplored in medical imaging, where domain-specific adaptations are critical yet challenging due to anatomical complexity and data heterogeneity. To address this gap, we first benchmark recent PFM-based CIL methods in the medical domain and further propose Conservative-Radical Complementary Learning (CRCL), a novel framework inspired by the complementary learning systems in the human brain. CRCL integrates two specialized learners built upon PFMs: (i) a neocortex-like conservative learner, which safeguards accumulated diagnostic knowledge through stability-oriented parameter updates, and (ii) a hippocampus-like radical learner, which rapidly adapts to new classes via dynamic and task-specific plasticity-oriented optimization. Specifically, dual-learner feature and cross-classification alignment mechanisms harmonize their complementary strengths, reconciling inter-task decision boundaries to mitigate catastrophic forgetting. To ensure long-term knowledge retention while enabling adaptation, a consolidation process progressively transfers learned representations from the radical to the conservative learner. During task-agnostic inference, CRCL integrates outputs from both learners for robust final predictions. Comprehensive experiments on four medical imaging datasets show CRCL’s superiority over state-of-the-art methods.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0019_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/CUHK-BMEAI/CRCL
Link to the Dataset(s)
N/A
BibTex
@InProceedings{WuXin_ConservativeRadical_MICCAI2025,
author = { Wu, Xinyao and Xu, Zhe and Lu, Donghuan and Sun, Jinghan and Liu, Hong and Shakil, Sadia and Ma, Jiawei and Zheng, Yefeng and Tong, Raymond Kai-yu},
title = { { Conservative-Radical Complementary Learning for Class-incremental Medical Image Analysis with Pre-trained Foundation Models } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15973},
month = {September},
pages = {56--66}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes a new CL method that utilizes a pretrained foundation model and is inspired by the human brain.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well-written, clear language and flow.
- The CL topic is very relevant and important.
- The Conservative-Radical Complementary Learning (CRCL) framework is an interesting attempt at mimicking biological memory systems in deep learning.
- Excellent experiments on 4 medical benchmarks to validate the effectiveness of the proposed method.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
[1] The core novelty of the paper lies in its two-learner system, but this has already been proposed in the literature, at least in the computer vision field (e.g., [i]).
[2] If the radical learner already adapts to new classes, why is knowledge transferred back to the conservative learner?
[3] The authors use an ImageNet-pretrained ViT, but why not leverage medical foundation models (e.g., MedSAM, Uni-MedCLIP)? These are trained on millions of medical images and often outperform ImageNet-based backbones on domain-specific tasks.
[4] Related to [3], no justification or comparison against these models is provided. A key missing experiment is to evaluate the architecture with a medical-pretrained ViT to see whether ImageNet pretraining is suboptimal.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a promising approach with comprehensive experiments and ablation studies. However, the choice of using an ImageNet-pretrained backbone instead of leveraging medical foundation models trained on millions of domain-specific images raises a fundamental question. Wouldn’t utilizing a medical foundation model be more relevant and potentially more effective for this task?
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I still believe that the novelty of the paper isn’t significant and similar works have been published in the computer vision field, but the authors did a good job addressing most of the concerns raised by the reviewers.
Review #2
- Please describe the contribution of the paper
The authors develop a new CIL method based on training adapters for an ImageNet-pretrained ViT model. Their approach is inspired by brain structures and combines multiple adapters with merging, RanPAC, and new losses that ensure compatibility between the adapters’ representations. The authors validate on four medical classification datasets, and the proposed CRCL greatly outperforms the baselines.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors compare their method against dozens of others (including the most recent ones) and achieve a significant improvement over all.
- The ablation study clearly demonstrates the impact of each innovation proposed by the authors.
- The proposed method appears to integrate elements from various state-of-the-art approaches, although it is not entirely clear what the contribution of each design choice is (see a comment below).
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The paper lacks references and comparative analysis of the design with respect to other methods. For example, with the exception of the extra losses and collaborative inference, CRCL is MOS [19] with a single adapter for the K prior tasks (instead of K adapters in MOS).
- The authors conduct experiments on multiple medical datasets, consistently reporting substantial to dramatic performance gains. Table 1 says that every dataset is split into four tasks. My concern is that the improvement may be because CRCL is tuned to fewer tasks than other methods. For example, [20] uses 10-20 sessions with more classes in each. Did the authors try their approach on conventional CIL datasets like CIFAR-100? Is CRCL a general-purpose approach, or is it optimized for medical images in some way?
Additional comments:
- Formula (2) applies the Hadamard product between a matrix and an embedding vector, which is not mathematically valid due to mismatched dimensions.
- It is not clear how weights are updated and whether they are subject to consolidation. On page 5, the authors describe an extension of the weight matrix, seemingly assuming |Y_t| > |Y_{t-1}|, which is not true.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The authors propose an interesting method that incorporates elements from various modern approaches. The ablation study is comprehensive, and the method is thoroughly compared against state-of-the-art techniques. I would be inclined to recommend acceptance, but I expect the authors to discuss CRCL’s dependence on the current validation setup and its potential to generalize to a larger number of classes and tasks. There should also be clarification of how their approach relates to MOS (and possibly other approaches).
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have adequately addressed all concerns, and most responses were convincing. However, the claim of following [4, 31] for the 4-task CIL setting is questionable—e.g., [31] uses only one dataset split into four tasks, with additional 7- and 10-task splits. Taking into account author feedback, I consider this to be a minor issue.
Review #3
- Please describe the contribution of the paper
This paper introduces a framework inspired by complementary learning systems in the human brain, using two learners: a conservative learner that retains accumulated knowledge, and a radical learner that rapidly adapts to new classes. The approach is built on pretrained foundation models (PFMs), which are first adapted to the medical imaging domain through lightweight adapters composed of a down-projection, ReLU, and up-projection. During this adaptation phase, the PFM backbone remains frozen while only the adapters and classification heads are trained.
After adaptation, the radical learner is fully optimized to learn new classes, while the conservative learner serves as an exponential moving average (EMA) version of the radical learner, gradually integrating its knowledge to maintain long-term stability. This consolidation process ensures the system retains previous knowledge while learning new tasks. During inference, the outputs (logits) from both learners are combined to produce robust, task-agnostic predictions.
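The mechanics the reviewer describes (a bottleneck adapter on a frozen backbone, EMA consolidation from the radical to the conservative learner, and logit averaging at inference) can be sketched with a toy NumPy example. All dimensions, the momentum value, the residual connection, and the shared classifier head below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(x, W_down, W_up):
    """Lightweight adapter: down-projection, ReLU, up-projection,
    added residually to the frozen backbone feature (residual is an
    assumption; common in adapter designs)."""
    h = np.maximum(x @ W_down, 0.0)   # down-project + ReLU
    return x + h @ W_up               # up-project + residual

def ema_consolidate(conservative, radical, momentum=0.99):
    """Conservative learner maintained as an EMA of the radical learner."""
    return {k: momentum * conservative[k] + (1 - momentum) * radical[k]
            for k in conservative}

# Toy sizes: 768-d ViT feature, 64-d bottleneck (hypothetical).
d, r, n_cls = 768, 64, 10
radical = {"W_down": rng.normal(size=(d, r)) * 0.02,
           "W_up":   rng.normal(size=(r, d)) * 0.02}
conservative = {k: v.copy() for k, v in radical.items()}

# One "task": the radical learner's weights move, then are consolidated.
radical = {k: v + rng.normal(size=v.shape) * 0.01 for k, v in radical.items()}
conservative = ema_consolidate(conservative, radical)

# Task-agnostic inference: average the two learners' logits.
x = rng.normal(size=(1, d))
W_cls = rng.normal(size=(d, n_cls)) * 0.02   # shared toy classifier head
logits = 0.5 * (adapter(x, **radical) @ W_cls
                + adapter(x, **conservative) @ W_cls)
pred = int(np.argmax(logits))
```

The EMA keeps the conservative learner close to its previous state (stability) while slowly absorbing the radical learner's task-specific updates (plasticity), mirroring the hippocampus-to-neocortex consolidation the paper invokes.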
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- While the use of two learners is not a novel idea in itself (see weaknesses), the paper presents a well-executed and thoughtful implementation of this concept.
- The results table is comprehensive, including comparisons with a wide range of state-of-the-art class-incremental learning methods.
- The ablation study is detailed and demonstrates that each component contributes meaningfully to the overall performance, with the exception of collaborative inference, whose impact appears relatively limited (see weaknesses).
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The paper does not report explicit forgetting metrics or analyses. While average and final session accuracies are presented, there is no direct evaluation of how much previously learned knowledge is forgotten over time.
- There is a lack of detail regarding the continual learning task setup. Although the authors mention using the same data splits as in [4, 31], the paper should be more self-contained. Clear information about the number of tasks, classes per task, and whether classes are disjoint is crucial for understanding and reproducing the evaluation protocol.
- The paper shares conceptual similarities with existing dual-network approaches such as DualNet [A], which also employs complementary fast and slow learners inspired by the Complementary Learning Systems (CLS) theory. However, this connection is not acknowledged or discussed.
- The results lack standard deviation reporting. Most class-incremental learning studies report both mean and standard deviation over multiple runs to account for variability. This omission is particularly relevant in the ablation study, where some variants (e.g., Abla-5) achieve performance very close to the full model, making it unclear whether observed differences are statistically significant.
[A] Pham, Quang, Chenghao Liu, and Steven Hoi. “DualNet: Continual Learning, Fast and Slow.” Advances in Neural Information Processing Systems 34 (2021): 16131–16144.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I’ve found the core idea of the paper interesting and potentially a strong foundation for future research. However, I have some doubts and I would like these to be clarified in the rebuttal to confirm my rating.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We are glad that reviewers find our work “well-written, clear language and flow” (R2), “important topic” (R2), “strong foundation for future study” (R3), “interesting, promising, well-executed method” (All), “excellent experiments on four medical benchmarks” (R2), “comprehensive ablation study” (All), and “significant improvement over SOTA” (R1, R3). The only negative score came from R2. Our responses to major concerns are as follows.
Q1: Clarify novelty and relationship with MOS (R1) and DualNet (R3). A1: Regarding MOS, key differences exist. CRCL achieves stronger performance with a single adapter (Table 2), offering better scalability. Using a single adapter presents a challenge: how to prevent forgetting without task-specific adapters, which motivates our regularized learning strategy. In contrast, MOS uses multiple task-specific adapters, and its main challenge lies in effective task-adapter retrieval during inference. CRCL provides a task-agnostic solution and avoids task-ID retrieval, where misretrieval can lead to cascading errors. We will discuss DualNet. While also conceptually inspired by CLS theory, DualNet predates PFM-era methods and requires replay. We focus on more recent replay-free methods like LAE [7], which can be viewed as a PFM-era counterpart to DualNet; it uses a calibrated learning rate to keep its online learner from forgetting. CRCL clearly outperforms LAE (Table 2).
Q2 (R2): Why use an ImageNet-pretrained ViT rather than medical-specific PFMs (e.g., MedSAM, Uni-MedCLIP)? A2: An interesting question. While we initially expected medical-specific PFMs to provide better domain-specific features, our empirical findings suggested otherwise. A feature quality test using SimpleCIL [33] showed that BiomedCLIP transfers poorly across our medical sub-domains (LastAcc: 21.49), whereas ImageNet-based PFMs perform much better even without first-session adaptation (LastAcc: 38.30). MedSAM is mainly optimized for segmentation and is suboptimal for classification. Thanks for suggesting Uni-MedCLIP (arXiv24). Although not peer-reviewed, it reportedly outperforms BiomedCLIP. In our quality test, it slightly improves over the general PFM by 3.1%. We are conducting a more comprehensive study on medical PFMs in a journal extension.
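The feature-quality probe mentioned here relies on SimpleCIL [33], which builds class-mean prototypes over frozen backbone features and classifies by nearest prototype (cosine similarity). A minimal sketch of such a probe, using synthetic stand-in features rather than any real PFM embeddings (all sizes and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def prototype_accuracy(feats, labels, test_feats, test_labels):
    """SimpleCIL-style probe: class-mean prototypes from frozen features,
    nearest-prototype (cosine) classification on query features."""
    classes = np.unique(labels)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    q = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    pred = classes[np.argmax(q @ protos.T, axis=1)]
    return float((pred == test_labels).mean())

# Synthetic stand-in for frozen-backbone features of 3 classes.
d, n = 64, 30
centers = rng.normal(size=(3, d))
labels = np.repeat(np.arange(3), n)
feats = centers[labels] + rng.normal(size=(3 * n, d)) * 0.3
acc = prototype_accuracy(feats, labels, feats, labels)
```

Because the probe trains nothing, the resulting accuracy isolates the quality of the frozen features themselves, which is what makes it a fair comparison between ImageNet-based and medical-specific backbones.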
Q3 (R2): If the radical learner already adapts to new classes, why transfer knowledge back to the conservative learner? A3: The conservative learner not only preserves long-term memory but also participates in training-time regularization to alleviate forgetting through feature alignment and cross-classification constraints. Consolidating the radical learner’s updates helps gradually adapt its representation, preventing abrupt divergence as new tasks arrive. This consolidation also echoes the hippocampus-neocortex interaction.
Q4: Evaluation setup concerns: (R1) fewer incremental tasks and medical-specific tuning? Generalizability to conventional datasets? (R3) Self-contained: task number, classes per task, and if classes are disjoint. A4: For R1, we follow the same 4-task split as [4,31]. At MICCAI, we focus more on the potential of general-domain PFMs in medical tasks. Yet, CRCL is not limited to fewer tasks or medical domains. On CIFAR100 with 10 tasks, CRCL achieves competitive performance (LastAcc: 91.63 vs. 91.35 [20]), showing its generalizability. For R3, Table 1 summarizes the datasets, with 4 tasks per dataset. Each task contains equally partitioned classes (Fig. 2), and classes are disjoint as stated in Sec. 2.1. We will revise to improve clarity.
Q5 (R3): Lack of explicit forgetting metrics and standard deviation reporting. A5: Reporting average and final session accuracy is also mainstream practice (e.g., [30,33,34,19]). Our results are based on three runs. We agree that including forgetting metrics and standard deviations would be more comprehensive, and we will include them in an extension.
Other minor concerns will be carefully addressed in revision. Code will be released to help readers follow the details.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper proposes a method of interest to the community, is well-written and overall comprehensively evaluated. Post-rebuttal, all reviewers suggest acceptance of the paper.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A