Abstract

As massive medical data become available with an increasing number of scans, expanding classes, and varying sources, prevalent training paradigms, in which AI is trained with multiple passes over fixed, finite datasets, face significant challenges. First, training AI all at once on such massive data is impractical because new scans/sources/classes continuously arrive. Second, training AI continuously on new scans/sources/classes can lead to catastrophic forgetting, in which AI forgets old data as it learns new data, and vice versa. To address these two challenges, we propose an online learning method that enables training AI from massive medical data. Instead of repeatedly training AI on randomly selected data samples, our method identifies the most significant samples for the current AI model based on their data uniqueness and prediction uncertainty, then trains the AI on these selective data samples. Compared with prevalent training paradigms, our method not only improves data efficiency by enabling training on continual data streams, but also mitigates catastrophic forgetting by selectively training AI on significant data samples that might otherwise be forgotten, outperforming them by 15% in Dice score for multi-organ and tumor segmentation.
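Below is a minimal, hypothetical sketch of the selective training loop described in the abstract, for illustration only: the buffer capacity, the embed/uncertainty/train_step placeholders, and the redundancy criterion are assumptions, not the authors' released implementation (see the code repository linked below for the actual method).

    # Hedged sketch of online selective training on a data stream (not the paper's code).
    # embed, uncertainty, and train_step are hypothetical stand-ins for a feature
    # extractor, a per-sample prediction-uncertainty estimate, and one gradient update.
    import numpy as np

    rng = np.random.default_rng(0)
    BUFFER_CAPACITY = 128   # small fixed memory footprint (128 samples, per the rebuttal)
    BATCH_SIZE = 4

    def embed(sample):          # placeholder feature extractor
        return rng.standard_normal(16)

    def uncertainty(sample):    # placeholder prediction uncertainty in (0, 1)
        return float(rng.random())

    def train_step(batch):      # placeholder single gradient update
        pass

    def most_redundant_index(entries):
        """Index of the sample whose nearest neighbor in feature space is closest,
        i.e., the sample adding the least diversity to the buffer (a common
        redundancy criterion; the paper states its own similarity-based rule)."""
        feats = np.stack([e["feature"] for e in entries])
        dists = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        return int(np.argmin(dists.min(axis=1)))

    def stream_training(stream):
        buffer = []                 # each entry: dict(sample, feature, score)
        for sample in stream:       # every incoming sample is seen exactly once
            buffer.append({"sample": sample,
                           "feature": embed(sample),
                           "score": uncertainty(sample)})
            if len(buffer) > BUFFER_CAPACITY:               # pruning step
                buffer.pop(most_redundant_index(buffer))
            scores = np.array([e["score"] for e in buffer])
            probs = scores / scores.sum()                   # prioritize uncertain samples
            idx = rng.choice(len(buffer), size=min(BATCH_SIZE, len(buffer)),
                             replace=False, p=probs)
            train_step([buffer[i]["sample"] for i in idx])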

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0065_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0065_supp.pdf

Link to the Code Repository

https://github.com/MrGiovanni/OnlineLearning

Link to the Dataset(s)

https://huggingface.co/datasets/AbdomenAtlas/AbdomenAtlas1.0Mini

BibTex

@InProceedings{Cho_Embracing_MICCAI2024,
        author = { Chou, Yu-Cheng and Zhou, Zongwei and Yuille, Alan},
        title = { { Embracing Massive Medical Data } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a proactive learning framework that is more practical than conventional supervised training. The framework uses a replay buffer to allow training on a data stream with a single pass, and integrates data pruning and a structure & uncertainty prioritization strategy to achieve performance comparable to training for multiple epochs on the full data. The advantages are demonstrated on a continuous stream of large-scale CT data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper studies an important problem. The impracticality of conventional supervised training is a valid concern, and the paper adopts a reasonable angle to address the issue (continual learning and data selection).
    2. The paper overall clearly describes the problem and the method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper does not adequately address existing work and fails to clearly define the originality of the proposed framework. For example, not a single work on continual learning for medical images ([1,2] and many more exist) is mentioned. The claim that the paper is “the first study to forecast this trend and proactively take action in response” is not adequately grounded. There could be differences in the setting (pre-training or other factors); however, it is important that these works be thoroughly discussed and compared.
    2. The paper does not clearly demonstrate the connection between each technical component and its intended contribution. The data pruning and structure & uncertainty prioritization strategies are claimed to help with both adaptation and mitigation of catastrophic forgetting. However, numerous methods deal with these issues. Why are the chosen methods appropriate in this scenario, and how do the results support this?
    3. The evaluation is insufficient. The paper only compares with conventional training but not with other continual learning methods [3] or data selection methods [4].
    4. Although the paper is overall well organized, some presentations remain confusing. For example, in the main results (Table 2, Figure 2), are the +DP and +SUP methods orthogonal to each other, or are they combined for the final results? What do the baselines refer to in the “Implementation and baselines” paragraph?

    [1] Perkonigg, Matthias, et al. “Dynamic memory to alleviate catastrophic forgetting in continual learning with medical imaging.” Nature Communications 12.1 (2021): 5678.
    [2] González, Camila, et al. “Lifelong nnU-Net: a framework for standardized medical continual learning.” Scientific Reports 13.1 (2023): 9381.
    [3] Smith, James Seale, et al. “A closer look at rehearsal-free continual learning.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
    [4] Yan, Shipeng, Jiangwei Xie, and Xuming He. “DER: Dynamically expandable representation for class incremental learning.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper studies an important problem but lacks sufficient study on the existing works in this direction and related topics. I suggest that the authors incorporate these into the background of the proposed framework, carefully define the differences from the proposed method and clearly state the originality. I also suggest that the authors evaluate more continual learning methods, including but not limited to the replay buffer methods, and better demonstrate why the proposed method could effectively mitigate the catastrophic forgetting issue in the specific setting. In its current state, I don’t think the paper objectively presents the novelty and soundness of the proposed method and thus is not in a good shape to be accepted.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lack of mention of important related works and questionable originality. Limited evaluation and comparison.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper addresses the significant challenge of catastrophic forgetting and suboptimal performance when training models on stream data. The proposed method, which requires only a single pass per data point and selectively retains samples that exhibit high uncertainty and low similarity to other data, presents a novel approach to mitigate these issues. To validate the effectiveness of the proposed method, the authors have utilized two extensive datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper addresses an important and practical topic: training models on data streams. While previous methods in continual learning have largely focused on task-by-task data processing, this work innovatively considers a more practical scenario where data arrives in batches. This distinction is significant as it closely aligns with real-world data flow, making the proposed approach highly relevant to practical applications.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Experimental Settings Clarity: The experimental settings described in this paper, particularly in Table 2, are somewhat ambiguous. While the one-pass data processing approach is intriguing, the reported performance raises concerns about its efficacy. The results suggest suboptimal (even poor) performance, prompting a question about the necessity of this setting.
    2. Comparative Analysis: The paper lacks a comparative analysis with existing methods, including continual-learning-based and uncertainty-based approaches.
    3. Method Section Overview: The Method section is missing a comprehensive overview. Providing a holistic view of the approach would help readers better understand the proposed method.
    4. Detailing of Method and Experiments: The paper lacks crucial details about the proposed method and the experimental procedures.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    While the paper lacks some critical details, the provided code compensates for these gaps.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Refer to Weaknesses
    2. Section 2.1 - Data Sampling: The claim in Section 2.1 that ‘mini-batches x of training data can be generated at any time by random sampling from the buffer’ raises concerns about the feasibility of ensuring each data point is used exactly once.
    3. Section 2.2 - Minimum Similarity Criterion: The rationale behind choosing the sample with the ‘minimum similarity to its nearest neighbor’ (Section 2.2) is unclear. Typically, minimum similarity would identify the most unique or outlier samples rather than the most common or redundant ones.
    4. Equation (1) Clarification: The notation ∑S_c/S_c used in Equation (1) is not clear. Could you please provide a detailed explanation, or correct the formula if there is a typographical error?
    5. Inconsistency in Dataset Size: There appears to be a discrepancy in the number of samples reported in this paper compared to those cited in reference [15].
    6. Figure 2 - Performance Restoration: The graph on the left of Figure 2 shows a performance restoration after training on dataset D13. It would be helpful to explain what factors or mechanisms contribute to this observed improvement.
    7. Please provide more details about the iteration numbers of Cvtnl, RB, DP, SUP, and Epoch*.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The introduction of the proposed method and the details of the experimental settings remain unclear, which impedes the reviewer’s full comprehension of the study’s implications. Additionally, the lack of sufficient comparative analysis with existing methods limits the conclusions that can be drawn about the proposed method’s effectiveness. While this reviewer appreciates the authors’ effort to tackle a practical and clinical scenario, addressing these key issues would significantly enhance the quality of the paper. Therefore, ‘Weak Reject’ is suggested.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns.



Review #3

  • Please describe the contribution of the paper

    The paper aims to avoid catastrophic forgetting when learning on large datasets by storing the most distinctive data points, i.e., those with high uncertainty and low similarity to other data points. The paper proposes three components: a replay buffer to train the model with only a single pass over the data (rather than over multiple epochs), data pruning to keep only distinctive samples in terms of diversity, and a prioritization strategy to retain informative samples in terms of uncertainty.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is easy to read and well-written.
    • The authors provide a simple but convincing strategy to address catastrophic forgetting.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The description of the SUP strategy is confusing. In particular, the norm. function used to compute alpha_c hat is not explained, making it difficult to understand when alpha_c hat would be in ]0, 1[. Also, is S_c computed from the ground-truth masks?

    2) The novelty of the methodology is questionable. Several papers have used replay-based approaches (for example, “GCR: Gradient Coreset based Replay Buffer Selection for Continual Learning”, CVPR 2022, or “Gradient Episodic Memory for Continual Learning”, NeurIPS 2017).

    3) Comparison with other continual learning methods is missing

    4) Standard deviations are missing from the results section.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The data pruning description is confusing. The authors mention that the aim is to discard the most redundant sample in the buffer, which is the “sample with the minimum similarity to its nearest neighbor”. Would the most redundant sample be the one with maximum similarity?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written with a good experimental component to support the methodology proposed. However, the results are missing comparison with existing continual learning methods, and the methodology requires some clarifications.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

  1. Existing methods & our contributions (R3-R5). We will cite the suggested literature, [1-4] from R3 and [5-6] from R5, in our manuscript. Compared with these works, we address more challenging scenarios faced in practice. These scenarios arise from massive medical data and entail the following stringent algorithmic requirements.

Req. 1: AI must handle incomplete annotations. Annotating medical images at the voxel level is expensive, so public datasets are often tailored to specific classes and are thus only partially labeled.

Req. 2: AI must handle a continual data stream. Conventional training strategies that rely heavily on the ‘epoch’ (a complete pass through the entire dataset) are impractical.

Req. 3: AI must handle new classes/tasks. Most anatomical structures and abnormalities are currently unannotated in medical datasets, but we anticipate that they will be annotated soon, and the AI should not require repeated retraining for each new class/task.

Req. 4: AI must train using manageable resources to avoid excessive storage and computational demands in clinical settings, particularly from an online “streaming” learning perspective.

Currently, no single method satisfies all the aforementioned requirements. [1-6] violated Req. 1 as they required the inputs to be fully labeled. [2-4] violated Req. 2 as they required the AI to see the entire dataset multiple times. [1-2] violated Req. 3 as they discarded the possibility of new classes/tasks. [1,5] violated Req. 4 as they required additional extensive memory (2,000 vs. our 128 samples) for data selection.

Our contributions: the CLIP module can segment agnostic classes given the corresponding embeddings (Reqs. 1, 3) and can learn from partially labeled datasets (Tab. 2, 3). Next, the use of a replay buffer under our experimental setting eliminates the concept of ‘epoch’ (Req. 2), enabling single-pass training on data streams (Tab. 1, 2). Last, our technical contributions, DP and SUP, efficiently select important data without extensive additional parameters (Req. 4), while achieving performance similar to epoch-based training (Tab. 2).

  2. Evaluating more continual learning methods (R3). Both suggested works [3-4] require the AI to see the entire dataset multiple times, which is impossible to implement in our data-stream setting (Reqs. 1-4). Unlike [3-4], our method passes through the entire data only once, actively learns from important samples (Fig. 3), and effectively mitigates catastrophic forgetting (Fig. 2).

  3. Poor performance & experimental setting (R4). The data-streaming setting aims to make AI learn from vast and ever-expanding medical data, where the AI agent cannot revisit previous samples or repeatedly pass through the entire dataset. In addition, a conventional static dataset with a fixed number of structures is problematic when extending to new classes/tasks. On the other hand, the results of single-pass training in Tab. 1 justify the efficacy of the streaming setting. The performance drop in Tab. 2 is due to the shifted distribution and different annotation policies between sub-datasets, not the streaming setting.

  4. Confusing presentations. (R3) Our final results in Tab. 2 and Fig. 2 are obtained with the combination of DP and SUP. The baseline refers to Liu et al., ICCV 2023, compared in our experiments. (R4) Fig. 1 is an overview of the method. Each data point is stored in the replay buffer once it arrives, and the iteration number is proportional to the sampling rate. We discard the sample with the ‘minimum similarity to its nearest neighbor’. In Eq. 1, \sum_c^{m} S_c is the numerator and S_c is the denominator. The dataset size (N=2,100) is consistent with the GitHub of Liu et al., ICCV 2023. The performance restoration is due to the shared classes across these sub-datasets. (R5) The norm function normalizes the results across classes, and S_c is computed from the ground-truth mask. We will include standard deviations in our manuscript.
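As a reading aid only, the following tiny sketch shows one possible interpretation of the Eq. 1 clarification above: a per-class weight proportional to (∑_c^m S_c) / S_c, normalized across classes, with S_c taken from the ground-truth mask. The function name, the voxel-count definition of S_c, and the sum-to-one normalization are assumptions, not the paper's code.

    # Hypothetical reading of Eq. 1 as clarified in the rebuttal (assumptions, not the paper's code).
    import numpy as np

    def class_weights(gt_mask: np.ndarray, num_classes: int) -> np.ndarray:
        # S_c: number of voxels labeled as class c in the ground-truth mask
        sizes = np.array([(gt_mask == c).sum() for c in range(1, num_classes + 1)],
                         dtype=np.float64)
        sizes = np.maximum(sizes, 1.0)      # guard against absent classes
        alpha = sizes.sum() / sizes         # (sum_c S_c) / S_c: rarer classes weigh more
        return alpha / alpha.sum()          # "norm.": normalize across classes

    # Toy example: class 3 is the rarest and therefore receives the largest weight.
    print(class_weights(np.array([1, 1, 1, 1, 2, 2, 3]), num_classes=3))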




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors’ rebuttal has effectively addressed the concerns and questions raised by the reviewers.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper looks good. The authors should also order the references in the document. Based on the reviewers’ comments, I think the paper can be accepted with a weak accept.



