Abstract

Content-based histopathological image retrieval (CBHIR) has gained attention in recent years, offering the capability to return histopathology images that are content-wise similar to a query image from an established database. However, in clinical practice, the continuously expanding size of whole slide image (WSI) databases limits the practical application of current CBHIR methods. In this paper, we propose a Lifelong Whole Slide Retrieval (LWSR) framework to address the challenge of catastrophic forgetting caused by progressive model updating on a continuously growing retrieval database. Our framework aims to achieve a balance between stability and plasticity during continuous learning. To preserve system plasticity, we utilize a local memory bank with reservoir sampling to save instances, which comprehensively covers the feature spaces of both old and new tasks. Furthermore, a distance consistency rehearsal (DCR) module is designed to ensure the consistency of the retrieval queue for previous tasks, which we regard as the stability of a lifelong CBHIR system. We evaluated the proposed method on four public WSI datasets from TCGA projects. The experimental results demonstrate that the proposed method is effective and superior to state-of-the-art methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1854_paper.pdf

SharedIt Link: https://rdcu.be/dY6iF

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72083-3_26

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1854_supp.pdf

Link to the Code Repository

https://github.com/OliverZXY/LWSR

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zhu_Lifelong_MICCAI2024,
        author = { Zhu, Xinyu and Jiang, Zhiguo and Wu, Kun and Shi, Jun and Zheng, Yushan},
        title = { { Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        pages = {274 -- 284}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces the Lifelong Whole Slide Retrieval framework, designed to enhance content-based histopathological image retrieval from whole slide image (WSI) databases. It incorporates a local memory bank employing a reservoir sampling method to maintain a representative feature space across both new and existing tasks, ensuring the model’s adaptability. Additionally, it introduces a distance consistency rehearsal loss to maintain retrieval accuracy for previously learned tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Innovative continual scenario: The paper proposes a novel framework that balances stability and plasticity in a lifelong retrieval learning setting. Technical Contributions: The introduction of a distance consistency rehearsal module is new not only in the context of WSI analysis but also in continual learning more broadly.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of Implementation Details: The paper does not provide sufficient detail about the specific implementation, such as the learning rate, epochs per task, how the data is split (there is no mention of a validation set), and whether whole slide images or patches are saved in the buffer. Additionally, the impact of the loss term based on feature distances, regulated by the alpha parameter, has not been experimentally validated. This omission complicates reproducibility and understanding of the approach’s applicability.

    Incomplete Comparative Analysis: While the paper claims improvements over state-of-the-art methods, it lacks a comprehensive comparative analysis with the latest relevant techniques, such as ConSlide [1], the main baseline for continual learning on WSIs (if DER++, well known for continual classification, can be adapted, then I believe ConSlide can be adapted as well).

    Reproducibility Concerns: There is no mention of the availability of the code, which coupled with insufficient methodological clarity, makes the paper’s findings difficult to replicate and verify independently.

    Questionable Practical Utility: The paper does not adequately address the practical applicability of the WSI retrieval task in aiding pathologists. Given the complexity and the detailed examination required in reading WSIs, the utility of retrieving similar slides is questionable. Pathologists primarily need rapid, clear indications of disease presence and location, not necessarily comparisons with other slides, especially in a continual learning scenario where performance may degrade over time. This raises concerns about the real-world effectiveness and usefulness of the proposed model in clinical practice.

    Minor typographical and conceptual errors: The paper misuses the term “plastcity” for “plasticity”. It is also unclear how the encoder’s output (referred to as ‘F’) is compared with the labels, especially since the labels correspond to whole WSIs while the encoder operates at the patch level.

    Methodological Clarifications Needed: It is necessary to specify whether the buffer stores whole WSIs or patches and how cross-entropy loss, which typically takes two parameters, is applied with three in this context.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    To further enhance the quality and impact of your manuscript, I suggest addressing the following points in a detailed and constructive manner:

    1. Clarification of Methodological Details:
      • Expand on the implementation specifics such as learning rates, number of epochs per task, and the criteria used for selecting these hyperparameters. Such details are essential for reproducibility and to allow other researchers to build upon your work.
    2. Comprehensive Comparative Analysis:
      • Incorporate a more thorough comparative analysis with current state-of-the-art methods, particularly Conslide, which is known for its applicability in histopathological image analyses. This comparison will not only validate the superiority of your method but also contextualize its performance relative to the latest advancements in the field.
      • Provide a quantitative breakdown and detailed statistical analysis to substantiate the claims of your method’s superiority. Include metrics such as accuracy, recall, precision, and F1-score, and discuss the statistical significance of the observed differences.
    3. Addressing Practical Utility and Applicability:
      • Discuss the practical implications of using the LWSR framework in real clinical settings. Given the unique challenges of reading WSIs, consider whether retrieval of similar slides is genuinely beneficial for pathologists. Reflect on how this system can be integrated into clinical workflows, particularly focusing on how it enhances diagnostic efficiency or accuracy.
      • Consider potential modifications or additional features that could make the retrieval system more aligned with the practical needs of pathologists, such as integrating diagnostic suggestions or highlighting areas of interest within retrieved slides.
    4. Experimental Validation of Loss Function:
      • Elaborate on the role and impact of the novel distance-based loss component in your model. An experimental study focusing on this aspect could provide deeper insights into how this feature contributes to the overall performance of the retrieval system.
      • Experiment with varying the parameter alpha within your loss function to explore its influence on model performance and stability. This could offer valuable information on optimizing the balance between plasticity and stability in lifelong learning models.
    5. Reproducibility and Transparency:
      • Consider sharing the implementation code and data preprocessing steps as part of your supplementary materials. Making your code available would greatly aid in verifying the claims made and facilitate further research based on your framework.
      • Improve the transparency of your experimental setup by detailing the exact number of runs for each evaluation and the error margins associated with these runs. This information will help in assessing the reliability and robustness of your findings.
    6. Literature Review and References Update:
      • Update the literature review to include more recent studies that are relevant to your work. This will not only strengthen the foundation of your research but also show that the proposed method is up-to-date with current trends and technologies in the field.
      • Justify the selection of references, particularly focusing on why certain key pieces of literature may not have been tested against your method. This justification can help in positioning your work more strategically within the existing research landscape.

    Addressing these points will significantly enhance the depth, clarity, and impact of your manuscript, making it a valuable contribution to the field of medical image retrieval.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript is recommended for rejection due to several significant issues. Primarily, the lack of detailed methodological information undermines the reproducibility of the research and limits its applicability for further investigation. Additionally, the manuscript does not provide a robust comparative analysis with current state-of-the-art methods, notably omitting a comparison with ConSlide, which is crucial for validating the claimed improvements. The practical utility of the proposed retrieval system in clinical settings is also not convincingly demonstrated, raising concerns about its real-world relevance for pathologists who require precise and rapid diagnostic insights. Furthermore, the absence of detailed statistical analysis and experimental validation, particularly regarding the novel loss function, compromises the scientific robustness of the findings. Lastly, the manuscript suffers from transparency issues due to the non-disclosure of source code and insufficient procedural detail.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for the clarification. I still have doubts regarding the comparison with ConSlide, the SOTA in continual WSI analysis, so I am not confident assigning a strong accept to this work.



Review #2

  • Please describe the contribution of the paper
    1. The paper proposes a lifelong whole slide retrieval framework, which is the first to solve the continual learning problem in the domain of histopathology image retrieval.
    2. The paper proposes a novel distance consistency rehearsal (DCR) module to maintain consistency of the retrieval queue for old tasks.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is the first to solve the continual learning problem in the domain of histopathology image retrieval. Specifically, it addresses catastrophic forgetting during continuous learning, striking a balance between stability and plasticity.
    2. The proposed framework consistently outperforms classic continual learning approaches, showing superior ability to handle the lifelong CBHIR task.
    3. The proposed distance consistency rehearsal (DCR) module is a novel design and the ablation study proves its effectiveness.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The explanation of how the framework improves retrieval precision with the memory bank is inadequate. This is slightly inappropriate since the memory bank, rather than the DCR module, plays the larger role in improving over the baseline model (Finetune) in the ablation study.
    2. The introduction to related work is too brief, especially regarding replay-based methods. Since the proposed framework falls into the same category, failing to elaborate on how replay-based methods operate leads to confusion about the difference between the proposed method and others.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    no

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The explanation of how the framework achieves retrieval precision with the memory bank is inadequate. The method section would be clearer if details about the retrieval-related losses were provided.
    2. The difference between the proposed framework and classic replay-based methods is not clear enough. Is the DCR module the only difference, or do they differ elsewhere as well? How, specifically, do the classic methods “not account for taking the interaction between the current task and previous tasks into consideration”, and which part of the proposed method’s design addresses this?
    3. It would help readers better understand the algorithm if a brief introduction to reservoir sampling were included (a minimal sketch is given after this list).
    4. It can be observed that in most circumstances, replay-based methods perform better with a 10-WSI buffer than with a 15-WSI one. Additional discussion of this phenomenon would be welcome.
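
    For context on point 3 above, here is a minimal, self-contained sketch of reservoir sampling (Algorithm R) as it could be used to keep a fixed-size replay buffer over a stream of samples; the buffer size and the toy integer stream are illustrative and not taken from the paper.

```python
import random

def reservoir_update(buffer, buffer_size, n_seen, item):
    """After n_seen + 1 items have streamed past, every item has been kept
    with equal probability buffer_size / (n_seen + 1)."""
    if n_seen < buffer_size:
        buffer.append(item)            # fill phase: keep everything
    else:
        j = random.randint(0, n_seen)  # uniform over all items seen so far
        if j < buffer_size:
            buffer[j] = item           # evict a random slot
    return n_seen + 1

# Toy usage: stream 100 items through a 10-slot buffer
# (stand-ins for WSI features arriving task after task).
buffer, n_seen = [], 0
for item in range(100):
    n_seen = reservoir_update(buffer, 10, n_seen, item)
print(buffer)  # 10 items drawn uniformly at random from the stream
```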
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is the first to facilitate continuous learning in the field of histopathology image retrieval, addressing the challenge of catastrophic forgetting. In addition to the replay-based paradigm, a novel distance consistency rehearsal (DCR) module is proposed to enhance the retrieval queue’s consistency for previous tasks. Experimental results demonstrate the framework’s superior effectiveness over existing methods. However, the clarity of this paper could be improved should more details about the replay-based paradigm be provided.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a framework for continual learning applied to content-based histopathological image retrieval. In this framework, a model sequentially learns from datasets of Whole Slide Images (WSIs). Each WSI is subdivided into patches that are transformed into feature cubes and stored in a memory bank. When a new task is introduced, the model combines feature cubes from the current task with sampled feature cubes from the memory bank. It then computes the distances between the feature representations of the new and stored WSIs. The framework introduces a method named Distance Consistency Rehearsal (DCR), which aims to maintain consistent distances between feature representations from old and new tasks by minimizing the mean squared error between the current task samples and replay samples.
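
    As a rough illustration of the distance-consistency idea described above, the sketch below computes an MSE between two distance matrices in PyTorch; the exact pairing of current-task and replayed features, the Euclidean metric, and all variable names are assumptions for illustration rather than the authors’ exact formulation.

```python
import torch
import torch.nn.functional as F

def distance_consistency_loss(cur_feats, replay_feats_new, replay_feats_old):
    """MSE between (i) distances from current-task features to replayed
    features re-encoded by the up-to-date model and (ii) distances to the
    feature versions stored in the memory bank (illustrative sketch only).

    cur_feats:        [B, D] current-task WSI representations
    replay_feats_new: [M, D] buffered WSIs re-encoded by the current model
    replay_feats_old: [M, D] representations saved when the WSIs were buffered
    """
    d_new = torch.cdist(cur_feats, replay_feats_new)           # [B, M]
    d_old = torch.cdist(cur_feats.detach(), replay_feats_old)  # [B, M] target
    return F.mse_loss(d_new, d_old.detach())

# Toy usage with random features: 4 current WSIs, 6 replayed WSIs, 128-d features.
cur = torch.randn(4, 128, requires_grad=True)
rep_new = torch.randn(6, 128, requires_grad=True)
rep_old = torch.randn(6, 128)
distance_consistency_loss(cur, rep_new, rep_old).backward()
```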

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Lifelong histopathology image retrieval is crucial given the extensive size of pathology databases, and the authors present a way to use public TCGA datasets for this task.

    2. The introduction of distance consistency loss seems novel, and it is particularly well-suited to the nature of the retrieval task. Unlike traditional lifelong classification tasks that do not use representation distance matrices, the paper effectively leverages this approach to improve continual learning in the context of retrieval tasks.

    3. The paper successfully adapts popular CL baselines, such as ER-ACE and A-GEM, for use with WSIs. The proposed approach consistently outperforms these baselines across various buffer sizes.

    4. The ablation study clearly highlights the benefits of the proposed distance consistency (DC) loss, providing clear evidence of its advantage.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Lack of Comparison with WSI-Specific CL Approaches: While the paper benchmarks against several continual learning (CL) baselines, it does not extend or compare its method to the only WSI-specific CL approach, ConSlide [13]. To establish the novelty of the proposed loss, it would be beneficial for the authors to explain the differences between their proposed Distance Consistency Rehearsal (DCR) and the Cross-Scale Similarity Learning (CSSL) approach used in ConSlide. Additionally, exploring whether the CSSL framework could be adapted for retrieval tasks might provide valuable insights to the community.

    2. Populating the Replay Buffer: The approach to populating the memory replay buffer marks a departure from the existing literature in lifelong learning from WSIs. In ConSlide [13], a random selection of regions from each WSI is stored in the buffer. In contrast, this paper seems to employ a method where the entire WSI is divided into patches, and all patches are stored. Can the authors clarify the rationale behind this strategy?

    3. Potential for Integrating Breakup-Reorganize Approach: A simple baseline that could have been explored is the ‘breakup-reorganize’ approach from [13], combined with the proposed DCR method. Would this integration yield any additional benefits in the lifelong retrieval task?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Hyperparameters for the different training configurations, including for the baselines, need to be provided. What implementations of the baselines were used? Are they public implementations, or did the authors implement them from scratch?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I suggest that the authors work on expanding the comparisons to include WSI-specific continual learning approaches, such as those described in ConSlide [13], to fully establish the benefits and novelty of the proposed Distance Consistency Rehearsal (DCR). Additionally, clarification on the rationale behind the memory buffer strategy would enhance the understanding of its advantages over previous methods.

    I also encourage the authors to take a second look at proofreading the manuscript, as some sentences are hard to read, and there were a few grammatically incorrect sentences. For example, ‘accounted’ in the following quote should be corrected: “methods are proposed to solve catastrophic forgetting in the domain of classification initially and do not accounted for taking…”

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My current recommendation of “Weak Accept” is based primarily on the novel introduction of the Distance Consistency Rehearsal (DCR) method, which effectively addresses the challenge of catastrophic forgetting in histopathological image retrieval. However, the paper could benefit significantly from a clearer explanation of how the proposed approach differs from ConSlide and a more detailed justification for the memory buffer allocation strategy.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    My main concern was the lack of justification for not using the BuRo approach from ConSlide. However, I think the response that retrieval requires more complete semantic information from WSIs makes sense. I encourage the authors to include this intuition in the paper. I also think the provided results of integrating BuRo with the proposed method are insightful and strongly support the authors’ choice, and I agree with R4 that the lack of these results in the original submission made the claim slightly weaker.

    I also shared R1’s confusion about the lack of proper justification for the memory bank design choices. Points 1 and 4 in the authors response address my concerns.

    If all code and experimental details are included in the final version, as the authors promise, this paper can provide a strong starting point for future continual learning for WSI retrieval research. I maintain my recommendation to accept the paper.




Author Feedback

Our responses tackle the following six issues.

  1. The effectiveness of the DCR module. – LWSR w/o DCR is the benchmark in which we adapted continual learning to the retrieval task; its designed memory bank and loss-calculation strategy already show a crucial effect compared with the Finetune baseline. – The proposed DCR module is then designed to keep the returned queues consistent on old tasks while further improving retrieval precision. Both the precision metrics and the consistency metrics, SRC and KRC, improve further compared with LWSR w/o DCR, which demonstrates the effectiveness of DCR.
  2. The other differences between our method and the continual baselines. In some continual learning baselines such as ER-ACE, DER++, and ConSlide, replay losses are calculated by treating online and stored samples separately, e.g., using cross-entropy for online samples and regularization for stored ones. This is effective for classification tasks but less useful for retrieval tasks, where pair-wise losses are crucial: calculating pair-wise losses separately for new and old samples hampers the model’s ability to differentiate between tasks. Conversely, our method merges online and buffered samples during the replay loss calculation, which enhances the model’s capacity to discern fine-grained differences among samples (see the sketch after this list).
  3. The reason we did not follow the paradigm of ConSlide. – In ConSlide, CSSL combined with BuRo was used for histopathology classification, achieving data augmentation by randomly selecting and combining patches from same-category WSIs. However, for retrieval tasks, representations are expected to describe the complete semantic information of WSIs, and the virtual cases created by BuRo would mislead the retrieval model when describing real WSIs. We attempted to integrate BuRo into our method in an early study; it achieved a mAP of 0.563 with a 10-WSI buffer, which was 11.7% inferior even to the baseline method, DER++. This led us to exclude these results from our paper. We will include revised results of our method with BuRo in the final version if possible. – To maintain the semantics of WSIs, we chose to store WSI information in the buffer as completely as possible, rather than randomly selecting regions as in ConSlide. Meanwhile, to ensure training efficiency, we stored representations of patches from the WSIs instead of the original images.
  4. Discussion of the results for different buffer sizes. A 10-WSI buffer balances diversity between current and previous tasks, aiding the model in achieving high retrieval precision after sequential learning. However, a 15-WSI buffer may overly emphasize past tasks at the expense of the current one, leading to decreased focus on the present task. We think that is the main reason the results with a 15-WSI buffer could not surpass those with a 10-WSI buffer.
  5. The value of slide retrieval to pathologists. WSI retrieval has been discussed as essential in CBHIR systems for quickly gathering relevant cases for pathologists (Chen et al., Nat. BME, 6(12): 1420-1434, 2022; Kalra et al., MedIA, 65: 101757, 2020; Wang et al., MedIA, 83: 102645, 2023). Therefore, we chose to first study the effectiveness of continual learning and our DCR method on the WSI retrieval task. We will consider the region-level retrieval task in the future, as it offers more detailed diagnostic information.
  6. Details and reproducibility declaration. – We will release the code to cover the technical and experimental details. – We will elaborate on the retrieval-related losses and the reservoir sampling algorithm in the text. – We will refine our sentences and clarify ambiguous formulations, such as the cross-entropy loss. – We will provide a table in the Supplementary Materials listing our experimental details, such as the learning rate, epochs per task, and dataset division.
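
To make point 2 concrete, the sketch below computes a single pair-wise loss over the merged set of online and buffered samples instead of two separate losses; the contrastive-style margin objective, the margin value, and the function names are illustrative assumptions and not the paper’s actual retrieval losses.

```python
import torch
import torch.nn.functional as F

def merged_pairwise_loss(cur_feats, cur_labels, buf_feats, buf_labels, margin=0.3):
    """One pair-wise retrieval loss over the union of online and buffered
    samples, rather than separate terms per source (illustrative sketch)."""
    feats = torch.cat([cur_feats, buf_feats], dim=0)      # [B+M, D]
    labels = torch.cat([cur_labels, buf_labels], dim=0)   # [B+M]

    dist = torch.cdist(feats, feats)                       # all pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # same-label mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)

    pos = dist[same & ~eye]                                # same-class pairs
    neg = dist[~same]                                      # cross-class pairs
    # pull positives together, push negatives beyond the margin
    return pos.mean() + F.relu(margin - neg).mean()

# Toy usage: 4 online and 6 buffered samples with 128-d features and 3 classes.
loss = merged_pairwise_loss(torch.randn(4, 128), torch.randint(0, 3, (4,)),
                            torch.randn(6, 128), torch.randint(0, 3, (6,)))
```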




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


