Abstract

Histopathological whole slide image (WSI) analysis using deep learning has become a research focus in computational pathology. The current basic paradigm is the multiple instance learning (MIL) method, which treats a WSI as a bag and the cropped patches as instances. As the Transformer has become the mainstream neural network framework, many Transformer-based MIL methods have been widely studied; they treat the patches as a sequence and complete tasks via sequence analysis. However, the long sequences arising from the high heterogeneity and gigapixel nature of WSIs pose challenges for Transformer-based MIL, such as high memory consumption, low inference speed, and even degraded inference performance. To this end, we propose a hierarchical retention-based MIL method called RetMIL, which processes WSI sequences at local and global levels. At the local level, patches are divided into multiple subsequences; each subsequence is updated through a parallel linear retention mechanism and aggregated over its patch embeddings. At the global level, a slide-level representation is obtained from the subsequence tokens through a serial retention mechanism and attention pooling, and a fully connected layer finally predicts the category score. We conduct experiments on the public CAMELYON and BRACS datasets and an internal TCGASYS-LUNG dataset, confirming that RetMIL not only achieves state-of-the-art performance but also significantly reduces computational overhead.
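The two-level aggregation described above can be sketched roughly as follows. This is an illustrative single-head NumPy reading of the abstract, not the authors' released implementation; the function names, the fixed decay `gamma`, and the uniform pooling weights are all assumptions made for the sketch.

```python
import numpy as np

def retention(X, gamma=0.9, seed=0):
    """Single-head parallel retention over X of shape (n, d): like
    self-attention, but the pairwise scores are causally masked and
    decayed by gamma**(i - j), which is what admits a linear-time
    recurrent form. Projection weights are random here (learned in
    practice)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    i, j = np.indices((n, n))
    D = np.where(i >= j, float(gamma) ** (i - j), 0.0)  # causal decay mask
    return (Q @ K.T * D) @ V

def attention_pool(H, w):
    """Attention pooling: softmax-weighted mean of the rows of H."""
    a = H @ w
    a = np.exp(a - a.max())
    a /= a.sum()
    return a @ H

def retmil_sketch(patches, subseq_len=64, gamma=0.9):
    """Local level: retention within each patch subsequence, pooled to one
    token per subsequence. Global level: retention over those tokens,
    pooled to a single slide-level embedding (fed to a classifier head
    in the paper)."""
    d = patches.shape[1]
    w = np.ones(d) / d  # pooling weights; learned in the real model
    tokens = [attention_pool(retention(patches[s:s + subseq_len], gamma), w)
              for s in range(0, len(patches), subseq_len)]
    slide_tokens = retention(np.stack(tokens), gamma)
    return attention_pool(slide_tokens, w)  # slide embedding, shape (d,)
```

Because each subsequence is processed independently before the short global pass, peak memory scales with the subsequence length rather than the full slide sequence, which is the efficiency argument the abstract makes.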

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1723_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/Hongbo-Chu/RetMIL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Chu_RetMIL_MICCAI2024,
        author = { Chu, Hongbo and Sun, Qiehe and Li, Jiawen and Chen, Yuxuan and Zhang, Lizhong and Guan, Tian and Han, Anjia and He, Yonghong},
        title = { { RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The method describes a multiple instance learning approach based on the separation into subsequences combined with retention-based aggregation. The architecture is particularly efficient, e.g., with respect to memory, compared to other Transformer MIL architectures.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written. Even though the content is complex, it is possible to follow. Sufficient motivation for the novel approach is provided. MIL is of very high interest in the MIC community, and there has recently been a large number of publications. The new approach has been compared with a good selection of state-of-the-art methods. The results show good performance while the method is also more efficient than other approaches.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I do not understand how the evaluation has been performed:

    Data set 1: “ We use all data of CAMELYON16 to conduct four-fold cross-validation experiments, and choose the CAMELYON17 training set as our testing dataset.” -> Was the test set always the same? How can it be cross-validation in this case?

    Data set 2: “We use four different sets of model initialization parameters for training and testing.” -> Was the training and test data set always the same? What is the motivation behind this strategy?

    Data set 3: “We conduct four-fold cross-validation experiments on the training set and perform inference on the test set” -> the same here: is it always the same test set?

    Fig. 3: conventional (non-transformer) architectures are missing here. Please also provide a comparison with, e.g., DS-MIL, which is very efficient.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors write “Our code will be accessed shortly.” -> I guess this should mean “will be accessible”

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • please clarify the evaluation setting (see comments above)
    • I suggest adding the other approaches in Fig. 3
    • Clarity would profit from adding details to the caption of Fig. 1
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • overall the paper is well written
    • the method seems to be effective (and efficient)
    • sources will be published
    • the descriptions of the evaluation are contradictory and need to be clarified. Was a real cross-validation performed or not?
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This study introduces retentive networks for an efficient, cost-effective attention mechanism with a hierarchical solution for multi-level application, validated on three datasets against traditional attentive models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novel approach in the context of histological analysis. Well-executed and interesting experiments. Commitment to code release enhances reproducibility and transparency.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Comparisons are primarily made with outdated models, lacking engagement with more recent advancements. A discussion of alternative linear-complexity networks is missing, limiting the context of the retentive networks’ uniqueness.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper presents an innovative approach with clear potential; however, I recommend a revision at this stage. This should include a broader comparison, or at least a few lines in the intro section justifying why you don’t compare with more recent state-of-the-art methods [1,2,3], and an expanded discussion on existing alternatives of efficient transformers [4]. These changes would significantly strengthen the paper’s claims and its position within the broader research context.

    [1] DAS-MIL: Distilling Across Scales for MIL Classification of Histological WSIs. In: Greenspan, H., et al. MICCAI 2023
    [2] HIGT: Hierarchical Interaction Graph-Transformer for Whole Slide Image Analysis. In: MICCAI 2023
    [3] Multi-scale Prototypical Transformer for Whole Slide Image Classification. In: MICCAI 2023
    [4] Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I found the idea of using retentive networks interesting, original, and novel in the application to histological analyses. The paper is well-crafted in terms of writing and experiments. I believe that, apart from some minor refinements, there is no reason for this work to be rejected.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The primary focus lies in addressing the challenges posed by high memory consumption and slow inference speed: the quadratic complexity of self-attention intensifies memory usage during both training and inference, leading to increased latency and decreased speed. The methodology tackles this issue by shifting attention from individual patches to subsequences, which are constructed accordingly. Retention along with gated attention is then employed to obtain the representation of each subsequence, and the same process is applied to obtain the representation of the entire slide. This approach effectively breaks down self-attention to the subsequence level, potentially mitigating memory consumption and improving inference speed due to parallelization opportunities.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-written and easy to follow. The authors address an important problem in MIL where many algorithms have high memory requirements and lower latency due to their reliance on transformer-type architectures. The algorithm is well-validated across three datasets and various model sizes. It demonstrates good performance, is fast, and consumes low memory, making the model highly relevant to the community.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper draws a lot of inspiration from the “Retentive Network: A Successor to Transformer for Large Language Models” paper. The idea of subsequence processing appears to bear a strong resemblance to the “Chunkwise Recurrent Representation of Retention” idea presented in the Retentive network. The difference between the proposed idea and the “Chunkwise Recurrent Representation” idea seems to lie in the calculation of the cross-chunk term. The authors employ a hierarchical approach to processing, along with attention-based pooling, to obtain slide-level representation. However, this similarity somewhat limits the novelty of the proposed method.
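    To make the comparison concrete, the chunkwise recurrent representation the review refers to can be sketched as below. This is a minimal single-head NumPy illustration of the recurrence published in the RetNet paper (not code from either paper); `retention_parallel` is included only as a reference against which the chunkwise form can be checked.

```python
import numpy as np

def retention_parallel(Q, K, V, gamma=0.9):
    """Reference: full parallel retention, O(n^2) in sequence length."""
    n = len(Q)
    i, j = np.indices((n, n))
    D = np.where(i >= j, float(gamma) ** (i - j), 0.0)
    return (Q @ K.T * D) @ V

def retention_chunkwise(Q, K, V, gamma=0.9, B=4):
    """Chunkwise recurrent retention (RetNet): each chunk of size B is
    processed in parallel, while a d x d state R carries the cross-chunk
    term, so working memory is O(B*d + d^2) rather than O(n^2)."""
    n, d = Q.shape
    R = np.zeros((d, d))           # state: sum_m gamma^{...} k_m v_m^T
    out = np.empty_like(V)
    for s in range(0, n, B):
        q, k, v = Q[s:s + B], K[s:s + B], V[s:s + B]
        b = len(q)
        t = np.arange(b)[:, None]  # position within the chunk
        i, j = np.indices((b, b))
        D = np.where(i >= j, float(gamma) ** (i - j), 0.0)
        inner = (q @ k.T * D) @ v                 # within-chunk term
        cross = (q * gamma ** (t + 1)) @ R        # cross-chunk term
        out[s:s + b] = inner + cross
        R = gamma ** b * R + (k * gamma ** (b - 1 - t)).T @ v
    return out
```

    The key difference the review points at is the `cross` term: chunkwise retention propagates a recurrent state across chunks, whereas in the paper each subsequence is pooled to a token and the subsequence tokens interact only at the global level.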

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It would be intriguing to compare the proposed methodology with the “Chunkwise Recurrent representation” method proposed in the Retentive network. Additionally, exploring the effects of the S_{q+1} sequence and determining whether adding the same set of patches multiple times introduces any bias in the gated attention step or affects the final slide-level representation would be valuable. In future research, it would also be beneficial to compare the methodology with Graph Transformer Networks, which employ pooling to cluster multiple sets of patches, akin to how the model proposes subsequences to consider sets of patches. While the authors consider subsequences in 1D, it is also worth exploring their application in 2D due to the nature of whole-slide images (WSIs).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In spite of similarities with the Retentive Network, the method adapts the idea to the MIL problem space, which is highly useful to the community due to the memory and latency issues. The authors validated it across multiple datasets and also assessed the memory and latency performance across different model sizes and numbers of patches.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Thanks to all reviewers for the valuable comments. Below are our point-by-point responses. To Reviewer #1: Thank you very much for your feedback. Regarding the issue you raised about the dataset, we acknowledge that our description was not specific enough, and we will make corrections in the final version of the paper. For the Camelyon dataset, we divided the entire Camelyon16 dataset into training and validation sets using four-fold cross-validation. Each time, the model that performed best on the validation set was tested on the fixed test set (Camelyon17). We did this to evaluate our model’s performance more effectively using a larger and more comprehensive test dataset. The same approach was applied to the Lung dataset. For the BRACS dataset, we followed the official dataset division, so the training, validation, and test sets are fixed. To thoroughly validate the model’s performance, we conducted four repeated experiments. Additionally, we have added a detailed description of the model in the caption of Figure 1.

To Reviewer #3: Thank you very much for your suggestions. The additional references you provided greatly enhance the completeness and impact of the paper. These studies innovatively use graph structures and hierarchical structures to build models, as well as the recently widely discussed Mamba architecture. We will include a discussion of these works in the introduction section of the final version of the paper. Unfortunately, additional experimental results in the rebuttal are not allowed by the official MICCAI guidelines, so we cannot provide detailed comparative experiments. We appreciate your understanding. Your suggestions have greatly improved our paper. Once again, thank you for your review comments.

To Reviewer #4: Thank you for your suggestions. Your suggestions to compare with the Chunkwise Recurrent Representation method and to explore the S_{q+1} sequence are very valuable. We will design experiments to verify these in future work. However, additional experimental results in the rebuttal are unfortunately not allowed by the official MICCAI guidelines, so we cannot provide detailed comparative experiments at this time. We appreciate your understanding. Additionally, your idea of comparing our work with Graph Transformer Networks and extending the work to 2D is indeed very innovative. Thank you for these ideas, and we will continue to explore them in future research.




Meta-Review

Meta-review not available, early accepted paper.


