Abstract

Accurate segmentation of pathology images plays a crucial role in the digital pathology workflow. Fully supervised models have achieved excellent performance through dense pixel-level annotation. However, annotating gigapixel pathology images is extremely expensive and time-consuming. Recently, the state space model with an efficient hardware-aware design, known as Mamba, has achieved impressive results. In this paper, we propose a weakly supervised state space model (PathMamba) for multi-class segmentation of pathology images using only image-level labels. Our method integrates features of pathology images at both the pixel level and the patch level and generates more regionally consistent segmentation results. Specifically, we first extract pixel-level feature maps via Multi-Instance Multi-Label Learning by treating pixels as instances; these maps are then injected into our Contrastive Mamba Block. The Contrastive Mamba Block adopts a state space model and integrates the concept of contrastive learning to extract non-causal, dual-granularity features from pathology images. In addition, we propose a Deep Contrast Supervised Loss to make full use of the limited annotation available in weakly supervised settings. Our approach facilitates a comprehensive feature-learning process, capturing both fine details and broader global contextual semantics in pathology images. Experiments on two public pathology image datasets show that the proposed method outperforms state-of-the-art weakly supervised methods. The code is available at https://github.com/hemo0826/PathMamba.
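For intuition only: the contrastive pairing of pixel-level and patch-level features described above can be sketched as a standard InfoNCE objective between the two granularities. The snippet below is an illustrative assumption (the tensor shapes, the names pixel_feats/patch_feats, and the use of plain InfoNCE are hypothetical simplifications, not the paper's actual Contrastive Mamba Block):

    import torch
    import torch.nn.functional as F

    def dual_granularity_contrast(pixel_feats, patch_feats, temperature=0.07):
        """InfoNCE between pixel-level and patch-level embeddings.
        pixel_feats, patch_feats: (B, D) vectors for the same B images;
        matching rows are positives, all other rows are negatives."""
        p = F.normalize(pixel_feats, dim=1)
        q = F.normalize(patch_feats, dim=1)
        logits = p @ q.t() / temperature               # (B, B) similarity matrix
        targets = torch.arange(p.size(0), device=p.device)
        return F.cross_entropy(logits, targets)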

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1354_paper.pdf

SharedIt Link: https://rdcu.be/dZxdZ

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72111-3_47

Supplementary Material: N/A

Link to the Code Repository

https://github.com/hemo0826/PathMamba

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Fan_PathMamba_MICCAI2024,
        author = { Fan, Jiansong and Lv, Tianxu and Di, Yicheng and Li, Lihua and Pan, Xiang},
        title = { { PathMamba: Weakly Supervised State Space Model for Multi-class Segmentation of Pathology Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {500--509}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose to combine the vision Mamba model and contrastive learning for pathology image segmentation using image-level labels. Both pixel-level and patch-level features are utilized to generate more consistent segmentation results. They also propose a deep contrast supervised loss to better utilize the limited annotations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) It is a good idea to introduce Mamba to weakly supervised pathology image segmentation, which will inspire more exploration in this direction. (2) Experiments and ablations show that the proposed Contrastive Mamba Block is effective in improving segmentation performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) There is no clear explanation of how the deep contrast supervised loss works. (2) The baseline methods are not selected properly. For the fully supervised setting, UNet is an old benchmark; it would be better to compare against a recent fully supervised method (e.g., DETisSeg [1]). For weakly supervised methods, more recent papers should be compared with, such as CVFC [2] and [3].

    [1] He, Penghui, et al. “DETisSeg: A dual-encoder network for tissue semantic segmentation of histopathology image.” Biomedical Signal Processing and Control 87 (2024): 105544.
    [2] Pan, Liangrui, et al. “CVFC: Attention-Based Cross-View Feature Consistency for Weakly Supervised Semantic Segmentation of Pathology Images.” 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2023.
    [3] Lan, Xiaobin, et al. “A Weakly Supervised Semantic Segmentation Method on Lung Adenocarcinoma Histopathology Images.” International Conference on Intelligent Computing. Singapore: Springer Nature Singapore, 2023.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It would be better to select more recent methods for comparison.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is novel, but the comparison is not good enough.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a weakly supervised state-space model for pathology image segmentation. The proposed method consists of a Contrastive Mamba Block with a deep contrast loss. It achieves competitive performance on two datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper claims to be the first work to introduce Mamba to the weakly supervised image segmentation task, and it shows competitive performance on two pathology image segmentation datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks clarity regarding the relevance of Mamba to the task of weakly supervised segmentation. A more explicit explanation of how Mamba addresses the specific challenges inherent to weakly supervised segmentation would strengthen the justification for its integration into this framework. Visualization results should be provided to validate the significance of each module within the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See weaknesses

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose PathMamba, a weakly supervised state space model. They propose a Deep Contrast Supervised Loss and combine the pixel-wise feature map with image-level annotations, using this loss together with Visual Mamba to make full use of the weak annotations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is novel in terms of using both contrastive learning and Visual Mamba. Its performance on the two datasets is impressive.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The clarity of this paper needs to be improved. To be specific:

    1. Why does the original Mamba have limited ability to capture non-causal information?
    2. In the introduction, the authors mention that they “adopt a lightweight decoder head to integrate dual-granularity contrastive feature sequences to predict segmentation masks”, but I did not find any explanation of this decoder in the method section. In Fig. 1, it seems that the Decoder and the Mamba decoder are two different networks. What are the input and output of this Mamba decoder, and how is it trained?
    3. In eq. (2), “A_s and A_w denote the pixel level feature map’s vital and weak attention area vectors”. How are the vital and weak areas determined? What is the intuition behind this Contrast Correlation?
    4. Are \hat{Y}_n in eq. (3) and \hat{y}_n in Fig. 1 the same?
    5. How is Y’_n(i, j) in eq. (3) calculated?
    6. What is the relationship between the patch-level label and the segmentation classes in the two datasets? Is the patch-level label defined by the dominant segmentation class in the patch?
    7. In Section 2.2, the authors mention “we first expand each pathology image into a sequence along four different directions through a scan expansion operation”. What are the four directions?
    8. The proposed method achieves performance comparable to the supervised UNet. Is this UNet trained from scratch? Can the Deep Contrast Supervised Loss also be applied to UNet to improve its performance?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As mentioned in the weaknesses section, I think the major problem of this paper is its clarity.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel and the performance is impressive. But the clarity of this paper needs to be improved.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank all reviewers for their valuable feedback, which helps us greatly to improve the quality of our manuscript.

Response to Reviewer 1: Thanks for your questions. For question 1: Because only image-level labels are available rather than pixel-level labels, the supervision provided by Multi-Instance Multi-Label Learning (MIML) is weaker than in fully supervised methods. Therefore, to exploit the image-level labels during training, we combine MIML with deep contrast supervision to enhance the network’s learning ability. This is done by using the network’s side outputs to predict a probability map at each side-output layer and training the network by minimizing the loss between these probability maps and the image-level ground truth. For question 2: Our proposed PathMamba is a weakly supervised segmentation method, and to highlight its superiority we compare it to UNet, a fully supervised algorithm that uses pixel-level labels. Although UNet is an old method, it remains a general-purpose segmentation baseline. We will refine the above description in the manuscript.
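A minimal sketch of the deep supervision described above, assuming each side output gives per-pixel class logits and that global max pooling turns pixels-as-instances into an image-level multi-label prediction (both the interface and the pooling choice are illustrative assumptions, not the authors' exact implementation):

    import torch
    import torch.nn.functional as F

    def deep_image_level_supervision(side_outputs, image_labels):
        """side_outputs: list of (B, C, H, W) per-pixel class logits from
        intermediate layers; image_labels: (B, C) multi-hot image-level labels."""
        total = 0.0
        for logits in side_outputs:
            # Pixels act as instances: max pooling keeps the most confident
            # pixel per class, a common MIL aggregation choice.
            image_logits = logits.amax(dim=(2, 3))     # (B, C)
            total = total + F.binary_cross_entropy_with_logits(
                image_logits, image_labels.float())
        return total / len(side_outputs)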

Response to Reviewer 3: Thanks for your suggestion. Weakly supervised methods demand a strong feature-capture capability from the network, and previous methods are usually built on Vision Transformers (ViT) or CNNs. However, CNNs cannot capture long-range dependencies in images, and ViT has high computational complexity. Mamba scales linearly while capturing long-range dependencies, and its memory footprint is smaller than that of ViT, whose complexity is quadratic. We will add this description of Mamba’s relevance to weakly supervised segmentation to the manuscript to clarify how it addresses the challenges inherent in this setting.

Response to Reviewer 4: Thanks for your questions. For question 1: Mamba uses the S4 modeling approach, a purely sequence-to-sequence network. Its strength in modeling long-range dependencies stems from the way it is parameterized. It is an autoregressive model and is usually unidirectional; that is, it has good temporal properties and models causal sequences well, but it cannot capture relationships between arbitrary sequence elements, i.e., non-causal dependencies. For questions 2-7 regarding the details of the article, we will refine the description in the method section of the revised manuscript. For question 8: UNet is trained from scratch. Our proposed PathMamba combines Mamba and Multi-Instance Multi-Label Learning to extract long-range sequence dependencies and pixel-level features, respectively, and uses the Deep Contrast Supervised Loss to supervise both feature representations. The Deep Contrast Supervised Loss cannot be applied to UNet, a fully supervised method, because UNet’s fitting process relies on a single convolutional feature stream and does not produce the two types of information (pixel-level and patch-level) that our method contrasts.
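As background for questions 1 and 7: vision Mamba variants typically work around the unidirectional, causal nature of the SSM by scanning the image along several directions. Below is a generic sketch of a VMamba-style four-direction scan expansion (row-major, column-major, and their reverses); it is an illustration of the general technique, not the authors' code:

    import torch

    def cross_scan(x):
        """x: (B, C, H, W) feature map -> (B, 4, C, H*W) scan sequences."""
        B, C, H, W = x.shape
        row = x.flatten(2)                    # left-to-right, top-to-bottom
        col = x.transpose(2, 3).flatten(2)    # top-to-bottom, left-to-right
        scans = torch.stack([row, col], dim=1)
        # Reversing each order yields the remaining two directions, so the
        # causal SSM sees every position's context from both ends of the scan.
        return torch.cat([scans, scans.flip(-1)], dim=1)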




Meta-Review

Meta-review not available; early accepted paper.


