Abstract

Class activation map (CAM)-based, image-level weakly supervised tissue segmentation has become a popular research topic due to its low annotation cost. However, two challenges remain in this task: (1) low-quality pseudo mask generation, and (2) training under noisy label supervision. To address these issues, we propose a novel weakly supervised segmentation framework with Activation Relocalization and Mutual Learning (ARML). First, we integrate an Activation Relocalization Scheme (ARS) into the classification phase to more accurately cover the useful areas in the initial CAMs. Second, to deal with the inevitable noise in the pseudo masks generated by ARS, we propose a noise-robust mutual learning segmentation model. The model encourages peer networks to capture different characteristics of the outputs, and two noise suppression strategies, namely samples weighted voting (SWV) and samples relation mining (SRM), are introduced to excavate potentially credible information from the noisy annotations. Extensive experiments on the BCSS and LUAD-HistoSeg datasets demonstrate that our proposed ARML exceeds many state-of-the-art weakly supervised semantic segmentation methods, offering new insight for tissue segmentation tasks. The code is available at: https://github.com/director87/ARML.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1156_paper.pdf

SharedIt Link: https://rdcu.be/dZxdR

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72111-3_39

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1156_supp.pdf

Link to the Code Repository

https://github.com/director87/ARML

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Fen_Mining_MICCAI2024,
        author = { Feng, Siyang and Chen, Jiale and Liu, Zhenbing and Liu, Wentao and Wang, Zimin and Lan, Rushi and Pan, Xipeng},
        title = { { Mining Gold from the Sand: Weakly Supervised Histological Tissue Segmentation with Activation Relocalization and Mutual Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {414--423}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose a relatively novel weakly supervised method for histopathological image segmentation, which achieves better performance compared with other SOTA methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method is incrementally novel. The authors propose an ARS structure based on self-attention to generate pixel-level pseudo masks, and they further propose a mechanism to learn the segmentation network through two additional student models with a similar structure.
    2. Extensive experiments are conducted on two datasets to prove the effectiveness of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The proposed method is only incrementally novel, inserting a self-attention-like structure into pseudo mask generation. Better performance seems all but guaranteed by the self-attention structure compared with the variant without it.

    2. In the segmentation stage, there seem to be multiple combinations of a, b, and c for constructing different or similar segmentation losses. Why is a the teacher while the other two are the student networks? Have the authors tried the segmentation performance of Fb or Fc in the segmentation stage?
    3. When changing the backbone of the network, why is there no comparison of different backbones in the mask generation stage?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be better if the authors could release the code in the future.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. The proposed method is only incrementally novel, inserting a self-attention-like structure into pseudo mask generation. Better performance seems all but guaranteed by the self-attention structure compared with the variant without it.

    2. In the segmentation stage, there seem to be multiple combinations of a, b, and c for constructing different or similar segmentation losses. Why is a the teacher while the other two are the student networks? Have the authors tried the segmentation performance of Fb or Fc in the segmentation stage?
    3. When changing the backbone of the network, why is there no comparison of different backbones in the mask generation stage?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the proposed method, whether the problem construction is reasonable, and whether enough experiments have been conducted to support the claims in the paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of my concerns, but I still feel the novelty of the paper is incremental.



Review #2

  • Please describe the contribution of the paper

    The authors propose a novel approach to segment regions in histopathological images in a weakly supervised manner. The proposed approach promises to overcome limitations of previous methods that work on Class Activation Maps for segmenting regions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Results are reported on publicly available datasets; the strategy used to make these datasets suitable for the task at hand (deriving weakly supervised signals from fully supervised annotations and benchmarking against the full annotations) is sound and appropriate.
    • The results seem to support the claims of the authors.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • In the experimental evaluation, the margins of improvement of the proposed approach over the latest state-of-the-art methods are not very large.
    • The method is somewhat cumbersome and complex to grasp, while the improvements it brings seem limited to a couple of percentage points.
    • The paper would benefit from more high-level explanation of what the authors have done before delving into the math.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    This method might be challenging to reproduce due to its complexity.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The paper would benefit from more high-level explanations and conceptual clarifications.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The ratio of the method's complexity to the improvement in its results is not impressive, but the method has merit and is original to the best of my knowledge.
    • The issue this approach addresses, weakly supervised segmentation of histopathology images, is an important topic in the field.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper outlines a method that integrates ARS for better initial mask generation and employs mutual learning with SWV and SRM to handle noise in the training data. These strategies lead to improved segmentation accuracy, demonstrated through extensive experiments in which the proposed method outperforms existing techniques in terms of mIoU and other metrics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors have proposed an Activation Relocalization Scheme (ARS) and added Mutual Learning (ML) with noise suppression, which is interesting; overall, the authors report SOTA performance with approximately 1.5% Dice improvement on BCSS and approximately 0.9% Dice improvement on LUAD-HistoSeg.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The mutual learning concept is not novel; it has shown its significance in other domains where ensemble methods improve the robustness of predictions, especially in histopathology.

    The final loss function (Eq. (9)) has the hyperparameter $\lambda$, and the loss function $L_{st}$ (Eq. (8)) has the additional hyperparameters $\alpha$ and $\beta$, which are again integrated into the final loss. This reviewer is convinced that the model performs well on the datasets mentioned. But how would one reproduce it on other datasets? With this many hyperparameters, how can one track the optimization? And if they need not be tuned and are fixed across datasets, what is the fundamental importance of so many hyperparameters?

    The authors have improved Dice by approximately 1.5% on BCSS and 0.9% on LUAD-HistoSeg. With many hyperparameters in the loss and the mutual learning + denoising strategies, the model seems computationally expensive. It would be great if the authors compared the complexity of the latest SOTA methods with theirs (or at least reported GPU training time).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The ablation, visualizations (supplementary), and many other aspects give conclusive evidence that the authors have provided clear instructions to ensure reproducibility. However, the code is not publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please provide justification for each hyperparameter used and explain how it influences training. Second, report the complexity of the model or its GPU running time.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the proposed ARML has many components, and the individual components are well addressed by the authors. The method provides improved performance compared to previous baselines. Although ARML is significant, it remains unclear how the weight in the loss function is assigned to each module (or component). How do these weights direct the network to perform well? This is the major concern to be addressed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We would like to thank all the reviewers for their constructive comments.

Q1: Reproducibility of the paper. A1: We will release the code and models on GitHub if the paper is accepted.

Reviewer#1: Q2: Only incrementally novel due to the self-attention structure in pseudo mask generation. A2: The reactivation attention is also important. Conventional CAMs focus only on the most discriminative regions, which is insufficient for precise segmentation, whereas our reactivation attention pushes the model to attend to non-predominant features, expanding the useful areas of the CAMs and generating fine-grained pseudo masks that alleviate the impact of noisy labels. Compared with previous works, our method is simple but effective.
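To make this concrete, below is a minimal sketch of the general idea: a vanilla CAM followed by a self-attention-style feature affinity that propagates activation to related but non-predominant pixels. The function and its normalization are an illustrative reconstruction of the concept, not the paper's exact ARS formulation.

```python
import torch
import torch.nn.functional as F

def relocalized_cam(feats, fc_weight, class_idx):
    """Hedged sketch of attention-based CAM relocalization (not the exact ARS).

    feats:     (B, C, H, W) backbone feature maps
    fc_weight: (num_classes, C) classifier weights
    class_idx: target class for the CAM
    """
    B, C, H, W = feats.shape
    # Vanilla CAM: class-weighted sum over feature channels.
    cam = F.relu(torch.einsum("c,bchw->bhw", fc_weight[class_idx], feats))
    # Pixel-to-pixel affinity from the features (self-attention style).
    flat = feats.flatten(2)                                    # (B, C, HW)
    affinity = torch.softmax(
        torch.einsum("bcn,bcm->bnm", flat, flat) / C ** 0.5, dim=-1)
    # Propagate activations along the affinity so related but
    # non-predominant regions get reactivated.
    reloc = torch.bmm(cam.flatten(1).unsqueeze(1), affinity).view(B, H, W)
    return reloc / (reloc.amax(dim=(1, 2), keepdim=True) + 1e-6)
```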

Q3: Why is Fa the teacher network while Fb and Fc are student networks? Why not report the segmentation performance of Fb or Fc? A3: In fact, the three networks train independently; during training, each network is supervised by the pseudo mask and by the outputs of the other two networks. To save space, we only take Fa as an example to illustrate our method: when Fa serves as the teacher, Fb and Fc are the students, and the situation is symmetric for Fb and Fc. Note that the denoising strategies for Fb and Fc are defined analogously. We have already evaluated the segmentation performance of Fb and Fc; the results of our model listed in Table 1 are the average of the three networks' segmentation results over three repeated runs.
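A schematic sketch of this symmetric three-network training, with each network supervised by the pseudo mask (hard labels) and by its two peers (soft labels), might look as follows; the KL peer term and the `lam` weight are simplifications, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def mutual_learning_step(nets, optimizers, images, pseudo_mask, lam=0.5):
    """One symmetric training step for three peer segmentation networks
    (hedged sketch of the mutual learning described above)."""
    logits = [net(images) for net in nets]                 # each (B, K, H, W)
    for i, opt in enumerate(optimizers):
        # Detached peer predictions serve as soft targets for network i.
        peers = [logits[j].detach() for j in range(len(nets)) if j != i]
        peer_prob = torch.stack([p.softmax(dim=1) for p in peers]).mean(0)
        # Hard supervision from the pseudo mask.
        loss_seg = F.cross_entropy(logits[i], pseudo_mask)
        # Soft supervision from the averaged peer prediction.
        loss_peer = F.kl_div(logits[i].log_softmax(dim=1), peer_prob,
                             reduction="batchmean")
        opt.zero_grad()
        (loss_seg + lam * loss_peer).backward()
        opt.step()
```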

Q4: No comparison with different backbones in the mask generation stage. A4: All backbones listed in Table 1 are applied in the segmentation stage; they are not used as classification networks in the pseudo mask generation stage. In fact, the previous SOTA methods listed in Table 1 use ResNet38 as their backbone in the pseudo mask generation stage, and so do we.

Reviewer#3: Q5: The method is a bit complex and the improvements aren't very big. A5: Although our performance improvements are not dramatic, they are statistically significant (p<0.05) relative to other methods. Our method also reduces GPU training time by about 10%-30% compared with the latest SOTA.

Q6: Would benefit from more high-level explanations. A6: ARS is proposed to reactivate insufficient CAMs to obtain more precise pseudo masks. SWV is proposed to highlight correct labels and suppress noisy signals. SRM is proposed to guide the models in learning the feature relation differences between pixels, in order to better distinguish noisy from clean samples.
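As an illustration of the SWV idea, the sketch below down-weights the per-pixel loss wherever the peer networks' votes disagree with the pseudo mask; the specific weighting values here are assumed for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def swv_loss(logits, peer_logits_a, peer_logits_b, pseudo_mask):
    """Hedged sketch of samples weighted voting: pixels on which the peers
    agree with the pseudo mask keep full weight; disagreeing (likely noisy)
    pixels are down-weighted."""
    with torch.no_grad():
        # Count how many of the two peers vote for the pseudo label.
        agree = ((peer_logits_a.argmax(1) == pseudo_mask).float()
                 + (peer_logits_b.argmax(1) == pseudo_mask).float())
        weight = 0.5 + 0.25 * agree          # 0.5, 0.75, or 1.0 per pixel
    pixel_loss = F.cross_entropy(logits, pseudo_mask, reduction="none")
    return (weight * pixel_loss).mean()
```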

Reviewer#4: Q7: Hyperparameter justification and how they influence training. A7: We conducted a sensitivity analysis of all hyperparameters, but it is not fully included in the paper due to space limits. The λ in Eq. (9) balances the final loss of the segmentation model. We found that the model achieves its best mIoU score of 67.87 on the BCSS validation set (shown in Table S3 of the Supplementary Material) and 76.28 on the LUAD-HistoSeg validation set when λ=0.2. Since there is an order-of-magnitude gap between Lswv and Lst, the value range of α and β in Eq. (8) is taken from 10 to 100 in order to balance the loss. Moreover, the angle-wise loss guides the model to excavate useful samples better than the distance-wise loss, so we give a larger weight to β and a smaller weight to α. As α and β increase, the mIoU score on the validation sets of both datasets also increases, reaching the best scores at α=30 and β=60. As these two values increase further, the model's performance decreases, since overly large values bias the model's learning and introduce extra noise that degrades final performance.
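For concreteness, the composition implied by this description, with the reported best values λ=0.2, α=30, β=60, might be written as follows; the exact forms of Eqs. (8) and (9) are in the paper, and the combination below is our reading of the rebuttal rather than a verbatim reproduction.

```python
def total_loss(l_swv, l_dist, l_angle, alpha=30.0, beta=60.0, lam=0.2):
    """Assumed structure of the final objective: alpha and beta balance the
    distance-wise and angle-wise relation terms inside L_st (Eq. (8)), and
    lambda balances L_st against the SWV term in the final loss (Eq. (9))."""
    l_st = alpha * l_dist + beta * l_angle   # Eq. (8), as described above
    return l_swv + lam * l_st                # Eq. (9), assumed composition
```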

Q8: Complexity comparison with SOTA. A8: We have already done the complexity comparison but did not include it in the paper due to space limits. With the MixTransformer, ResNet38, and ResNeSt101 backbones, the GPU training time of our method is 3.7h, 2.7h, and 3.1h on BCSS, and 2.9h, 2.1h, and 2.4h on LUAD-HistoSeg, respectively, while the latest SOTA method TPRO needs 4.3h for BCSS and 3.2h for LUAD-HistoSeg. Thus our method is 10%-30% faster than the latest SOTA.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    One reviewer increased the score from WR to WA, and the other two reviewers did not update their scores after the rebuttal. The AC considered the paper, the rebuttal, and the post-rebuttal comments, and found that the authors' rebuttal addressed the major questions raised by the reviewers. The common concern among the reviewers is the incremental novelty issue. The AC followed the reviewers' recommendations and recommended acceptance of the paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    accepts

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    accepts


