Abstract

Tissue-level semantic segmentation is crucial in digital pathology workflows. However, because dense pixel-level annotation of gigapixel pathology images is expensive and time-consuming, Weakly Supervised Semantic Segmentation (WSSS) methods have gradually attracted attention. WSSS methods that use image-level labels usually rely on Class Activation Maps (CAMs) to generate pseudo labels, which have difficulty capturing complete object regions and may incorrectly activate regions of pathology images with weak semantic relevance. In this work, we propose SIA-WSSS, a weakly supervised semantic segmentation model for pathology images that synchronizes inhibition and activation. Specifically, we first extract class and patch tokens from pathology images using a Vision Transformer (ViT) and construct a Regularized Focus Mechanism (RFM). The RFM implicitly regularizes class-patch interactions through graph learning, ensuring that class tokens can dynamically compress patch information and inhibit irrelevant backgrounds. Next, we introduce a Discriminative Activation Module that contrasts the class tokens of fine-grained regions and global objects to capture the unique features of each class and activate the foreground region. Moreover, we design a Regional Self-modulation Module that synchronizes each region’s activation and inhibition information to generate segmentation results with finer structures. Experimental results on the LUAD-HistoSeg and BCSS-WSSS datasets demonstrate that the proposed SIA-WSSS significantly outperforms state-of-the-art WSSS methods. The code is available at https://github.com/Jsf826/SIA-WSSS.
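For readers unfamiliar with the CAM baseline that the abstract contrasts against, the following is a minimal sketch of generic CAM-based pseudo-label generation. It is not the SIA-WSSS method and is not taken from the authors' repository. The function names, the nested-list data layout, the 0.4 threshold, and the background index -1 are all assumptions made for illustration.

```python
# Generic CAM pseudo-labelling sketch (illustrative only; not SIA-WSSS).
# Assumed layout: features is K x H x W, weights is C x K (nested lists).

def class_activation_maps(features, weights):
    """Return C x H x W activation maps, ReLU-ed and scaled to [0, 1] per class."""
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    cams = []
    for class_weights in weights:
        # Weighted sum of feature channels at each location, followed by ReLU.
        cam = [[max(0.0, sum(class_weights[k] * features[k][i][j]
                             for k in range(num_channels)))
                for j in range(width)]
               for i in range(height)]
        peak = max(max(row) for row in cam)
        if peak > 0:
            cam = [[value / peak for value in row] for row in cam]
        cams.append(cam)
    return cams


def pseudo_labels(cams, image_labels, threshold=0.4, background=-1):
    """Give each pixel its strongest present class, or background below threshold."""
    present = [c for c, flag in enumerate(image_labels) if flag]
    height, width = len(cams[0]), len(cams[0][0])
    labels = [[background] * width for _ in range(height)]
    if not present:
        return labels
    for i in range(height):
        for j in range(width):
            best = max(present, key=lambda c: cams[c][i][j])
            if cams[best][i][j] >= threshold:
                labels[i][j] = best
    return labels


# Toy input: 2 feature channels on a 2 x 2 grid, 2 classes.
features = [[[1.0, 0.0], [0.5, 0.0]],
            [[0.0, 2.0], [0.0, 0.1]]]
weights = [[1.0, 0.0],
           [0.0, 1.0]]
cams = class_activation_maps(features, weights)
labels = pseudo_labels(cams, image_labels=[1, 1])
print(labels)  # [[0, 1], [0, -1]]
```

In the toy output, the bottom-right pixel is only weakly activated for class 1 and falls below the threshold, so it is left as background. This is a small instance of the incomplete object coverage that the abstract attributes to CAM-based pseudo labels.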

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/5096_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Jsf826/SIA-WSSS

Link to the Dataset(s)

BCSS dataset: https://drive.google.com/drive/folders/1iS2Z0DsbACqGp7m6VDJbAcgzeXNEFr77

LUAD-HistoSeg dataset: https://drive.google.com/drive/folders/1E3Yei3Or3xJXukHIybZAgochxfn6FJpr

BibTex

@InProceedings{FanJia_Synchronous_MICCAI2025,
        author = { Fan, Jiansong and Di, Yicheng and Bao, Jiayu and Li, Lihua and Pan, Xiang},
        title = { { Synchronous Inhibition and Activation for Weakly Supervised Semantic Segmentation of Pathology Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},

}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a weakly supervised semantic segmentation (WSSS) method for pathology images that aims to address the limitations of existing approaches relying on Class Activation Maps, which often produce incomplete or inaccurate segmentations.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The Regularized Focus Mechanism uses Vision Transformer (ViT) class and patch tokens with graph-based regularization to suppress irrelevant background regions.

    The Discriminative Activation Module contrasts class tokens between local and global regions to enhance class-specific foreground activation.

    The Regional Self-modulation Module synchronizes region-wise activation and inhibition to capture finer structural details.

    Experiments on the LUAD-HistoSeg and BCSS-WSSS datasets show that SIA-WSSS outperforms current state-of-the-art WSSS methods.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper heavily relies on jargon and provides vague descriptions of key components — specifically the Regularized Focus Mechanism (RFM), Discriminative Activation Module (DAM), and Regional Self-Modulation Module (RSM). As a result, the technical contributions are difficult to follow.

    While the proposed approach introduces architectural novelties, the paper does not sufficiently justify the practical advantages of SIA-WSSS over existing CAM-based methods. The reported performance gains are relatively marginal, and the added model complexity is not clearly offset by substantial improvements in segmentation quality. A deeper analysis — such as qualitative comparisons, robustness under different weak supervision settings, or efficiency trade-offs — would help establish the value of the proposed approach.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper introduces mechanisms for weakly supervised segmentation through activation and inhibition modeling, the overall presentation is hindered by a lack of clarity in describing the core components, making it difficult to fully grasp their design or individual contributions.

    Moreover, the methodology adds considerable complexity without demonstrating a proportionate gain in performance or interpretability compared to existing CAM-based approaches. As a result, the practical value and usability of the proposed SIA-WSSS framework remain unclear. These issues limit both the accessibility and impact of the work, which currently does not meet the threshold for publication.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes SIA-WSSS, a novel framework for weakly supervised semantic segmentation of pathology images using only image-level labels. The method integrates activation and inhibition information to address common issues in WSSS, such as incomplete object activation and false positives. Key modules include a Regularized Focus Mechanism for suppressing irrelevant regions via graph learning, a Discriminative Activation Module for enhancing object completeness, and a Region Self-modulation Module for refined segmentation. Extensive experiments on two benchmark datasets demonstrate superior performance over state-of-the-art methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The overall framework is well-structured, clearly presented, and easy to follow, with each component logically motivated.

    Experimental results on two benchmark datasets show strong performance gains over existing state-of-the-art methods, and the ablation studies effectively support the method’s effectiveness.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper lacks discussion on failure modes or qualitative examples where the method underperforms, which would be helpful for understanding its limitations.

    While the method is technically sound, the description of each module (especially the RFM graph construction and update process) is sometimes overly dense and could benefit from clearer notation and step-wise explanation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses an important problem in computational pathology. The proposed method is supported by a sound technical design and strong empirical results. The integration of inhibition and activation mechanisms is thoughtful and contributes to state-of-the-art performance. However, the paper would benefit from clearer exposition and broader analysis of potential limitations.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Although the authors did not address my concerns, I still lean toward accepting this paper. That said, it would not bother me if this paper were rejected.



Review #3

  • Please describe the contribution of the paper

    This work proposes a weakly supervised segmentation model that leverages synchronized inhibition and activation through novel focus, activation, and modulation modules to generate fine-grained pathology segmentations using only image-level labels. The major contribution of this paper is RFM, a module based on graph representation learning that ensures the model can extract high-quality patch representations by discarding unnecessary background regions.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper has carefully designed modules to tackle three problems in pathology image segmentation: lack of fine-grained supervision, background noise interference, and poor localization of class-specific features. The main strength of this paper is RFM, its module based on graph representation learning, which yields the largest performance gain in segmentation. The provided experiments show that this method outperforms all of the baselines in both quantitative and qualitative results. The visualization of the segmentation results further supports the argument that this method is reasonably better than the provided baselines. The descriptions and the method diagrams are good enough and aid understanding of the proposed method.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    While I found the paper to be an interesting read, I am curious whether the RFM module is a novel formulation by the authors or is comparable to known graph-learning-based methods. A brief explanation of how it compares with existing graph learning methods in terms of formulation would have helped.

    Similarly, the authors mention that they repeated the experiments 5 times but do not report the standard deviation in the main table.

    Because the authors also claim DAM and RSM as major contributions, the ablation is missing experiments that show how these modules perform without the RFM module. Do they always have to be paired with RFM to yield good performance? Can they be used directly with the baseline? In which cases would they not work? The paper lacks these details.
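The mean and standard deviation over repeated runs requested above could be reported as in the following minimal sketch. The five scores are hypothetical placeholders chosen for illustration only; they are not results from the paper.

```python
import statistics

# Hypothetical mDice scores from five repeated runs (placeholders, not paper results).
run_scores = [0.70, 0.72, 0.71, 0.69, 0.73]

mean_score = statistics.mean(run_scores)
# Sample standard deviation (n - 1 in the denominator), the usual choice for a few runs.
std_score = statistics.stdev(run_scores)

summary = f"{mean_score:.3f} ± {std_score:.3f}"
print(summary)  # 0.710 ± 0.016
```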

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a method that uses three modules to outperform the baselines in both quantitative and qualitative evaluations. Some concerns about the explanations of the proposed modules need clarification. Otherwise, the paper is well written and easy to follow, and the method looks promising.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed some of the concerns and have added a discussion of graph learning methods, along with other content, to the revised manuscript for more clarity. With this, my major concerns are resolved, and I recommend accepting this submission.




Author Feedback

Response to reviewer 1: We sincerely thank you for your valuable comments on this manuscript. We provide detailed responses to your questions:

  1. Regarding your comments on the practical advantages of this study over existing CAM-based methods, and on performance improvement versus increased complexity: Traditional weakly supervised semantic segmentation (WSSS) relies on CAMs, which provide coarse and imprecise class-specific dense localization maps. ViT enables WSSS methods to capture a wide range of contextual information, but it tends to aggregate global semantics in low-information patches, so many weakly semantically related regions are misactivated. Our innovation is to consider both the low-level semantic information that needs to be suppressed and the target semantic information that needs to be highly activated in the WSSS task, and to design a novel WSSS framework, SIA-WSSS, around them. Compared with existing CAM-based and ViT-based WSSS methods, we approach the representation of different targets under weak supervision from an entirely new perspective. In addition, the experimental section compares SIA-WSSS in detail with the current best methods (SSC, TIP 2024; MCTformer+, PAMI 2024; DuPL, CVPR 2024; etc.). Table 1 shows that SIA-WSSS outperforms existing methods on all metrics, with mDice increasing by about 2%-8%. For weakly supervised segmentation of pathological images, an improvement of this size is very significant. Figures 2 and 3 show the visualization results. Your question about the increase in model complexity is understandable. We measured the model parameters and inference time of SIA-WSSS and other current state-of-the-art methods. The results show that existing ViT-based methods increase the parameter count by about 40% on average, whereas the increase for SIA-WSSS is limited (about 10%). First, the Regularized Focus Mechanism (RFM) is a lightweight graph learning component that does not introduce external graph networks or additional decoders. Second, the Discriminative Activation Module (DAM) is built on the built-in token relationships of ViT, without adding convolution or attention layers, and its runtime efficiency is very similar to that of the original ViT. Third, the Region Self-modulation Module (RSM) is not a multi-branch structure; it completes foreground-background joint modeling through an internal token regulation mechanism. Owing to the manuscript’s length limit, we did not show these results.

  2. Regarding the unclear description of the model: First of all, we apologize for the confusion caused by our description. We propose a WSSS model (SIA-WSSS) that uses image-level labels to finely segment pathological images. SIA-WSSS contains three core components: the Regularized Focus Mechanism, the Discriminative Activation Module, and the Region Self-modulation Module. Because of the manuscript’s length limitation, our description of these three components is not very detailed. In the revised manuscript, we will therefore describe the design principles of each component in more detail and explain their inputs and outputs, targets, and roles in the training process. We will also strengthen the description of the collaborative logic between modules and explain more clearly how the three components work together for foreground-background discrimination in pathological images.

Finally, thank you again for your comments!

Response to reviewer 2: Thank you for your comments and for pointing out the manuscript’s shortcomings. Based on your suggestions, we have added a discussion of failure modes, in which the method performs poorly, to the revised manuscript. Best wishes to you!

Response to reviewer 3: Thank you for your comments, which helped us find some missing details. We have added a discussion of graph learning methods, along with other content, to the revised manuscript. Best wishes to you!




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper’s core components in weakly supervised segmentation lack clear description, and the methodology’s added complexity doesn’t show proportional performance gains.

    We encourage the authors to revise the paper by enhancing clarity in component descriptions, demonstrating practical value, and conducting comprehensive analysis for potential reconsideration.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This work presents a novel method for weakly supervised semantic segmentation of pathological images, and experiments on public datasets showed its effectiveness. Although one reviewer has some concerns about its writing, practical value, and usability, these are expected to be clarified in the final version and do not affect the overall quality of this work.


