

Abstract

Weakly-supervised medical image segmentation with only image-level annotation is particularly challenging to infer precise pixel-wise predictions. Existing works are usually highly restricted by the assumption that the medical images for training and testing are under the same distribution. However, a robust weakly-supervised segmentation model needs to show accurate inference on medical images from unseen distributions. Different feature distributions can lead to a dramatic shift in the feature activation and class activation map (CAM), which in turn leads to the degradation of pseudo labels. In this paper, we aim to learn generalizable weakly-supervised medical image segmentation by focusing on enhancing the domain invariance for pseudo labels. A novel domain-invariant CAM learning scheme (D-CAM) is proposed, in which the content and style are decoupled during training. By inferring domain-invariant pseudo labels, the supervision of a segmentation model is more generalizable to different target domains. Extensive experiments under multiple generalized medical image segmentation settings show the state-of-the-art performance of our D-CAM. Source code is available at https://github.com/JingjunYi/D-CAM.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0830_paper.pdf

SharedIt Link: https://rdcu.be/eHwTi

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04971-1_12

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/JingjunYi/D-CAM

Link to the Dataset(s)

https://drive.google.com/drive/folders/1E3Yei3Or3xJXukHIybZAgochxfn6FJpr?usp=sharing
https://drive.google.com/drive/folders/1iS2Z0DsbACqGp7m6VDJbAcgzeXNEFr77?usp=sharing

BibTex

@InProceedings{YiJin_DCAM_MICCAI2025,
        author = { Yi, Jingjun AND Bi, Qi AND Zheng, Hao AND Zhan, Haolan AND Ji, Wei AND Huang, Huimin AND Li, Yuexiang AND Li, Shaoxin AND Wu, Xian AND Zheng, Yefeng AND Huang, Feiyue},
        title = { { D-CAM: Learning Generalizable Weakly-Supervised Medical Image Segmentation from Domain-invariant CAM } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {122 -- 132}
}

Reviews

Review #1

Please describe the contribution of the paper

The manuscript proposes a new evaluation scenario for weakly supervised methods that use image-level labels to train a semantic segmentation model, namely cross-domain weakly supervised learning, or weakly supervised domain generalization. The main contributions lie in proposing a method for this scenario and evaluating it alongside multiple weakly supervised learning strategies from the literature on histopathology imaging. The contributed method is based on decoupling the content and style of histopathology images by leveraging the Fourier transform to separate and normalize the phase and amplitude of the learned features in the classification network.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The area of exploration is very interesting and highly relevant. Weakly supervised methods suffer in their applicability for practitioners if small domain gaps already lead to significant performance drops. Setting up a test-bed for developing methods that go beyond individual domains is a worthwhile endeavour.
- Exploration is done with multiple generalization scenarios (i.e., multiple source and target datasets) with good quantitative results of the proposed method, as it produces the best results for three out of four cross-domain scenarios as measured in mean Intersection over Union.
- The proposed methodology is quite agnostic towards neural network architecture specifics and can theoretically be applied on top of many different existing approaches.
- The writing and presentation are good; the manuscript is easy to follow and provides a good overview of the related literature.
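The mean Intersection over Union (mIoU) used to compare methods in the strengths above can be sketched as follows. This is a minimal illustration only: the function name `mean_iou` is hypothetical, and the paper's actual evaluation protocol (for example, whether IoU is accumulated over the whole dataset or averaged per image, and which classes are ignored) is not stated on this page and is assumed here.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the target.

    pred, target: integer label maps of identical shape.
    Assumption: classes absent from both maps are skipped rather than
    counted as IoU = 0 or IoU = 1.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class c appears in neither map
        intersection = np.logical_and(p, t).sum()
        ious.append(intersection / union)
    if not ious:
        return float("nan")
    return float(np.mean(ious))


# Tiny usage example with two classes on a 2 x 2 label map.
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target, num_classes=2)
```

In the example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.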
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The notation is partially inconsistent and incomplete. On page 3, the formulation of $D^{(K_1)}$ is inconsistent: the text says there are N domains, but the subsequent definition suggests there are $K_N$ domains and $N^{(K_N)}$ samples per domain. Further, key variables in the description of the approach are left undefined, such as $\gamma$ and $\beta$ in Equation (3), or H and W in Equation (1).
- On page 4, it is stated that $\hat{A}_i$ encapsulates the style information, which is invariant to domain variation. This statement is experimentally unfounded: there is no exploration of why this should be the case, i.e., an analysis of the amplitude components in the different histopathology domains showing that this component does not deviate between them.
- On page 7, it is stated that using $P$ or $\hat{P}$ makes no difference, yet the associated Table 3 shows a significant difference in the results for the BCSS -> Hist scenario: using $P$ even degrades the performance below the baseline. This statement should be revised.
- The qualitative results in Figure 2 should also display the ground truth alongside the model predictions, so that the reader can see the improvements made by the different methods.
- The manuscript proposes a general method that may be used within any CAM-based weakly supervised segmentation method (i.e., only access to feature maps is needed). An exploration should be carried out to validate that normalizing the amplitude of the Fourier-transformed feature space is indeed effective for domain generalization across such methods.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- It is not clear what Baseline refers to in Tables 1 and 2, even after consulting the paper the reader is referred to.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The manuscript is set in an important area of exploration and tackles an important practical problem to enable more annotation-efficient and transferable solutions. Further, the proposed method is simple and conceptually generic.

Yet there are considerable issues with the notation, unfounded statements, and the limited extent to which the proposed method and its generalizability are explored.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I have read the rebuttal, the other reviews and re-read relevant parts in the manuscript.

For most of my questions, the authors provide additional information in the rebuttal, which clarifies my concerns, especially regarding the presentation and missing citations for statements made in the text.

One of the points raised in my review is that the generality of the proposed method could be shown by applying it to other CAM-based approaches. Here, the authors provide the following insight: “Applied to CAM-based methods (C-CAM [1], S2C [3]), amplitude normalization improves mIoU by up to 4.25%, 3.79% on WSSS, validating generalization gains.” The information given here is very sparse: it is not clear whether the BCSS -> WSSS or the Hist -> WSSS scenario is meant, nor what the absolute mIoU results are after adapting the method, as only improvement values are given. While this exploration lacks detail, I acknowledge that an improvement is evident when integrating the method into other CAM-based methods.

In summary, the rebuttal lifted my initial concerns towards the manuscript.

Review #2

Please describe the contribution of the paper

This paper tackles the problem of generalized weakly supervised medical image segmentation using only class labels. It proposes a domain-invariant pseudo-label generation method based on Instance Normalization (IN) and FFT, and validates the approach across multiple datasets.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed method is conceptually simple yet achieves promising results. By leveraging IN and FFT, the method aims to produce domain-invariant feature representations, enabling the transfer of pseudo-label generation from the source domain to the target domain. The experiments are comprehensive, with extensive comparisons to many baseline methods.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The network in the figure involves downsampling, so the size of F_4 should be significantly smaller than the original image. It is unclear how the corresponding pseudo-labels are generated at this resolution. The paper should provide a clear explanation.
2. In the background section, the authors claim that a major limitation of existing methods is the inability to capture precise boundaries in pseudo-labels. However, the proposed method does not appear to address or resolve this issue.
3. The statement “From a frequency perspective, style information is typically found in the amplitude component, whereas content information is generally located in the phase component” should be supported with appropriate citations.
4. Why does the use of IN lead to domain-invariant representations? If IN itself is domain-invariant, what additional role does FFT play in this context? The rationale behind combining IN with FFT should be clarified.
5. The authors are encouraged to release their code to enhance reproducibility.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although the method lacks sufficient novelty in its design, I would be willing to reconsider and potentially raise my score if the authors adequately address the concerns raised above.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I think the authors have addressed most of my concerns in the rebuttal and clarified any ambiguities.

Review #3

Please describe the contribution of the paper

The paper proposes a framework for weakly supervised medical image segmentation using domain-invariant class activation maps. The method decouples phase and amplitude in the frequency domain to extract style-invariant, content-preserving features for more robust pseudo-label generation.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is clearly written and well-structured, making complex ideas easy to follow.
- The paper introduces a novel and effective use of frequency-domain decomposition to achieve domain-invariant CAMs.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Additional ablation studies are needed. In particular, it would be helpful to evaluate the contribution of Stage 2 by comparing performance with and without the segmentation refinement stage. Since pseudo-labels are generated at Stage 1, the necessity of Stage 2 should be better justified.
- In Fig. 2, ground-truth segmentation masks are not shown, making it difficult to visually assess the accuracy of the predictions.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While there are some aspects that need improvement—such as more comprehensive ablation studies and clearer visualization—the paper presents a novel and well-motivated approach to domain-generalizable weakly supervised segmentation. Overall, the contribution is meaningful enough to justify the score.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The rebuttal addresses my concerns. I maintain my recommendation to accept the paper.

Author Feedback

We thank the reviewers for their thoughtful feedback. Below is our point-by-point response.

Reviewer 1:

Notation and Formulations: $D^{K_{N}}$ refers to the dataset of the N-th domain; $N^{K_{N}}$ is its sample count. Eq. (3): $\gamma, \beta \in \mathbb{R}^{c}$ are affine parameters [6]. H, W denote feature map dimensions.
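For readers without the paper at hand, the symbols clarified here have standard textbook forms. The block below is a sketch under two assumptions that this page does not confirm: that Equation (1) is a 2D discrete Fourier transform over an H x W feature map, and that Equation (3) is an affine instance normalization. The feature symbol $F$, the indices $u, v, h, w$, and the statistics $\mu$, $\sigma$, $\epsilon$ are illustrative and may differ from the paper's notation.

```latex
\begin{align}
  % 2D discrete Fourier transform of one feature channel F of size H x W
  \mathcal{F}(F)(u, v) &= \sum_{h=0}^{H-1} \sum_{w=0}^{W-1}
      F(h, w)\, e^{-j 2\pi \left( \frac{u h}{H} + \frac{v w}{W} \right)} \\
  % amplitude and phase components of the spectrum
  A(u, v) &= \left| \mathcal{F}(F)(u, v) \right|, \qquad
  P(u, v) = \arg \mathcal{F}(F)(u, v) \\
  % instance normalization with per-channel affine parameters gamma, beta
  \mathrm{IN}(x) &= \gamma \, \frac{x - \mu(x)}{\sqrt{\sigma^{2}(x) + \epsilon}} + \beta
\end{align}
```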

Amplitude as Domain-Invariant Feature: Prior works (SET[7], DAC[4]) suggest that style and content are primarily reflected in amplitude and phase, respectively. Table 3 shows using normalized amplitude $\tilde{A}$ improves performance on unseen domains.

Amplitude-Phase Justification: $\tilde{P}$ captures content robustly across domains. Adding $A$ may reintroduce style noise, while $\tilde{P}+\tilde{A}$ can cause information loss due to over-normalization. Our design balances these effects for better generalization.
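The amplitude and phase combinations discussed above can be illustrated with a short NumPy sketch. It is not the authors' implementation (their code is in the linked repository); the function names, the use of plain instance normalization without the affine parameters $\gamma, \beta$, and the two boolean switches are all assumptions made for illustration.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each channel of a (C, H, W) array over its spatial dimensions."""
    mean = x.mean(axis=(-2, -1), keepdims=True)
    std = x.std(axis=(-2, -1), keepdims=True)
    return (x - mean) / (std + eps)


def decouple(features):
    """Split a real (C, H, W) feature map into amplitude and phase spectra."""
    spectrum = np.fft.fft2(features, axes=(-2, -1))
    return np.abs(spectrum), np.angle(spectrum)


def recombine(amplitude, phase):
    """Rebuild a spatial feature map from amplitude and phase spectra."""
    spectrum = amplitude * np.exp(1j * phase)
    # The input was real-valued, so only the real part is kept.
    return np.fft.ifft2(spectrum, axes=(-2, -1)).real


def frequency_normalize(features, norm_amplitude=True, norm_phase=False):
    """Hypothetical helper: normalize the chosen spectral components.

    norm_amplitude / norm_phase select which component is instance-normalized
    before recombination, mirroring the variants compared in the rebuttal.
    """
    amplitude, phase = decouple(features)
    if norm_amplitude:
        amplitude = instance_norm(amplitude)
    if norm_phase:
        phase = instance_norm(phase)
    return recombine(amplitude, phase)


# Usage on a random stand-in for a (C, H, W) feature map.
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 8, 8))
normalized = frequency_normalize(features)
```

With both switches off, the round trip through `decouple` and `recombine` returns the input up to floating-point error, which is a convenient sanity check before experimenting with the normalized variants.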

Visualization: We will include GT masks in Fig. 2 to enhance result clarity.

Validating Amplitude Normalization: Applied to CAM-based methods (C-CAM [1], S2C [3]), amplitude normalization improves mIoU by up to 4.25%, 3.79% on WSSS, validating generalization gains.

Baseline Clarification: The baseline is WSSS-Tissue [2]. Our method builds upon it via frequency-based enhancement in D-CAM.

Reviewer 2:

Stage 2 Ablation: Removing Stage 2 drops Hist/WSSS results (e.g., 42.85/40.56 mIoU → full pipeline: +3–5%), confirming its importance.

Justification for Stage 2: Stage 1 produces coarse pseudo labels; Stage 2 refines them using a segmentation model, crucial for more accurate boundaries [2,5].

Fig. 2 GT Masks: GT will be added to enhance visual clarity.

Reviewer 3:

Pseudo-Label Resolution: F4 is upsampled for label generation.
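A minimal sketch of how a low-resolution feature map can be turned into a full-resolution pseudo label is given below. The rebuttal only states that F4 is upsampled; the bilinear interpolation, the max-normalization of each class map, the masking by image-level labels, and all function names here are assumptions for illustration, not the authors' procedure.

```python
import numpy as np


def bilinear_upsample(cam, out_h, out_w):
    """Bilinearly resize a (K, h, w) array to (K, out_h, out_w), corners aligned."""
    _, h, w = cam.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical interpolation weights
    wx = (xs - x0)[None, None, :]  # horizontal interpolation weights
    top = cam[:, y0][:, :, x0] * (1 - wx) + cam[:, y0][:, :, x1] * wx
    bottom = cam[:, y1][:, :, x0] * (1 - wx) + cam[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def cam_pseudo_label(features, class_weights, present, out_size):
    """Build a pixel-wise pseudo label from a (C, h, w) feature map.

    class_weights: (K, C) classifier weights; present: (K,) boolean
    image-level labels; out_size: (H, W) of the input image.
    """
    cam = np.einsum("kc,chw->khw", class_weights, features)
    cam = np.maximum(cam, 0.0)  # keep positive evidence only
    cam = cam / (cam.max(axis=(1, 2), keepdims=True) + 1e-8)
    cam = bilinear_upsample(cam, *out_size)
    cam[~present] = -1.0  # classes absent from the image label cannot win
    return cam.argmax(axis=0)


# Toy example: channel 0 fires on the left, channel 1 on the right.
features = np.array([[[1.0, 0.0], [1.0, 0.0]],
                     [[0.0, 1.0], [0.0, 1.0]]])
weights = np.eye(2)
label = cam_pseudo_label(features, weights, np.array([True, True]), (4, 4))
```

In the toy example the 2 x 2 map is resized to 4 x 4, and the left two columns are assigned class 0 while the right two are assigned class 1.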

Boundary Precision: Stage 2 addresses CAM limitations by refining spatial/boundary accuracy. Furthermore, in the context of domain generalized weakly supervised segmentation, the CAM trained on the source domain tends to produce large, region-level errors when applied to the unseen target domain. Addressing this issue is the core motivation behind D-CAM.

Amplitude-Phase Justification: SET [7], DAC [4] show that variations in image style are primarily manifested in the amplitude. This supports our normalization approach.

IN + FFT Rationale: IN is widely used to remove style noise, while content/style separation achieved by FFT makes this more effective. Together, they produce domain-invariant features.

Code Release: We will release code/models upon acceptance for reproducibility.

[1] Chen, Z., et al.: C-CAM: Causal CAM for weakly supervised semantic segmentation on medical image.

[2] Han, C., et al.: Multi-layer pseudo-supervision for histopathology tissue semantic segmentation using patch-level classification labels.

[3] Kweon, H., et al.: Exploring segment anything model for weakly supervised semantic segmentation.

[4] Lee, et al.: Effective normalization by playing with frequency for domain generalization.

[5] Li, et al.: Online easy example mining for weakly supervised gland segmentation from histology images.

[6] Nam, et al.: Batch-instance normalization for adaptively style-invariant neural networks.

[7] Yi, et al.: Learning spectral-decomposited tokens for domain generalized semantic segmentation.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The authors addressed most of the concerns in the rebuttal.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

While the reviews received were mixed, the rebuttal clarified many of the raised concerns, and the reviewers agree that the paper should be accepted. After reading the reviewer comments, as well as the paper, I do not have any major concern that contradicts the reviewers’ final scores. I thus recommend the acceptance of this work, and strongly recommend that the authors integrate, as much as possible, the “pre-rebuttal” concerns raised by the reviewers. Furthermore, it seems that another concern appeared during the rebuttal, related to the ambiguity between the “Baseline” and “WSSS-Tissue” models, which I encourage the authors to clarify.


D-CAM: Learning Generalizable Weakly-Supervised Medical Image Segmentation from Domain-invariant CAM

Author(s):