Abstract

In computational pathology, deep learning (DL) models for tasks such as segmentation or tissue classification are known to suffer from domain shifts due to different staining techniques. Stain adaptation aims to reduce the generalization error between different stains by training a model on source stains that generalizes to target stains. Despite the abundance of target stain data, a key challenge is the lack of annotations. To address this, we propose Unsupervised Latent Stain Adaptation (ULSA), a joint training scheme over artificially labeled and unlabeled data that includes all available stained images. Our method uses stain translation to enrich labeled source images with synthetic target images in order to increase the supervised signal. Moreover, we leverage unlabeled target stain images using stain-invariant feature consistency learning. With ULSA we present a semi-supervised strategy for efficient stain adaptation without access to annotated target stain data. Remarkably, ULSA is task agnostic in patch-level analysis for whole slide images (WSIs). Through extensive evaluation on external datasets, we demonstrate that ULSA achieves state-of-the-art (SOTA) performance in kidney tissue segmentation and breast cancer classification across a spectrum of staining variations. Our findings suggest that ULSA is an important framework for stain adaptation in computational pathology.
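The stain-invariant feature consistency idea from the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: `consistency_loss` and the toy feature vectors are hypothetical. Writing the loss as 1 − cosine similarity means that minimizing it maximizes the agreement between features of an image and of its stain-translated counterpart:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def consistency_loss(feats_orig, feats_translated):
    """Layer-wise consistency: average of (1 - cos_sim) over layer features.

    Minimizing this term maximizes the similarity between features of an
    image and features of its stain-translated version.
    """
    terms = [1.0 - cosine_similarity(f, g)
             for f, g in zip(feats_orig, feats_translated)]
    return sum(terms) / len(terms)

# Identical features from two stains -> loss ~ 0 (perfect stain invariance);
# opposite features -> loss of 2 (maximal disagreement).
feats = [[1.0, 2.0, 3.0], [0.5, -0.5, 1.0]]
print(consistency_loss(feats, feats))                 # ~ 0.0
print(consistency_loss([[1.0, 0.0]], [[-1.0, 0.0]]))  # 2.0
```

In practice the feature vectors would come from intermediate layers of the segmentation or classification backbone, so the same network is pushed toward stain-invariant representations at every depth.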

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2012_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2012_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

https://atlas.kpmp.org
http://haeckel.case.edu/data/KI_data/
https://www.kaggle.com/competitions/hubmap-kidney-segmentation/data



BibTex

@InProceedings{Rei_Unsupervised_MICCAI2024,
        author = { Reisenbüchler, Daniel and Luttner, Lucas and Schaadt, Nadine S. and Feuerhake, Friedrich and Merhof, Dorit},
        title = { { Unsupervised Latent Stain Adaptation for Computational Pathology } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this work, the authors seek to build a task-agnostic stain generalization method leveraging labeled data from a source stain and unlabeled data from target stains. Their approach entails two primary techniques: (1) a cGAN trained to translate from source stain s to target stain t, used to augment the labeled images; (2) an unsupervised feature consistency loss on transformed vs. non-transformed unlabeled images from the target domain. They assess their method on two datasets (kidney segmentation and breast cancer classification) and compare it to a naive baseline and five other techniques. The other methods include classical color-space corrections based on channel-wise statistics for normalization, as well as more modern techniques such as cGAN augmentation. The authors claim a new SOTA with their methodology, based on best performance in intra- and inter-stain generalization on both datasets.
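The two-branch training the reviewer summarizes can be sketched as a single joint objective. The following is a toy illustration, not the paper's code: `stain_translate`, `features`, and the loss stubs are hypothetical stand-ins (the actual translator is a cGAN, and the actual consistency term is cosine-based rather than squared-error).

```python
def stain_translate(img):
    """Stub for the cGAN source -> target stain translation."""
    return [x + 0.1 for x in img]

def features(img):
    """Stub for the feature extractor; the real one is a deep network."""
    return list(img)

def supervised_loss(pred, label):
    """Stub for a task loss such as Dice or cross-entropy."""
    return abs(sum(pred) - label)

def consistency_loss(f_a, f_b):
    """Squared-error stand-in for the unsupervised feature-matching term."""
    return sum((a - b) ** 2 for a, b in zip(f_a, f_b)) / len(f_a)

def training_step(labeled, unlabeled, lam=0.5):
    """One joint step: supervise on real and stain-translated labeled
    images, and enforce feature consistency on unlabeled target images."""
    img, label = labeled
    l_sup = (supervised_loss(features(img), label)
             + supervised_loss(features(stain_translate(img)), label))
    l_cons = consistency_loss(features(unlabeled),
                              features(stain_translate(unlabeled)))
    return l_sup + lam * l_cons

print(training_step(([1.0, 2.0], 3.0), [1.0, 1.0]))
```

The key design point the reviewer highlights is that the supervised branch (on real and translated labeled images) and the unsupervised branch (on unlabeled target images) are optimized in parallel within one objective, rather than in separate pre-training and fine-tuning stages.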

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem is a real challenge in computational pathology, and the proposed method for leveraging both labeled and unlabeled data is clever. The method could be readily used in computational pathology. The ablations on labeled data fraction and components of the methodology provide additional value and explanatory power. Finally, the paper and figures are largely clear and descriptive.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Major weaknesses

    The major weakness of this submission is that the results and methods are not well situated within the context of the existing literature. As a result, the authors claim both novelty and a SOTA result that cannot actually be evaluated. This significantly weakens the conclusions one can draw from the paper, for several reasons:

    • The method is not novel in the ways that the authors claim in the introduction. For example, other pathology-based domain generalization techniques are unsupervised and task-agnostic (e.g. HistAuGAN, ContriMix, etc.). Feature consistency loss is a component of contrastive learning work. Other methods enable similar approaches without requiring labeling the stain/domain.
    • The claim of SOTA cannot be evaluated. The authors did not use a benchmark dataset or compare with results from other methods for which a SOTA has been established. For this reason, the method cannot be fairly evaluated. Camelyon17-WILDS may be an appropriate benchmark. Finally, the method presumes the only difference between stains is “style,” whereas this is not the case. Intra-stain differences (e.g. H&E intensity, color vectors, etc.) may readily be considered “style” differences. However, inter-stain differences fundamentally contain different biological information that cannot be readily inferred from one another. For example, a lymphocyte on H&E may be CD8+ or CD8-, and it cannot be determined from H&E which. Thus, there is a necessary component of hallucination when performing stain transfer between different types of stain. This suggests that the problem may be poorly posed.

    Minor weaknesses

    • In the ablations, they mention using pre-trained weights from a “foundational model for histology” and report negative results relative to ImageNet pre-training. The cited paper does not claim to be an FM and the architecture is not consistent with current SOTA.
    • It is unclear whether the “comparable methods” were implemented in an apples-to-apples way and tuned to the same degree as the proposed method; the implementations are not described in detail. Such detail would give the reader more confidence in the claims of SOTA; however, a benchmark comparison is still preferable.
    • Cosine similarity should be maximized rather than minimized if trying to match representations. This may simply be a text error.
    • Patient and WSI counts are missing from the dataset descriptions.

    Text edits and minutiae

    • Intra- and inter-stain are flipped
    • Notation in formulas is challenging to follow, some overloaded terms
    • There are small grammatical errors throughout; I suggest another round of copy editing.

    Overall, I recommend re-ordering some results; e.g., the full objective function is defined while its components are only partially explained.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the weaknesses section. A more thorough integration with the existing literature would be helpful, although this breaks some of the claims in the paper.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper’s central claims do not appear to hold up. However, further work might support them.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I appreciate the authors’ response to the review and their clarification that not all tasks can be accomplished simply by converting one stain to another (though I could not find this in the manuscript). Still, there remain fundamental issues with the manuscript, including the inability to validate SOTA claims without comparison to appropriate methods or literature, and fundamental issues with the language: “inter/intra” being flipped in usage, “minimization” of cosine similarity, etc. Correction of these technical issues is possible, but would require weakening the claims of contribution.



Review #2

  • Please describe the contribution of the paper

    The paper proposes an unsupervised latent stain adaptation method (ULSA) to address the domain shift problem that deep learning models in digital pathology face under different staining techniques. The method augments the labeled source images through stain translation and utilizes unlabeled target stain images for feature consistency learning, thereby achieving effective stain adaptation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. An unsupervised latent stain adaptation method is proposed, capable of effective stain adaptation without labeled target stain data.
    2. By utilizing stain transformation and feature consistency learning, supervised signals between labeled source images and unlabeled target images are enhanced, improving model performance.
    3. In kidney tissue segmentation and breast cancer classification tasks, the ULSA method achieves state-of-the-art performance under different staining variations.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In the experimental section, there is a limited selection of comparison methods, which may overlook better-performing alternatives.
    2. In the evaluation of stain adaptation, only one evaluation metric per task was utilized (Dice score for segmentation and AUROC for classification). More metrics are needed to comprehensively assess method performance.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. In the introduction section, it would be beneficial to provide a more detailed explanation of the domain transfer issues faced by deep learning models in digital pathology, along with additional background information.

    2. In the methodology section, a more detailed description of the specific steps and parameter settings for cGAN staining enhancement and feature consistency learning would be helpful.

    3. In the experimental section, providing more experimental details such as a comprehensive description of the dataset, model hyperparameter settings, and training strategies would be advantageous.

    4. In the results and discussion section, a more in-depth analysis of the experimental results, discussing the differences and strengths/weaknesses of different methods, and suggesting future directions for improvement would be valuable.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an effective method for addressing the staining adaptation issue in digital pathology and achieves good performance in experiments. However, there is still room for improvement in some aspects of the paper, such as detailed description of the method and comprehensive evaluation of the experiments.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    My questions were well answered.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel training strategy for handling stain variability in histopathological images, a critical issue stemming from diverse staining techniques. This strategy employs Unsupervised Domain Adaptation using a cycle GAN to bridge the gap between source and target stains, combined with feature consistency learning to ensure the model produces stain-invariant features. The innovation lies in this integration, which not only addresses the scarcity of annotated datasets but also enhances model robustness across different staining conditions. Moreover, the authors demonstrate that their approach outperforms state-of-the-art methods in both segmentation and classification tasks, underscoring its effectiveness and potential impact on medical image analysis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents several notable strengths that enhance its contribution to the field of medical image analysis. Firstly, it proposes a reproducible Unsupervised Domain Adaptation training strategy for common histopathological tasks such as segmentation and classification. This approach not only aids in achieving consistent results across different studies but also serves as a valuable framework for other researchers in the field. A major highlight of the paper is the innovative combination of Unsupervised Domain Adaptation with feature consistency learning. This integration effectively tackles the challenge of stain variability and ensures that the model’s features are invariant across different datasets, enhancing the robustness and applicability of the model. Furthermore, the strategy outlined in the paper has been demonstrated to outperform other state-of-the-art Domain Adaptation techniques in both segmentation and classification tasks. The ability to excel in these varied problem types underlines the versatility and effectiveness of the proposed method. Another significant strength is the robust handling of datasets. By performing multiple folds on the datasets, the authors ensure that their results are reliable and not biased towards any specific subset of data. This methodological rigor adds an additional layer of credibility to their findings. Lastly, the paper is well-written and clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper exhibits a few minor weaknesses that could be refined to enhance its overall presentation and clarity. One area of concern is the complexity of the figures; the first figure attempts to convey a substantial amount of information in a single image, which may overwhelm the reader. Additionally, the schematic in the second image could benefit from clearer labeling and explanation to prevent any misunderstanding about the processes it illustrates. For example, the explanation of how labeled data is utilized in the model training process could be more detailed. The paper does not fully articulate whether the labeled data is used directly for training or primarily for adapting the target model. Furthermore, there is a slight ambiguity regarding the strategy’s application. It remains unclear whether the strategy is intended to create a universal model applicable to all stains or to develop a robust model specifically tailored to a particular stain using data from other stains. Clarifying this could help readers better understand the strategy’s scope and potential applications.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is overall well-structured and presents a compelling approach to handling stain variability in histopathological images. However, there are a few areas where additional clarification could further enhance the manuscript:

    1. Details on Training Folds: It would be beneficial for the authors to provide more detailed information on how the training folds were constructed. Specifically, clarifications on how patient data was handled and whether images from the same Whole Slide Images (WSI) were included in both training and testing datasets would be valuable. This detail is crucial for understanding the robustness of the model training and the generalizability of the results.
    2. Consistency in Figures: In Section 2, in (C), there appears to be a discrepancy between the text and the figures regarding the use of “stain translated” images in feature consistency learning. The text describes their use, yet the corresponding figure lacks an illustrative arrow or indication of this.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper should be accepted due to its substantial contributions to histopathological image analysis, particularly in addressing stain variability with a novel combination of Unsupervised Domain Adaptation and feature consistency learning. The strategy outperforms current state-of-the-art methods and introduces a reproducible framework beneficial for further research. The minor issues identified regarding figure clarity and methodological details are easily correctable and do not significantly impact the overall quality of the work.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all reviewers for their valuable feedback.

R3) Metrics: We report the most common metrics and observed equivalent performance deviations between methods for other metrics, e.g. accuracy.

R3) cGAN/FCL information: We followed the original cGAN implementation; all adjusted HPs and the data used are provided in Sec 3.2, with further information in the supplements. We carefully detailed FCL in Sec 2C and Fig 1b.

R3+4) Train strategies/comparable methods: We provide all HPs and follow the original implementations (Sec 3.2). We performed grid search for a fair comparison, with more details in the supplementary.

R3) Future work & strengths/weaknesses of methods: We already provide information on both, e.g. scarce-data settings and catastrophic forgetting in the Sec 4 ablation, runtime issues of Macenko in Sec 3.2, and performance deviations in Tab 1. We will add more information to our discussion.

R3+4) Information on domain transfer issues: This is shown numerically by comparison to the Baseline experiment (Fig 1a, Tab 1), visualized in Fig 1c, and discussed in the Introduction.

R4) Literature: We include methods evaluated for inter-stain setups and cover a wide variety of strategies (classical, GAN-based, and semi-supervised approaches). HistAuGAN/ContriMix, on the other hand, are evaluated for intra-stain setups. We chose the cGAN for supervised augmentation following the benchmark of Zingman et al. and the comparable work of Bouteldja et al. (see Introduction). Note that HistAuGAN/ContriMix could potentially replace the cGAN component, which we will consider in future work.

R4) Task agnostic: Neither of the mentioned works has been evaluated for dense prediction tasks like segmentation. The computer vision literature often shows different performance deviations between classification and dense tasks; Tab 1 reflects this.

R4) Novelty & FCL: We explicitly designed our FCL component for stain adaptation in pathology and focused on forcing the network to learn the same representations for different stainings in a layer-wise fashion (major novelty). Moreover, we learn in an unsupervised and a supervised manner with the cGAN in parallel, another novelty. These facts are highlighted as strengths by all reviewers.

R4) Domain label dependency: The stains in all public datasets, and also in our internal dataset, are inherently labeled, since the staining agent is always known in a pathology workflow.

R4) Data/SOTA: Our work focuses on the problem of inter-stain variations (Fig 1a, Tab 1), whereas the Camelyon dataset only provides a single staining (H&E). Our experiments also comprise several public external datasets (Sec 3.1). We provided numerical evidence for the efficiency of our approach.

R4) Inter-stain variations might hide information: As pointed out, stains do not only vary in color but may also provide different biological information (e.g. stainings that specifically highlight immune cells). However, the addressed tasks of cancer detection and kidney tissue segmentation across inter-stainings are feasible for all stainings. Nevertheless, we will add this aspect to the discussion as a possible limitation and highly appreciate this feedback. After consulting with our medical collaborators, we do not think our models are hallucinating, as this special case of missing biological information is not contained in our experimental setup and our results showed improvements.

R5) Ambiguity about strategy application: Our method is designed to train one model “to rule them all,” i.e. one that is robust to all considered inter-stain variations, and it is also evaluated in this setup (Intro, Tab 1). We think this is an important step towards training foundation models.

R3+5) Patient data details: Kidney: we evaluate on the external datasets NEPTUNE and HuBMAP, which were held out from training. Breast: we split data at the patient level to avoid data leakage; thus, no patient or slide appears in both train and test sets (Sec 3.1). More information on the data is contained in the supplementary; we will add patient and WSI counts.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper proposes an approach to transfer information across different staining types and evaluates its performance on a segmentation and a classification task.

    The reviews positively remark on the consistent performance of the approach, the application, and the ablation studies. Critical points include the underlying goal not being fully clear (universal model vs. model for fine-tuning), the methods selected for comparison and hence the presentation of the novelty, and the selection of the datasets used for benchmarking.

    The rebuttal addresses the comments, however, ratings continued to be divided. Still, given the overall positive feedback, I would see that the merits outweigh the shortcomings and recommend accepting this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


