Abstract

Conventional hematoxylin-eosin (H&E) staining is limited to revealing cell morphology and distribution, whereas immunohistochemical (IHC) staining provides precise and specific visualization of protein activation at the molecular level. Virtual staining technology has emerged as a solution for highly efficient IHC examination, directly transforming H&E-stained images into IHC-stained images. However, virtual staining is challenged by insufficient mining of pathological semantics and by the spatial misalignment of pathological semantics. To address these issues, we propose the Pathological Semantics-Preserving Learning method for Virtual Staining (PSPStain), which directly incorporates molecular-level semantic information and enhances semantic interaction despite any spatial inconsistency. Specifically, PSPStain comprises two novel learning strategies: 1) the Protein-Aware Learning Strategy (PALS) with a Focal Optical Density (FOD) map, which maintains the coherence of protein expression level, representing molecular-level semantic information; 2) the Prototype-Consistent Learning Strategy (PCLS), which enhances cross-image semantic interaction through prototypical consistency learning. We evaluate PSPStain on two public datasets using five metrics: three clinically relevant metrics and two for image quality. Extensive experiments indicate that PSPStain outperforms current state-of-the-art H&E-to-IHC virtual staining methods and demonstrates a high pathological correlation between the staging of real and virtual stains. Code is available at https://github.com/ccitachi/PSPStain.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2078_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/ccitachi/PSPStain

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Che_Pathological_MICCAI2024,
        author = { Chen, Fuqiang and Zhang, Ranran and Zheng, Boyun and Sun, Yiwen and He, Jiahui and Qin, Wenjian},
        title = { { Pathological Semantics-Preserving Learning for H&E-to-IHC Virtual Staining } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a framework for IHC virtual staining from H&E slides, incorporating pathological and spatial semantics into the learning scheme.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • overall good writing
    • interesting idea
    • comparison with SOTA + ablation studies
    • testing with two datasets
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • some ideas are confusing
    • some assumptions are not completely right
    • results improvements are not clear
    • some metrics don’t seem the most appropriate
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Figure 2 is difficult to read, because it’s too confusing. There are a lot of concepts intertwined and some steps seem out of context.

    • “the tumor content within these images remains consistent.” This is not true in case of tissue loss, for example.

    • The prototype-consistent learning strategy sounds too confusing. The authors should clarify the rationale and clear up some concepts. How did they use a segmentation model as a feature extractor? What are cross-prototypes? How are the probability maps built (for which class)? …

    • I suggest including CycleGAN in the evaluation.

    • I suggest including a classification task on generated images as part of the evaluation (if possible, with a public dataset(s)). Also, qualitative testing on real/fake images would strengthen the analysis.

    • “revealing that PSPStain closely matches the ensemble of ground truth.” Such a statement is too strong, when the results presented in the paper seem to be differently stained in comparison with the ground truth.

    • “The higher PSNR and SSIM indicate higher image quality however not always in virtual staining because of the inconsistent GT pairs.” The authors only mention the limitation without suggesting or discussing a solution for a more accurate analysis of the results.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is overall well-written, although there are some concepts that should be clarified. There is one method that should be included in the SOTA comparison, and the authors should improve the testing and results analysis.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors study an image-to-image generation problem consisting of mapping an H&E image into the associated IHC image. They propose two new loss functions, based on prior knowledge about IHC images and their imperfect spatial alignment with the associated H&E, to improve the H&E-to-IHC mapping quality obtained using a Generative Adversarial Network (GAN).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the two losses introduced are well motivated for the problem of H&E-to-IHC synthesis. Those losses formalize prior knowledge about the task
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • there are missing implementation details about the proposed two losses. It is not possible to implement them based solely on the manuscript.
    • the performance in terms of the most widely used metrics (PSNR, SSIM) decreases when using the proposed losses
    • The method seems limited to HER2 IHC
    • No details on the method used to find the hyperparameter values (learning rate, losses weight values, the focusing parameter, the number of epochs)
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • No details on the method used to find the hyperparameter values; this limits the reproducibility of this method on datasets other than the one used here.
    • The authors specify that the code will be available. This would greatly improve the reproducibility of this work.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Missing information about the proposed loss functions

    It is not clear to me how the FOD is estimated. Here is the only explanation of how the FOD is estimated that I have found in the manuscript: “the first color deconvolution layer attempts color deconvolution of IHC images and gets the DAB channel image” What is the deconvolution method used here? Are there limitations to this method? In particular, how accurately does it really estimate the protein expression? To what extent is it specific to HER2? The estimated FOD is a central part of the method. I think the FOD estimation should be more clearly described to allow the reader to use the method.

    What is the segmentation model used for the proposed PCLS loss? And how does the segmentation performance influence the results obtained using the PCLS loss? This should be discussed in the paper since PCLS is part of the main contributions.

    2. Performance.

    I have noted your point about the impact of spatial misalignment on the PSNR and the SSIM. However, this does not explain why those metrics decrease in your ablation study on the two proposed losses (Table 2). Is there another explanation for this tendency?

    3. The method seems limited to HER2 IHC

    Only tested with HER2 IHC. Would the method work for proteins other than HER2? Since it is tested only on HER2, this should be mentioned explicitly in the title, the abstract, and in either the discussion or the conclusion.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • lack of clarity of the method
    • limited reproducibility
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces a new method for virtual slide staining, tackling two of the current major problems. By introducing an extension of an existing method for optical density determination and a new learning strategy, inspired by a similar approach in segmentation, the method aims to reduce the effects of the insufficient mining of pathological semantics, as well as the spatial misalignment of pathological semantics.

    The method is thoroughly tested on two public datasets fitting the task and compared to current state-of-the-art approaches. The results of the new method are superior or comparable to those of the current approaches, and the visual results seem significantly closer to the ground truth than those of the current state-of-the-art solutions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of the paper are a) the proposed novel method and b) the thorough evaluation of the method on public datasets.

    a) The authors are able to describe their new method for pathological semantics preservation, as well as their learning strategy, in a detailed and comprehensible way. The approach is based on similar, proven methods and solid research combined into a new single pipeline. Applying (and refining) a new way to overcome spatial misalignment is an important step for digital pathology and can be applied beyond the presented use case.

    b) The evaluation of the method on two public datasets shows a certain degree of generalizability of the proposed approach. The comparison with 4 state-of-the-art methods created for the same use case further provides a good overview of the improvements provided by the proposed method. The addition of an ablation study in the evaluation provides further insights into the contribution of the individual components.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the overall novel methods are detailed quite precisely, some details, e.g., the used color deconvolution algorithm, are not mentioned. As the code is also not made publicly available, reproducing the study results from the current information is impossible.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The study mentions the two public datasets used for training and evaluation.

    The current state of the work does not allow for reproducibility. While the approach is detailed quite precisely, it is not detailed enough to reproduce the approach completely. Some details, like the used color deconvolution algorithm, are missing.

    Providing the code for the proposed new method, as well as the code used to reconstruct the evaluation in the study, would greatly benefit the reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The initial paragraph of the “Methods” section seems a bit off regarding punctuation marks and the order of sentences. While this does not directly hinder the understanding, it is a bit irritating during reading.

    There is some inconsistency regarding the use of abbreviations: FOD is introduced twice and is not introduced at its first appearance; ground truth is used multiple times after GT is introduced as an abbreviation; … Please carefully double-check all abbreviations and their consistent usage.

    The figures are quite small (especially Fig. 1 and 4). While they still can be read it isn’t easy. I understand the space constraints. However, maybe some things can be shifted and the figures slightly enlarged.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method seems to provide a significant improvement over current state-of-the-art approaches. In particular, the approach to reducing the problems connected to spatial misalignment provides a foundation for broad use in the pathology sector beyond the specific use case of virtual staining. The very strong evaluation, combined with the improvements and the proof of concept of the strategies, makes this a valuable addition to the scientific community.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    No change in opinion. The other reviewers mentioned good points which were addressed by the authors. Overall I feel like the paper provides a valuable contribution to the MIC domain.




Author Feedback

We thank the reviewers for their constructive comments; we elaborate on them below.

R#1 Q1: We will clarify Fig. 2 (see R1 Q3 and R3 Q1) in the revised version. Q2: We fully agree that tissue loss is an inevitable issue, but it is highly likely that the corresponding patches in the two stains share the same diagnostic label [7], and we have allowed a 20% tolerance for tissue loss when calculating the protein-aware loss in Eq. 3. Q3: Clarifying PCLS: We pretrain a segmentation UNet on real IHC images and freeze it to extract tumor semantics, aiming to enhance the authenticity of the generated tumor in PCLS. Initially, an input IHC image is processed by the UNet, yielding a segmentation probability map as output. Feature maps are then derived from the layer preceding the output layer. Using these feature maps and probability maps, we aggregate class-wise prototypes representing the pixel-wise features of tumor and background (Eq. 7). For semantic interaction, cosine similarity is computed between image features and the prototypes from another image (e.g., generated-image features and ground-truth prototypes) (Eq. 8), referred to as cross-prototypes. Finally, a softmax is applied to the similarity map (Eq. 9) to obtain the tumor-class probability map used to calculate the loss (Eq. 10). Q4: We have included pix2pix in the results, and CycleGAN performs worse than pix2pix in [8]. Q5: Real/fake classification would be a valuable additional test for our method, but our metrics already show that our method has strong diagnostic consistency on two datasets, which is the core of virtual staining. Q6: Please refer to R4 Q1. Q7: Discussion of metrics: We include PSNR and SSIM given their popularity in previous works, but they have shortcomings: PSNR is calculated from MSE, and SSIM by default slides a 7x7 window for computation; hence, both can be affected by spatial misalignment. Specifically, blurry IHC images with more averaged pixel values may inflate them (Zhu et al., “Breast Cancer Immunohistochemical Image Generation: a Benchmark Dataset and Challenge Review”). In the ablation, PALS and PCLS amplify changes at the tumor and pixel level, respectively; such intense changes lead to decreases in PSNR and SSIM. We want to emphasize that our goal is to highlight the tumor area precisely. To supplement these metrics, we include mIOD and IOD [12,15] to evaluate the positive signal, and Pearson-R [9] to evaluate pathological correlation. These show that our method exhibits stronger pathological consistency in H&E-to-IHC virtual staining.

R#3 Q1: Clarifying FOD estimation: We use traditional color deconvolution (Ruifrok et al., “Quantification of histochemical staining by color deconvolution”) for stain separation. Initially, we apply a logarithm to the IHC image to obtain the OD image. Color deconvolution then multiplies the OD image by the inverse of the OD matrix, yielding the OD values of the H, E, and DAB stains. We select the DAB stain’s OD values to generate an RGB image (IHC DAB). For the focal OD, we simulate Eq. 1 by converting IHC DAB to grayscale and using the focal calibrated map to assign gray values to the positive signal (FOD). For the limitations of the method, see R3 Q3. We will clarify these points in the revised version. Discussion of the segmentation model: The segmentation model is a UNet. Our PCLS drives tumor and background features to be orthogonal, so a model whose features have better inter-class orthogonality is beneficial. Considering that IHC images highlight the tumor area, we emphasize that the UNet has effectively acted as a feature extractor (see R1 Q3). Q2: Please refer to R1 Q7. Q3: Limitations of the method: Our method is appropriate for all DAB-stained IHC images, but not for AEC- or otherwise-stained IHC images. In [11], many proteins such as ER and PR can be revealed by DAB. Since we test only on HER2, we will mention this explicitly in the revised version.
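For readers trying to follow the PCLS clarification in R#1 Q3 above, the prototype mechanics (Eqs. 7-9) might be sketched as follows. This is an illustrative NumPy reconstruction from the rebuttal's description, not the authors' released code; the tensor shapes, the two-class setup, and the 1e-6 stabilizers are assumptions.

```python
import numpy as np

def class_prototypes(feats, probs):
    """Probability-weighted average pooling of pixel features per class (Eq. 7).

    feats: (C, H, W) feature map from the layer before the UNet output layer.
    probs: (K, H, W) segmentation probability maps (e.g. K=2: tumor, background).
    Returns class prototypes of shape (K, C).
    """
    f = feats.reshape(feats.shape[0], -1)               # (C, H*W)
    p = probs.reshape(probs.shape[0], -1)               # (K, H*W)
    return (p @ f.T) / (p.sum(axis=1, keepdims=True) + 1e-6)

def cross_prototype_probs(feats, protos):
    """Cosine similarity of each pixel feature to the prototypes of *another*
    image (Eq. 8), softmaxed over classes into a probability map (Eq. 9)."""
    f = feats.reshape(feats.shape[0], -1)
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-6)   # unit pixel features
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-6)
    sim = p @ f                                         # (K, H*W) cosine similarities
    e = np.exp(sim - sim.max(axis=0, keepdims=True))    # numerically stable softmax
    return (e / e.sum(axis=0, keepdims=True)).reshape(-1, *feats.shape[1:])
```

The consistency loss (Eq. 10) would then compare such a cross-prototype probability map, e.g. generated-image features scored against ground-truth prototypes, with the segmentation target.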
R#4 Q1: Correction: We will clarify our method (see R3 Q1) in the revised version. Thanks again for the constructive suggestions on wording and figures. Q2: Reproducibility: We will release our code and pretrained model.
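Until the promised code is released, the stain-separation step described in R#3 Q1 can be approximated with classical Ruifrok-Johnston color deconvolution. The stain OD vectors below are the commonly cited defaults, not necessarily the authors' matrix, and the paper-specific focal calibration of Eq. 1 is intentionally omitted.

```python
import numpy as np

# Commonly used unit OD vectors for H, E, and DAB (Ruifrok & Johnston);
# the authors' exact stain matrix may differ.
RGB_FROM_STAIN = np.array([[0.65, 0.70, 0.29],   # hematoxylin
                           [0.07, 0.99, 0.11],   # eosin
                           [0.27, 0.57, 0.78]])  # DAB
STAIN_FROM_RGB = np.linalg.inv(RGB_FROM_STAIN)

def dab_od(rgb):
    """Per-pixel DAB optical density from an RGB IHC image (H x W x 3, 0-255).

    Beer-Lambert: OD = -log10(I / I0); deconvolution multiplies the OD image
    by the inverse stain matrix and keeps the DAB channel.
    """
    od = -np.log10(np.clip(np.asarray(rgb, dtype=float), 1.0, 255.0) / 255.0)
    stains = od.reshape(-1, 3) @ STAIN_FROM_RGB      # unmix into (H*W, 3): H, E, DAB
    return stains[:, 2].reshape(np.shape(rgb)[:2])   # DAB concentration map
```

Per the rebuttal, the FOD would then be obtained by rendering the DAB channel back to an RGB image, converting it to grayscale, and applying the focal calibrated map, which is specific to the paper.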




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have adequately addressed the reviewers’ concerns and hence, I recommend the paper for acceptance. Unfortunately, reviewers 1 and 3 did not update their scores post-rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


