Abstract

The assessment of HER2 expression is crucial in diagnosing breast cancer. Staining pathological tissues with immunohistochemistry (IHC) is a pivotal step in the assessment procedure, but it is expensive and time-consuming. Recently, generative models have emerged as a novel paradigm for virtual staining from hematoxylin-eosin (H&E) to IHC. Unlike traditional image translation tasks, virtual staining in IHC for HER2 scoring requires greater attention to regions like nuclei and stained membranes, informed by task-specific domain knowledge. Unfortunately, most existing virtual staining methods overlook this point. In this paper, we propose a novel generative adversarial network (GAN) based solution that incorporates specific knowledge of HER2 scoring, i.e., nuclei distribution and membrane staining intensity. We introduce a nuclei density estimator to learn the nuclei distribution and thus facilitate the cell alignment between the real and generated images through an auxiliary regularization branch. Moreover, another branch is tailored to focus on the stained membranes, ensuring a more consistent membrane staining intensity. We collect RegH2I, a dataset comprising 2592 pairs of registered H&E-IHC images, and conduct extensive experiments to evaluate our approach, including H&E-to-IHC virtual staining on internal and external datasets, nuclei distribution and membrane staining intensity analysis, as well as downstream tasks for generated images. The results demonstrate that our method outperforms existing methods. Code and dataset are released at https://github.com/balball/TDKstain.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3227_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/balball/TDKstain

Link to the Dataset(s)

https://github.com/balball/TDKstain

BibTex

@InProceedings{Pen_Advancing_MICCAI2024,
        author = { Peng, Qiong and Lin, Weiping and Hu, Yihuang and Bao, Ailisi and Lian, Chenyu and Wei, Weiwei and Yue, Meng and Liu, Jingxin and Yu, Lequan and Wang, Liansheng},
        title = { { Advancing H&E-to-IHC Virtual Staining with Task-Specific Domain Knowledge for HER2 Scoring } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a framework for IHC virtual staining from HE slides including domain knowledge, such as nuclei distribution and membrane staining intensity, in the training strategy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • overall good writing
    • interesting idea
    • good comparison with SOTA + ablation studies
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • results improvements are not so clear
    • some metrics don’t seem the most appropriate
    • no testing on public datasets (such as BCI, MIST, AIDPATH, …)
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • In Figure 1, it’s not clear which methods are used in section (a) and the step sequence is also not very clear (what is the input and output of each step?). Also, the abbreviations used in the figure should be clarified in the caption.

    • If typically samples are not paired and don’t exactly correspond, shouldn’t this be taken into account in the training strategy, such as, using a more robust GAN loss?

    • “It can be observed that our method achieves the best mapping from H&E to IHC across all HER2 expressions. This superiority..” The quality of the results looks very close to that of PyramidPix2Pix.

    • There are several public datasets available with pair of H&E/IHC (either tiles or slides) that should be used for better testing.

    • I suggest including a classification task from generated images as part of the evaluation (if possible, with public dataset(s)). Also, qualitative testing on real/fake images would strengthen the analysis.

    • For staining intensity classification, the improvement seems only marginal.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-written and the idea is interesting. However, the improvements in results don’t seem very significant. Also, there is no testing on public datasets (when there are some available).

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a new GAN-based framework for generating HER2 IHC images from corresponding H&E images by leveraging two auxiliary branches for nuclei density regularization and membrane staining enhancement. The authors demonstrate superior performance compared to 5 previously published methods on a dataset of 2592 image patches of 1024 by 1024 pixels (1992 for training, 600 for validation) and an external dataset of 285 image patches. A validation on a downstream task was also performed on the external dataset. The manuscript is well written and easy to follow.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strength of this paper is the method and task. The author leveraged the knowledge and incorporated the knowledge for HER2 scoring to the model design. Although the validation part needs to be improved (will elaborate in the weakness of the paper), the validation is comprehensive in general. It includes image level validation in both internal and external dataset (used different antibody) and downstream tasks (nuclei counts and membrane area estimation). The head-to-head comparison experiments to previous methods are appreciated. The structure of the paper is clear, and the figures are very helpful to understand the content.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Although the method is smart to incorporate the HER2 scoring knowledge into the model design, it involves many processing steps to obtain the intermediate images used to train the GAN models (e.g., color deconvolution, thresholding etc.). Many of those methods require setting hyper-parameters, which makes the method less generic and more difficult to implement. Validation: The validation was performed on a very small sample size (i.e., 600 and 285 patches), which makes the results much less convincing. Importantly, a detailed description of the dataset is missing, which makes it difficult to interpret the results. A few validation measurements might not provide the most informative results:

    1. For the Task-specific Domain Knowledge Analysis, the authors varied the threshold for M and computed the ratio for M. I would think that computing the DICE between the real IHC M and the generated IHC M would provide a more accurate measurement. This is because not only the intensity of the membrane stain but also the spatial location of membrane pixels is important (e.g., the completeness of the membrane stain is one factor for HER2 scoring).
    2. Although a hold-out external dataset was used, all the experiments are single-run validations without standard deviations/confidence intervals. Considering that GAN models are notoriously unstable and the performance gains are marginal compared to some methods, it would be better if the standard deviations/confidence intervals were provided.
    3. For the classification task for HER2 scoring, it would be better if the pre-trained ResNet50 was trained using the real IHC instead of the generated IHC. The results would then show better performance on virtual IHCs that are more similar to the real IHCs.
    4. The results in Table 2 lack the data distribution information (i.e., the number of samples for each HER2 score).
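
    To make points 1 and 2 concrete, the following is a minimal sketch of a Dice coefficient between two binary membrane masks, together with a helper that reports the mean and sample standard deviation over repeated runs. The function names (`dice`, `mean_and_std`) and the toy masks are illustrative assumptions; they are not taken from the paper or its code release.

```python
from statistics import mean, stdev


def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    flat_a = [int(bool(v)) for row in mask_a for v in row]
    flat_b = [int(bool(v)) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a & b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_and_std(values):
    """Mean and sample standard deviation of per-run scores."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


# Toy 3x3 membrane masks (made-up values, not data from the paper).
real_mask = [[1, 1, 0],
             [0, 1, 0],
             [0, 0, 0]]
fake_mask = [[1, 0, 0],
             [0, 1, 1],
             [0, 0, 0]]
score = dice(real_mask, fake_mask)  # 2 shared pixels, 3 + 3 foreground pixels
```

    Unlike a ratio of thresholded membrane area, this score drops when the generated membrane pixels are in the wrong place, even if their total area matches. Reporting `mean_and_std` over several training runs would address the single-run concern.
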
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. I suggest that the authors provide a detailed description of the data (internal and external) they used. Specifically, the description should include the sample size at case and slide level, the SOP for tissue processing, scanner brand and model number, and pixel spacing.
    2. Provide updated results in Fig.3 measuring M by DICE.
    3. Provide a description for image registration for H&E and IHC.
    4. Provide a breakdown of the results by HER2 score in Table 2, for example, the accuracy for HER2 1+.
    5. Provide a description of how the ground truth for the HER2 score was generated, especially for HER2 2+. Was a FISH test performed?
    6. For the Task-specific experiment, provide a description for how the nuclear count was evaluated (for both the ground truth and the generated results).
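
    The breakdown requested in item 4 could be produced along the following lines. This is a hedged sketch: the function name `per_class_accuracy`, the label strings, and the toy predictions are assumptions for illustration and do not come from the paper.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {label: (accuracy, n_samples)} for each ground-truth label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return {label: (correct[label] / total[label], total[label])
            for label in sorted(total)}


# Toy HER2 scores (made-up labels, not results from the paper).
y_true = ["0", "1+", "1+", "2+", "2+", "2+", "3+", "3+"]
y_pred = ["0", "1+", "2+", "2+", "2+", "1+", "3+", "3+"]
breakdown = per_class_accuracy(y_true, y_pred)
for label, (acc, n) in breakdown.items():
    print(f"HER2 {label}: accuracy={acc:.2f} (n={n})")
```

    Returning the sample count next to each accuracy also exposes the class distribution, which is the information weakness 4 finds missing from Table 2.
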
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the paper has its weaknesses, the method, which leverages the scoring knowledge, is interesting. If the authors can provide the descriptions based on my suggestions, the paper would be better. I would like to see this work used for other IHC stains in future work. Recommendation: accept

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a new model for virtual staining, putting its focus on spatial consistency. Spatial features of the slides contain important features when analyzed by humans but are currently often not translated during virtual staining. Translating nuclei information via density maps and membrane staining with a GAN-based approach, the proposed method is able to outperform current virtual staining methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is the novel approach presented, which is able to outperform the current state-of-the-art methods for virtual staining. Besides the technical evaluation, the results are vividly presented in Figure 2, showing a clear improvement to the current methods.

    Keeping nuclei and membrane details in virtually stained slides consistent with their corresponding H&E image transfers information vital in human analysis like no other approach before. This moves virtual staining one step closer to applicability in diagnostics. The approach is then also tested as a foundation for a classification model, making predictions on real and virtually stained slices. Among all presented slices, the new approach is second only to the original slices, demonstrating a certain degree of superiority.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of the paper is its low reproducibility in the current state. While the authors claim to publish code and datasets, the available data is not sufficient to reproduce the approach or results from the paper. Especially the information about the datasets is problematic. There is no information about the origin of the data. If publicly available data was used, it should be linked. If nonpublic data were used, this research most likely would be in need of ethics approval, which was not mentioned in the text.

    Besides the low reproducibility, only minor weaknesses can be found in the paper, which can be resolved.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The abstract states the release of code and dataset. However, the current state is not reproducible.

    The information about the used dataset especially is very scarce. Please consider adding links to data sources if applicable or describe the origin of the data in any way.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In the Abstract, GAN is only used once. Therefore, there is no need to use an abbreviation in the Abstract. The introduction of the abbreviation at the end of the Introduction section is sufficient.

    The 3rd paragraph of the introduction is difficult to understand (“Different stained images from…”). Sentences like “, thus resulting in a lack of nearly perfect pixel-level registration between image pairs” are overly complex. Please consider rewriting them in a more straightforward manner, e.g., “thus a perfect pixel-level registration between image pairs is not possible.”

    Minor layout errors need to be addressed. Some new paragraphs don’t start with indentation (e.g., page 5, last paragraph before Section 3).

    In Section 3, subsection “Comparison with State-of-the-arts.” the claim “evidenced by more convincing results in the data distribution and human perception.” needs proof. While Figure 2 seems to show this to some degree, experts need to be consulted. If they were, please add information about the experts (e.g., 3 certified pathologists with 10 years of work experience). If no experts were asked and this is the opinion of the authors, please state it in a way that it is a theory that needs further research.

    Figure 3 is not readable (without heavy zooming). The legends and subscriptions are too small. The lines in the plots are too close to each other/also too small to be readable.

    Comparison with state of the art not clear (in Section 3). Why is the lower SSIM score justified by the inconsistency between the real H&E and IHC images? Shouldn’t this affect the other state-of-the-art models as well?

    Also in Section 3, “Task-specific Domain Knowledge Analysis”: here, the nuclei density results are only compared in terms of how many images reproduce the nuclei density without error (or with minimal error). If possible, with the current data, please also add a more generic metric, e.g., the mean error of all density maps.
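
    As a rough illustration of the difference between the two kinds of metric, the sketch below computes both a thresholded "fraction of images within tolerance" and mean errors over all density-map pairs. It assumes that each density map is a 2D grid whose sum approximates the nuclei count, which is a common convention in density-based counting but is not confirmed for this paper. All names and toy values are hypothetical.

```python
def count_error(real_map, fake_map):
    """Absolute difference between the summed (integrated) density maps."""
    return abs(sum(map(sum, real_map)) - sum(map(sum, fake_map)))


def pixel_mae(real_map, fake_map):
    """Mean absolute per-pixel error between two equally sized density maps."""
    diffs = [abs(r - f)
             for real_row, fake_row in zip(real_map, fake_map)
             for r, f in zip(real_row, fake_row)]
    return sum(diffs) / len(diffs)


def summarize(pairs, tolerance):
    """Return (fraction within tolerance, mean count error, mean pixel MAE)."""
    errors = [count_error(real, fake) for real, fake in pairs]
    maes = [pixel_mae(real, fake) for real, fake in pairs]
    within = sum(e <= tolerance for e in errors) / len(errors)
    return within, sum(errors) / len(errors), sum(maes) / len(maes)


# Two toy 2x2 density-map pairs (made-up values, not data from the paper).
pairs = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),  # identical maps
    ([[2.0, 1.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]]),  # count off by 2
]
within, mean_count_err, mean_mae = summarize(pairs, tolerance=1.0)
```

    The thresholded fraction hides how large the misses are, whereas the mean errors summarize every pair, which is the more generic view the reviewer asks for.
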

    While the “Analysis on External Data” is a great addition and using a downstream classification task is a well-selected use case to really demonstrate the use of the approach, there is one minor problem with the text regarding the analysis, namely: “Notably, the original H&E-stained images lead to poor performance for the classification task due to the disability of measuring membrane staining intensity.” With the current scope of evaluation, this can only be an assumption. There is no mention of explainable AI/analyzing what was used by the model to make the determination. While this is “the” logical explanation from a domain expert (medical) standpoint, this is not proven.

    Table 2: Please consider highlighting the best results in all columns and not only the first two for consistency. Further, it should state that the best results based on generated IHC staining slides are marked, and the model on the real IHC slides must be marked bold.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the approach is sadly not reproducible in the current state, I think it is a very valuable improvement to the current state-of-the-art models. Therefore, it should be shared with the scientific community. I hope the code will be provided, making it usable for all of us and adding its value to a broad range of upcoming research projects.

    While it would be a loss if the paper is published and the code is never provided, it would be a greater loss if the approach was never published.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

N/A




Meta-Review

Meta-review not available, early accepted paper.


