Abstract

Histopathological samples are typically processed by formalin fixation and paraffin embedding (FFPE) for long-term preservation. To visualize the otherwise indistinct structures of cells and tissue in FFPE slides, hematoxylin and eosin (HE) staining is commonly used, a process that requires sophisticated laboratory facilities and complicated procedures. Recently, virtual staining realized by generative models has been widely adopted. The blurry cell structure in FFPE slides poses challenges to FFPE-to-HE virtual staining, yet most existing studies overlook this issue. In this paper, we propose a framework for boosting FFPE-to-HE virtual staining with cell semantics from a pretrained cell segmentation model (PCSM): a well-trained PCSM has learned an effective representation of cell structure, which carries richer cell semantics than the representation learned by a generative model. We therefore learn from the PCSM by exploiting both the high-level and low-level semantics of real and virtual images. Specifically, we use the PCSM to extract multi-scale latent representations from real and virtual images and align them. Moreover, we introduce low-level cell location guidance for the generative model, informed by the PCSM. We conduct extensive experiments on our collected dataset. The results demonstrate a significant qualitative and quantitative improvement of our method over existing networks. Code is available at https://github.com/huyihuang/FFPE-to-HE.
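
To make the high-level alignment concrete, here is a minimal PyTorch sketch of a CSLoss-style feature alignment with a frozen PCSM encoder. The encoder interface (returning a list of multi-scale feature maps) and the layer weights are assumptions for illustration, not the authors' released implementation:

    import torch
    import torch.nn as nn

    class CSLoss(nn.Module):
        """Sketch: align multi-scale PCSM encoder features of real and
        virtual HE images. Assumes pcsm_encoder(x) returns a list of
        feature maps from shallow to deep (an assumed interface)."""

        def __init__(self, pcsm_encoder, weights=(1/32, 1/16, 1/8, 1/4)):
            super().__init__()
            self.encoder = pcsm_encoder.eval()
            for p in self.encoder.parameters():  # keep the PCSM frozen
                p.requires_grad_(False)
            self.weights = weights
            self.criterion = nn.L1Loss()

        def forward(self, virtual_he, real_he):
            feats_v = self.encoder(virtual_he)
            feats_r = self.encoder(real_he)
            loss = 0.0
            for w, fv, fr in zip(self.weights, feats_v, feats_r):
                loss = loss + w * self.criterion(fv, fr.detach())
            return loss

The (1/32, ..., 1/4) weights mirror the first four layer weights of pix2pixHD's VGG feature-matching loss, consistent with the rebuttal's statement that "the first four weights of VGGLoss" are reused for CSLoss.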

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3335_paper.pdf

SharedIt Link: https://rdcu.be/dV1Vg

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72384-1_7

Supplementary Material: N/A

Link to the Code Repository

https://github.com/huyihuang/FFPE-to-HE

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Hu_Boosting_MICCAI2024,
        author = { Hu, Yihuang and Peng, Qiong and Du, Zhicheng and Zhang, Guojun and Wu, Huisi and Liu, Jingxin and Chen, Hao and Wang, Liansheng},
        title = { { Boosting FFPE-to-HE Virtual Staining with Cell Semantics from Pretrained Segmentation Model } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {67--76}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose a framework for boosting FFPE-to-HE virtual staining with cell semantics from pre-trained cell segmentation models (PCSM), which can effectively capture the high-level and low-level semantics of real and virtual images to improve the virtual staining quality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The problem addressed, generating clear virtual HE (vHE) images from blurry FFPE slide images, is important.
    • The idea of integrating high-level and low-level semantics in the training phase is interesting.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors align only the encoder part, which is a major limitation. The choice to align only the encoder and not the decoder is not explained, and it may fail to ensure the overall quality and fidelity of the synthesized images.
    • The lack of a downstream-task evaluation, such as a comparison between the segmentation masks of real and synthesized images, is a significant weakness. Without such a comparison, the authors cannot effectively evaluate the usefulness of the generated images for downstream tasks.

    • The absence of ground-truth labels for the H&E-stained images is a shortcoming, but an acceptable one. The authors' decision not to train a segmentation network, as suggested in [1], is a missed opportunity to provide a more comprehensive evaluation of the generated images.
    • Concatenating the low-level guidance map (m) to both the real (y) and synthesized (ŷ) images raises concerns about the efficacy of this approach (see the sketch after this list). Providing the same guidance for both discriminator inputs could be considered redundant and may not effectively improve the quality of the synthesized images. The authors' explanation of why this approach is effective is insufficient and lacks a thorough justification.

    [1] Bao, S., Lee, H. H., Yang, Q., Remedios, L. W., Deng, R., Cui, C., … & Huo, Y. (2023, April). Alleviating tiling effect by random walk sliding window in high-resolution histological whole slide image synthesis. In Medical Imaging with Deep Learning.
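
    For the concern about the guidance map m, a minimal sketch of the conditioning pattern the reviewer describes (tensor names and shapes are assumptions; pix2pixHD conditions its discriminator the same way on the input label map):

        import torch

        def discriminator_inputs(m, y_real, y_fake):
            """Sketch: the same low-level guidance map m is concatenated
            channel-wise to both the real HE image and the synthesized one
            before each is passed to the discriminator. All tensors are
            assumed to be (N, C, H, W)."""
            d_real = torch.cat([m, y_real], dim=1)
            d_fake = torch.cat([m, y_fake.detach()], dim=1)  # detach when updating D
            return d_real, d_fake

    As the rebuttal later argues, the shared conditioning need not be redundant: it gives the discriminator the same reference against which to judge both the real and the fake image.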

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • In pix2pixHD, is a multi-layer discriminator used?
    • What are the fundamental differences between the two datasets, aside from the patch size?
    • Are there any potential batch effects in the collected dataset?
    • Since some of the results are very close, statistical analysis would be ideal.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lack of comprehensive model evaluation; limitations in dataset characterization; insufficient statistical analysis.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors did a great job addressing most of my comments. I have adjusted the score accordingly.



Review #2

  • Please describe the contribution of the paper

    The blurry cell structure in FFPE slides poses challenges to FFPE-to-HE virtual staining, and most existing studies overlook this issue. This paper proposes a framework for boosting FFPE-to-HE virtual staining with cell semantics from a pre-trained cell segmentation model (PCSM). It mainly uses the PCSM to extract multi-scale latent representations and low-level cell location guidance for the generative model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The paper is based on the assumption that understanding the semantics of cell structures is critical for mitigating the challenges posed by their blur. This is a new perspective for solving virtual staining.

    2. The paper proposes a framework that simultaneously introduces the low-level and high-level semantics of cells into the cGAN without expert annotation.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Please provide a more detailed description of the dataset source.
    2. Is drawing a slash on FFPE a common academic expression?
    3. Wics: how is this weight value chosen? Please clarify.
    4. The structure of the Quantitative Evaluation section is quite confusing. It is recommended to place “As shown in Table 1, the addition of CSLoss significantly improves the…” after the paragraph “Benchmark Results.”
    5. The order of references should be ascending. For example: “For example, Asaf et al. [1] applied DCLGAN [4] on unstained skin tissue and compared its performance with CycleGAN [14] and CUT [9]”.
    6. Does cell segmentation help improve the quality of image generation in cell regions of virtually stained images? Please explain.
    7. CSLoss is an obvious application, similar to deep supervision in segmentation networks.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Same points as listed under the main weaknesses above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The blurry cell structure in FFPE slides poses challenges to FFPE-to-HE virtual staining, and most existing studies overlook this issue.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    “Deep supervision applies supervision at both the middle and final layers to help the network learn more useful features at the middle stages. CSLoss, simply and effectively, leverages the cell semantics of PCSM to provide supervision at the final layer.” There is no evidence of its effectiveness in the experiments.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a conditional generative adversarial network for virtual staining, from FFPE to HE. While the base network they use (pix2pixHD) conditions the discriminator on the original image (FFPE in this case), the authors propose to use the result of a pretrained cell segmentation network: the segmentation of the real HE image serves as the conditioning image for the discriminator. They also propose an additional loss for the generator, namely the distance between the features that the real HE image and the virtual HE image at the output of the generator produce in the encoder part of the segmentation network. The results presented show superior performance compared to pix2pixHD.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is clear, the method is simple and the results are useful. The authors claim that the code will be publicly available, but it could also be easily reproduced with the details provided.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The results are not compared to other state-of-the-art virtual staining methods, such as those cited by the authors themselves (references [1], [2], or [4], for example), so it is not possible to judge whether the work represents an advance over the state of the art. The advantage of the proposed CSLoss (a distance between features in the segmentation network) over the L1 loss between real and virtual HE, as used in the original pix2pix paper, is also not justified; this should be included in the ablation study as well.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It would also be interesting to report the result without the adversarial loss (only the L1 loss at the output of the generator) in order to appreciate the need for the adversarial and segmentation networks. Some errors in the text:

    • In 2.2, “where x is the semantic label map and…” is not correct. In the references provided, x is the image at the input of the generator; it is a semantic label map only when the task is the generation of natural images from label maps.
    • Equation (5) is not correct: if the arg function is used, the result is not a loss (see the note after this list).
    • It is not clear in Section 3.2 why one-fifth of the training dataset (1075 pairs) is used for testing, while in 3.1 the test dataset consists of a different 1399 pairs.
    • There are many spelling and grammatical errors which should be corrected.
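
    For context on the Equation (5) remark: in the standard conditional-GAN formulation (assumed here, since the paper's exact equation is not reproduced on this page), the min-max objective and the optimal generator are distinct objects, which is presumably the reviewer's point:

        \mathcal{L}_{\mathrm{cGAN}}(G, D)
          = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
          + \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big]

        G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{\mathrm{cGAN}}(G, D)

    The expression with arg defines the optimal generator G^{*}, not a loss value, so labeling it as a loss is a type error.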
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall score of “weak accept” for this paper was influenced by its clear presentation and the innovative use of a pretrained cell segmentation network in a Conditional Generative Adversarial Network for virtual staining, which demonstrated superior performance compared to the baseline model. However, the lack of comparisons with other state-of-the-art virtual staining methods and insufficient justification for the proposed CSLoss limited our ability to fully evaluate the advancement over existing techniques. Additionally, several textual inaccuracies and unclear experimental setups contributed to the decision for a weak acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the reviewers' positive comments on the novelty and superior performance of our method. Below, we clarify the main issues raised by the reviewers.

Common Questions

CQ1: Details of datasets. The two datasets were obtained from axillary lymph node resection specimens of breast cancer from two hospitals, respectively. All patches were registered at 40x magnification. Further details will be provided in the final version.

CQ2: Cell region improvement. Cells are the primary structure in HE images. Our results outperform the baselines on both SSIM and MS-SSIM, which are measures of structural similarity. Besides, treating the masks of HE images generated by Cellpose as ground truth, we observed a higher IoU for our method than for the baselines.

CQ3: Writing suggestions. We appreciate the suggestions and promise to thoroughly revise the manuscript.
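
As a rough illustration of the IoU check described in CQ2 (the Cellpose call follows the library's documented API as far as we know; the binarization and the IoU definition are assumptions, since the rebuttal does not specify them):

    import numpy as np
    from cellpose import models

    def cell_region_iou(real_he, virtual_he):
        """Sketch of CQ2: treat Cellpose masks of the real HE image as
        pseudo ground truth and compute a binary foreground IoU against
        masks of the virtual HE image."""
        model = models.Cellpose(model_type="cyto")
        masks_real, _, _, _ = model.eval(real_he, channels=[0, 0])
        masks_virt, _, _, _ = model.eval(virtual_he, channels=[0, 0])
        fg_real = masks_real > 0  # instance labels -> binary foreground
        fg_virt = masks_virt > 0
        inter = np.logical_and(fg_real, fg_virt).sum()
        union = np.logical_or(fg_real, fg_virt).sum()
        return inter / max(union, 1)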

Response to R3

Q1: Details of dataset. A1: Please refer to CQ1.

Q2: Weight values of CSLoss. A2: We use the first four weights of VGGLoss for CSLoss, as clarified in Section 2.3.

Q3: Cell region improvement. A3: Please refer to CQ2.

Q4: CSLoss is similar to deep supervision. A4: Deep supervision applies supervision at both the middle and final layers to help the network learn more useful features at the middle stages. CSLoss, simply and effectively, leverages the cell semantics of the PCSM to provide supervision at the final layer.

Q5: Writing suggestions. A5: Please refer to CQ3.

Response to R4

Q1: Fundamental difference between datasets. A1: CQ1 provides more details. The fundamental difference is the data collection process, such as the scanners and staining procedures of the different hospitals.

Q2: Any potential batch effects in the datasets? A2: Yes; factors such as data processing at the various hospitals and varying train-test splits. These prompted us to conduct the external validation.

Q3: Statistical analysis for close results. A3: The test and external validation results have demonstrated the effectiveness of our method. We will address statistical analysis for better clarity in future work.

Q4: Only aligning the encoder of the PCSM. A4: Firstly, the decoder's performance relies heavily on high-quality encoder features. Besides, the decoder blocks' outputs integrate their own features with those from the encoder blocks through skip connections; aligning both sets of features simultaneously is more challenging and was found to make training unstable in early experiments. We are exploring ways to better utilize the decoder.

Q5: Is a multi-scale discriminator used? A5: Yes, it is used in pix2pixHD by default.

Q6: Downstream task; why not use a pretrained model? A6: Please refer to CQ2 first. We did not report those results since the Cellpose-generated masks differed somewhat from the ground truth. We are planning to conduct downstream tasks with ground truth.

Q7: The same guidance for both inputs to D could be redundant. A7: Naturally, the guidance for virtual images helps D identify fakes, while the guidance for real images also helps D identify real ones with more certainty. In fact, for the label-to-image task, the same label guidance for both inputs to D is also adopted in pix2pixHD.

Response to R5

Q1: Comparison to other SOTA methods. A1: Most studies on FFPE-to-HE virtual staining use unaligned cGANs, which typically perform worse than aligned cGANs. Other studies are based on pix2pix, which we have compared against. Other aligned cGANs such as SPADE or OASIS are unsuitable for our task because they require pixel-level category information.

Q2: Advantage of CSLoss. A2: The L1 loss directly aligns every pixel and tends to incentivize blur, as noted in pix2pix. In contrast, CSLoss aligns high-level cell semantics and is advantageous in preserving structural details and in robustness to noise (e.g., blots). CSLoss outperformed the L1 loss in early experiments.

Q3: Results without the adversarial loss. A3: This idea is constructive and worth further exploration.

Q4: Writing suggestions. A4: Please refer to CQ3.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper receives three weak accepts after rebuttal. Reviewers acknowledge the paper's innovative integration of high-level and low-level cell semantics into the staining process, which has demonstrated improvement in the clarity and quality of virtual stains. Despite some concerns regarding the methodological execution and the lack of comprehensive model evaluations, the authors have adequately addressed the major concerns in their rebuttal, particularly by clarifying the dataset details and the effectiveness of CSLoss. The overall consensus among the reviewers suggests that the strengths and potential impact of the proposed methods outweigh the initial reservations, warranting acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


