Abstract

Despite the strong prediction power of deep learning models, their interpretability remains an important concern. Disentanglement models increase interpretability by decomposing the latent space into interpretable subspaces. In this paper, we propose the first disentanglement method for pathology images. We focus on the task of detecting tumor-infiltrating lymphocytes (TIL). We propose different ideas including cascading disentanglement, novel architecture and reconstruction branches. We achieve superior performance on complex pathology images, thus improving the interpretability and even generalization power of TIL detection deep learning models.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3843_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3843_supp.pdf

Link to the Code Repository

https://github.com/Shauqi/SS-cVAE

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Has_SemiSupervised_MICCAI2024,
        author = {Hasan, Mahmudul and Hu, Xiaoling and Abousamra, Shahira and Prasanna, Prateek and Saltz, Joel and Chen, Chao},
        title = {{Semi-Supervised Contrastive VAE for Disentanglement of Digital Pathology Images}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a disentanglement approach specifically designed for pathology images, focusing on the task of detecting Tumor-Infiltrating Lymphocytes (TILs). The proposed method employs a cascade contrastive analysis technique that effectively separates and analyzes different elements within pathology images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The authors introduce the disentanglement method into the pathology image domain, which can be beneficial for pathology tasks. 2. The proposed method achieves superior performance compared to the SOTA methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Despite the noteworthy boost in performance, the novelty is somewhat limited, since many contrastive disentanglement models already exist. The paper seems to simply apply contrastive disentanglement models to pathology images without sufficient further improvements. 2. Equation (1) needs to show more clearly how each loss is calculated, either by giving the formula or by specifying the loss used. 3. The “Introduction” should explain what a disentanglement model is and which tasks the disentanglement model specifically targets in pathology images. 4. The paper has limited readability; some figures and textual descriptions are not easily understood. 5. It would be better to add references for the comparative methods in Table 1.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors are encouraged to emphasize the differences between the proposed method and existing contrastive disentanglement models in order to highlight the contributions, and to make the writing and figures easier to understand.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major concern is the limited novelty. The paper seems to simply apply the contrastive disentanglement models to pathology images without other sufficient contributions.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel semi-supervised variational autoencoder (VAE) for disentanglement of digital pathology images. The authors are right in claiming this to be the first application of disentanglement methods to complex pathology images with many different tissue structures. The method focuses in particular on the task of detecting tumor-infiltrating lymphocytes (TILs), which are essential for cancer diagnosis and prognosis. The most important ingredient of the proposed method is that it provides interpretable features (density of TILs) with increased generalization power. The method is validated convincingly on synthetic datasets constructed from two well-known datasets: TCGA BRCA and CoNSeP.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes an original architecture for disentanglement of complex pathology images. The concept is based on semi-supervised learning of a contrastive variational autoencoder (cVAE) structure, with labeling at the patch level. While SOTA unsupervised disentangling methods work on simple images comprising one object, the proposed method works on complex pathology images comprising multiple tissues with varying spatial support and morphology. That is achieved by constructing a two-stage disentanglement process. At the first stage, the VAEs are trained to distinguish patches containing cells from patches without cells. At the second stage, the VAEs are trained to distinguish patches with high-density TILs from patches with low-density TILs. That is the crucial invention that enables prediction of TIL density, the intended downstream task, which is important for cancer diagnosis and prognosis. Table 2 shows that the proposed methodology yields improved classification performance on the downstream task, showcasing that the extracted features have increased generalization power.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed semi-supervised contrastive VAE for disentanglement of pathology images presumes pairs of manually labeled patches with cells and patches without cells in the first stage, and patches with high-density TILs vs. patches with low-density TILs in the second stage. As the authors describe, this is performed through a copy-paste mechanism from the TCGA BRCA and CoNSeP datasets. That appears to be a limiting factor and can hinder the applicability of the proposed methodology to related problems in pathology. It is also claimed that the image reconstruction module based on IDGAN is another novelty of the proposed methodology. I do not agree with that. Based on the results in Table 2, the proposed SS-cVAE method has better reconstructed image quality than MM-cVAE, which is a method developed by the authors previously (the review is assumed to be double-blind, but the authors identified themselves by citing their own work in ref. 20). On the other hand, double Info-GAN has a better image quality metric. Thus, there is no justification for claiming that the proposed novelty improves the quality of the reconstructed image. It is, however, possible that the FID metric is not the most appropriate one, and the authors should consider some alternative.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors did not provide access to code, and that certainly limits the reproducibility of their work. On the other hand, detailed information on the experimental data is provided. As a potential problem that can affect reproducibility, I also note the values of the hyperparameters in losses (1) and (2) in the supplement. In Section 4, the authors set these values to 10, but without any background reasoning, discussion, and/or analysis.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As a potential problem that can affect reproducibility, I also note the values of the hyperparameters in losses (1) and (2) in the supplement. In Section 4, the authors set these values to 10, but without any background reasoning, discussion, and/or analysis.

    It is claimed that the image reconstruction module based on IDGAN is another novelty of the proposed methodology. I do not agree with that. Based on the results in Table 2, the proposed SS-cVAE method has better reconstructed image quality than MM-cVAE, which is a method developed by the authors previously (the review is assumed to be double-blind, but the authors identified themselves by citing their own work in ref. 20). On the other hand, double Info-GAN has a better image quality metric. Thus, there is no justification for claiming that the proposed novelty improves the quality of the reconstructed image. It is, however, possible that the FID metric is not the most appropriate one, and the authors should consider some alternative.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My recommendation is weak acceptance. The justifications for acceptance are: (1) a novel semi-supervised variational autoencoder (VAE) for disentanglement of complex digital pathology images; (2) validation on a problem of clinical relevance: extraction of an interpretable feature related to the density of tumor-infiltrating lymphocytes (TILs), which is important for cancer diagnosis and prognosis; (3) demonstration of the increased generalization power of the TIL feature. The reason for only weak acceptance: (1) It is claimed that the image reconstruction module based on IDGAN is another novelty of the proposed methodology. I do not agree with that. Based on the results in Table 2, the proposed SS-cVAE method has better reconstructed image quality than MM-cVAE, which is a method developed by the authors previously (the review is assumed to be double-blind, but the authors identified themselves by citing their own work in ref. 20). On the other hand, double Info-GAN has a better image quality metric.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    My original recommendation was weak accept. After the rebuttal phase I changed my recommendation to accept. I justify this by the convincing arguments the authors gave in response to my comments. In particular: (1) they explained why the copy-paste mechanism is not a limiting factor of the proposed method; (2) they explained why the slightly inferior performance of their method in comparison with Double-Infogan in image reconstruction is not of high importance, the reason being that the primary focus of their method is on latent space disentanglement; (3) they explained how the hyperparameters were selected; (4) they promised to provide the source code after the paper’s acceptance.



Review #3

  • Please describe the contribution of the paper

    This paper introduces the first disentanglement method for pathology images, specifically for detecting tumor-infiltrating lymphocytes (TIL). By leveraging disentanglement models, which decompose the latent space into interpretable subspaces, the paper enhances the interpretability of deep learning models. Key innovations include cascading disentanglement, a novel architecture, and reconstruction branches. These ideas perform well on complex pathology images, improving both the interpretability and generalization power of TIL detection deep learning models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents a novel method that diverges from conventional unsupervised disentangling approaches, offering improved performance in handling complex pathology images with multiple objects and intricate relationships. By utilizing supervised contrastive analysis to decompose latent space dimensions effectively, the method enhances interpretability and generalization. Its comprehensive approach encompasses disentangling discriminative factors from common factors, isolating density-related factors, and employing GAN-based reconstruction using disentangled latent representations, ensuring a holistic treatment of disentanglement and reconstruction in pathology image analysis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The paper lacks comprehensive ablation experiments to validate the effectiveness of its key innovations. Evaluating the necessity of the two-step disentanglement process and comparing different reconstruction methods would enhance credibility. (2) The Abstract and Conclusion sections could be clearer regarding the data flow of the proposed method to improve reader comprehension. (3) Could you explain more about why this method is called semi-supervised?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) Add an ablation study. (2) Enhance the writing quality in the Abstract and Conclusion sections.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The topic of disentangling digital pathology images is highly interesting. The experiments encompass a lot of quantitative and qualitative evaluations. Notably, the visualization of Latent Space Interpolation effectively illustrates the disentanglement within the latent space.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    (1) They mentioned that more ablation studies (two-step disentanglement and reconstruction method) will be added in the future version. (2) They explained why this method is semi-supervised.




Author Feedback

We appreciate all reviewers’ constructive comments. They found our approach highly interesting (R5) and novel (R1, R5), generating interpretable features (R1). They regarded it as one of the first applications of the disentanglement method on complex pathology images (R1, R4), achieving significant performance (R1, R4, R5). R4 also suggested modifications to enhance readability; we appreciate these suggestions on the writing and display and will address them in the revised version. We elaborate on the remaining comments below.

Q1. The copy-paste mechanism is a limiting factor (R1). A: The copy-paste mechanism optimizes the model for disentanglement by simulating paired background/foreground data. We don’t find it limiting; it works as long as we have sample patches of different types, and a reasonable nuclei segmentation model, e.g., Hovernet [8]. Our approach allows easy generalization to other cancers and other tissue types.
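
The copy-paste idea in A1 can be illustrated with a toy sketch. This is not the authors' implementation: patches are tiny nested lists of grayscale values, the binary mask stands in for the output of a nuclei segmentation model, and `paste_cells` is a hypothetical helper name chosen for this example.

```python
from typing import List

Patch = List[List[int]]  # toy grayscale patch (rows of pixel values)
Mask = List[List[int]]   # 1 where an assumed segmentation model marked a nucleus


def paste_cells(background: Patch, source: Patch, mask: Mask) -> Patch:
    """Return a copy of `background` with the masked pixels of `source` pasted in."""
    if not (len(background) == len(source) == len(mask)):
        raise ValueError("patches and mask must have the same height")
    result: Patch = []
    for bg_row, src_row, mask_row in zip(background, source, mask):
        if not (len(bg_row) == len(src_row) == len(mask_row)):
            raise ValueError("patches and mask must have the same width")
        # Take the source pixel where the mask is set, else keep the background.
        result.append([s if m else b for b, s, m in zip(bg_row, src_row, mask_row)])
    return result


# A cell-free background patch and a patch containing (toy) cells.
background = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
source = [[200, 201, 202], [203, 204, 205], [206, 207, 208]]
mask = [[0, 1, 0], [1, 1, 0], [0, 0, 0]]

# `background` and `paired` now form a background/foreground pair that
# differs only in the pasted cells.
paired = paste_cells(background, source, mask)
```

The pair shares everything except the pasted region, which is the property a contrastive setup relies on to attribute the difference to the foreground factor.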

Q2. IDGAN as a novelty does not perform as well as Double-Infogan in reconstruction (R1). A: The primary focus of the paper is on latent space disentanglement rather than reconstruction. Although SS-cVAE using IDGAN does not surpass double Info-GAN in terms of FID score (the difference is marginal), as shown in Tab. 2, it significantly outperforms double Info-GAN in disentanglement quality, as measured by the silhouette score [5, 20] in Tab. 1. Qualitative results (Fig. 4) indicate the same: double Info-GAN can confuse different conditions (cell vs. no-cell, high TIL vs. low TIL).
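
For readers unfamiliar with the silhouette score mentioned in A2, the following is a minimal pure-Python sketch of the standard definition with Euclidean distance. The latent codes and labels are made up for illustration, and the evaluation protocol of the paper (which latent subspace is scored, and on which data) is not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    def mean_dist(i: int, members: List[int]) -> float:
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_dist(i, own)  # cohesion: mean distance within the own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy 2-D "latent codes": two tight groups far apart.
codes = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = silhouette_score(codes, [0, 0, 1, 1])  # labels follow the groups
mixed = silhouette_score(codes, [0, 1, 0, 1])      # labels cut across the groups
```

Scores near 1 mean the labeled conditions occupy compact, well-separated regions of the latent space; scores near or below 0 mean the conditions overlap.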

Q3. Selection of hyperparameter values (R1). A: We empirically selected λ_1 and λ_2 in the loss (Eqs. 1 and 2 in the supplementary). For each of them, we tested values from 10^(-3) to 10^3, increasing by a factor of 10 at each step, on the BRCA dataset and selected the one with the best validation performance.
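
The selection procedure described in A3 amounts to a log-scale grid search. Below is a minimal sketch under stated assumptions: `toy_validation_score` is a placeholder that stands in for training the model with a given loss weight and measuring validation performance, and its peak location is arbitrary, chosen only so the example runs deterministically.

```python
import math
from typing import Callable, List, Sequence, Tuple


def log_grid(low_exp: int = -3, high_exp: int = 3) -> List[float]:
    """Candidate weights 10^low_exp, ..., 10^high_exp, one per power of ten."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def select_weight(candidates: Sequence[float],
                  validate: Callable[[float], float]) -> Tuple[float, float]:
    """Return (best_weight, best_score), where higher validation score is better."""
    scored = [(validate(lam), lam) for lam in candidates]
    best_score, best_lam = max(scored)
    return best_lam, best_score


def toy_validation_score(lam: float) -> float:
    # Placeholder for "train with weight lam, evaluate on the validation split".
    # The peak at lam = 10 is arbitrary and carries no experimental meaning.
    return -abs(math.log10(lam) - 1.0)


grid = log_grid()
best_lam, best_score = select_weight(grid, toy_validation_score)
```

In the setting of A3 this search would be run once per weight (λ_1 and λ_2); whether the two were tuned jointly or one at a time is not stated in the rebuttal.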

Q4. Limited novelty due to existing contrastive disentanglement (CD) models (R4). A: Applications of existing CD models to our problem are not trivial. Existing methods are restricted to single-object natural images. In our setting, as R5 commended, we have “complex pathology images with multiple objects and intricate relationships.” To address the challenge, we propose different contributions, including a cascade architecture (see Q6 below), a copy-paste mechanism (Q1), and IDGAN (Q2). Collectively, these innovations significantly enhance the framework’s practical applicability, as demonstrated in Tab. 1 and Fig. 4.

Q5. Details on Eq. (1) (R4). A: We already provided a derivation of Eq. (1) in the supplementary material. We will move it to the main paper.

Q6. Ablation studies: two-step disentanglement and reconstruction method (R5). A: Good idea. Indeed, we have tested the one-step disentanglement with three labels (no cell, low TIL, and high TIL). The result is not satisfactory. Separating the three labels involves different types of factors. Separating no cell from the other two labels is necessary to capture the presence of cells. Separating low and high TIL requires closer attention to cell morphology/spatial arrangement. It is much harder to learn all these in one step. We have also compared IDGAN with the VAE decoder. IDGAN has much better reconstruction quality. This ensures that the latent representation is well preserved while being disentangled. These ablation studies were actually done before we submitted the paper. We did not include them due to the page limits of the paper (8 pages) and the supplementary (2 pages). We will make sure to include them in the final version.
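
The difference between the one-step and two-step setups in A6 can be made concrete by looking at how the three patch labels are split into two binary problems. This is only a sketch of the label bookkeeping, not of the model: the label strings and the `cascade_targets` helper are hypothetical names introduced for this example.

```python
from typing import List, Sequence, Tuple

LABELS = ("no_cell", "low_til", "high_til")  # assumed label names

Target = Tuple[int, int]  # (patch index, binary target)


def cascade_targets(labels: Sequence[str]) -> Tuple[List[Target], List[Target]]:
    """Split three-way patch labels into two binary stages.

    Stage 1 uses every patch:        0 = no cell, 1 = contains cells.
    Stage 2 uses only cell patches:  0 = low TIL, 1 = high TIL.
    """
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    stage1 = [(i, int(lab != "no_cell")) for i, lab in enumerate(labels)]
    stage2 = [(i, int(lab == "high_til"))
              for i, lab in enumerate(labels) if lab != "no_cell"]
    return stage1, stage2


patch_labels = ["no_cell", "high_til", "low_til", "no_cell", "high_til"]
stage1, stage2 = cascade_targets(patch_labels)
```

Each stage then faces a single kind of factor (presence of cells, then TIL density), which matches the authors' argument that learning both at once is harder.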

Q7. Why is the method semi-supervised (R5)? A: The term semi-supervised was used because the model is trained on synthetic data generated by the copy-paste mechanism. However, we realized this may not be proper because the synthetic data is generated using labeled data. We will remove the word in the final version.

Q8. Reproducibility (R1, R4, R5). A: We will provide the code and preprocessed data upon acceptance of the paper.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    NA

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NA



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper uses a VAE approach to disentangle the concepts of cell density from background.

    Disentanglement applied to histopathology images is a difficult task, as each image contains multiple instances. The novelty in the paper centres around the use of a two-stage process to train the VAE for TIL density estimation, but it is not clear how the proposed disentangling approach aids interpretability. The validation experiments do not convince me that a useful disentanglement has been achieved. The typical way of demonstrating success is to show the effect of changing the amplitude of one latent vector on a sequence of images; here this could be done by showing a sequence where the number of TILs increased.
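
The traversal experiment suggested above can be sketched as follows, assuming the factor of interest is carried by a single latent dimension. Only the construction of the latent codes is shown; decoding each code into an image would require the trained decoder, which is not part of this sketch, and `traverse_latent` is a hypothetical helper name.

```python
from typing import List


def traverse_latent(z: List[float], dim: int, low: float, high: float,
                    steps: int) -> List[List[float]]:
    """Copies of `z` in which only coordinate `dim` sweeps linearly from low to high."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if not 0 <= dim < len(z):
        raise IndexError("dim is out of range for z")
    sequence = []
    for k in range(steps):
        z_k = list(z)  # copy, so every other coordinate stays fixed
        z_k[dim] = low + (high - low) * k / (steps - 1)
        sequence.append(z_k)
    return sequence


# Toy latent code; dimension 1 plays the role of the assumed "TIL density" axis.
base = [0.5, -1.0, 0.2]
sequence = traverse_latent(base, dim=1, low=-2.0, high=2.0, steps=5)
```

If the disentanglement is useful, decoding `sequence` in order should yield images in which only the targeted factor (here, the number of TILs) changes.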

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I think the meta-reviewers have different opinions about this paper. After reading the paper, reviews, and rebuttal, I lean toward acceptance. First of all, it might not be fair to reject it when two reviewers have given accept decisions. Second, even though the real practical value might be questionable, the idea of disentanglement might lead to interesting discussion at MICCAI.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



