Abstract

Skin cancer diagnosis relies on assessing the histopathological appearance of skin cells and the patterns of epithelial skin tissue architecture. Despite recent advancements in deep learning for automating skin cancer detection, two main challenges persist for clinical deployment. (1) Deep learning models only recognize the classes they were trained on, giving arbitrary predictions for rare or unknown diseases. (2) Generalization across healthcare institutions remains difficult, as variations arising from diverse scanners and staining procedures increase the task complexity. We propose a novel Domain Adaptation method for Unsupervised cancer Detection (DAUD) using whole slide images to address these concerns. Our method consists of an autoencoder-based model with stochastic latent variables that reflect each institution’s features. We have validated DAUD on a real-world dataset from two different hospitals. In addition, we utilized an external dataset to evaluate the capability for out-of-distribution detection. DAUD demonstrates comparable or superior performance to the state-of-the-art methods for anomaly detection.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3110_paper.pdf

SharedIt Link: https://rdcu.be/dY6ie

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72083-3_6

Supplementary Material: N/A

Link to the Code Repository

https://github.com/cvblab/DAUD-MICCAI2024

Link to the Dataset(s)

N/A

BibTex

@InProceedings{P._Domain_MICCAI2024,
        author = { P. García-de-la-Puente, Natalia and López-Pérez, Miguel and Launet, Laëtitia and Naranjo, Valery},
        title = { { Domain Adaptation for Unsupervised Cancer Detection: An application for skin Whole Slides Images from an interhospital dataset } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        pages = {58 -- 68}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a novel OOD-detection method for detecting abnormalities (malignancy) in skin cancers and test the performance across three datasets. Their approach, termed DAUD, feeds PLIP embeddings into an autoencoder to separate benign from malignant WSIs. In addition, the authors show that adding a domain-specific latent stochastic variable helps improve OOD/malignancy detection. They find that their method outperforms competitor methods in the literature.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    In general, the approach they take is interesting and well-suited to the problem. The long tail of diagnoses makes a pure-classification malignancy detector a challenging prospect, and an OOD-detection method will improve performance. The experiment is well-suited to test their hypotheses. The ablation study is helpful for interpretation. The manuscript is well-written, and sharing code is also of benefit to the reader.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Major weaknesses

    The main weakness of this manuscript is that the evaluation of the method rests on a single experiment, and the details of the experiment are not entirely clear. A more detailed description of the experimental setup is necessary to understand this experiment, and the claims of methodological superiority require additional exploration beyond a single use case.

    Experimental detail: the description of each dataset should be expanded significantly, covering the different histologies available, scanner, resolution, tissue of origin, etc.

    Results: fundamentally, it is challenging to show the superiority of a method using a single experiment. I recommend adding a second experiment to build confidence that no confound in the experiment is driving the results.

    Minor weaknesses

    • Performance is relatively weak in hospital A (AUROC < 0.7). This is challenging to interpret.
    • Average pooling across PLIP embeddings may be why performance is relatively weak, especially if abnormalities are sparse within larger tissue samples.
    • Generally, it is not clear that hospital is the correct label for the latent variable. Single hospitals may have multiple scanners, prep processes (frozen or FFPE, etc).
    • Spitzoid tumors and CSC neoplasms come from different organ sites. This is a very large domain shift; it may help to show that the model captures this shift accurately, or that it registers subtler shifts as indeed subtler.
    • More details are needed on whether the method is evaluated on H&E or other stains
    • The tSNE plots do not add much information
    • The authors report a 95% CI over 10 replicates - how is this CI generated? Would min-max be more informative?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see weaknesses. Specifically, it would be helpful to demonstrate the method in an additional experimental setting; as well as expanding the detail around the experiment performed here.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The central claims can’t be substantiated with the results shown.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I thank the authors for their response to review. I appreciate that the authors have provided additional clarification around methods and cancer details. I do think the claims are not entirely supported by the evidence and would recommend that the area chairs consider whether the claims are adequately supported by the single experiment. If so, it is OK to accept.



Review #2

  • Please describe the contribution of the paper

    This paper develops a domain adaptation method for unsupervised cancer detection (DAUD) with skin whole slide images. DAUD consists of a self-supervised feature extraction step and a domain adaptation unsupervised cancer detection model. The method has an autoencoder structure equipped with normally distributed latent variables associated with different institutions. In this way, DAUD generalizes to data from different institutions. The developed method can differentiate benign samples from the anomaly group (i.e., malignant and out-of-distribution samples). It is well validated with three datasets on skin cancers and carefully compared with multiple state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper are 1) the design of institution-dependent latent variables characterizing institution-specific properties, and 2) the extensive comparison experiments with a large set of state-of-the-art methods on three datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some method details are missing; e.g., it is not clear how the mean and covariance of the multivariate normal distribution followed by the institution-specific embedding are estimated. Additionally, it is not clear what decision rule DAUD uses to separate normal samples from anomalous ones.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper develops DAUD, i.e. a domain adaptation method for unsupervised cancer detection with skin whole slide images. This pipeline partitions a WSI into patches, uses PLIP to extract a 512-dimensional embedding for each image patch, averages the image patch features into a global WSI representation, and provides it to an autoencoder structure equipped with stochastic latent variables representing different institutions. To demonstrate the method's efficacy, an extensive set of comparison experiments is presented with a large set of state-of-the-art methods on three datasets.

    The design of institution-dependent latent variables characterizing institution-specific properties within an autoencoder structure is interesting. Additionally, the extensive comparison experiments are well conducted.

    However, some method details are absent from the paper. It is not clear why the institution-specific distribution is normal. It is not clear why the covariance matrix is diagonal. It is not clear how the mean and covariance matrix are estimated. An EM algorithm might be followed for parameter estimation; however, this is not described in the paper. It is not clear what decision rule DAUD follows to separate normal samples from anomalous ones. It is not clear how Fig. 5, 6, 7 in section 3.4 “Visualizations” help justify the superiority of DAUD over the autoencoder structure.

    (Minor): This paper needs proofreading, e.g. “We hypothesize that both suppose a concept shift, so the reconstruction error must be higher.” on page 2 and “Fig. 5 shows that the posterior distribution of d for the two first dimensions is shifted,…” on page 7, among others.
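The average-pooling step described in this review can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function name `mean_pool` and the toy 4-dimensional vectors are assumptions made for illustration, standing in for the 512-dimensional PLIP patch embeddings mentioned above.

```python
from typing import List


def mean_pool(patch_embeddings: List[List[float]]) -> List[float]:
    """Average patch-level vectors into a single slide-level vector."""
    if not patch_embeddings:
        raise ValueError("need at least one patch embedding")
    dim = len(patch_embeddings[0])
    if any(len(vec) != dim for vec in patch_embeddings):
        raise ValueError("all patch embeddings must share one dimension")
    n = len(patch_embeddings)
    # Coordinate-wise mean over all patches of the slide.
    return [sum(vec[i] for vec in patch_embeddings) / n for i in range(dim)]


# Toy example: three patches with 4-dimensional features (hypothetical values).
patches = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
slide_vector = mean_pool(patches)
```

Because every patch contributes equally to the mean, a small abnormal region in a large tissue sample has little influence on the slide-level vector, which is the concern Review #1 raises about sparse abnormalities.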

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The primary recommendation factors include the interesting model architecture, extensive comparison experiments, and lack of method details (e.g. decision rules, normality distribution assumption, and parameter estimation).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    My questions are partially addressed.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel Domain Adaptation method for Unsupervised cancer Detection (DAUD) to address the issues in skin cancer diagnosis, particularly in dealing with rare or unseen diseases and variations across healthcare institutions. By employing an autoencoder-based model with stochastic latent variables, DAUD effectively detects malignant and unseen Whole Slide Images (WSIs) while reducing covariate shift between centers. The study validates DAUD using real-world datasets from two different hospitals and demonstrates comparable or better performance to state-of-the-art methods for anomaly detection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel Domain Adaptation Method (DAUD): The paper introduces DAUD, a novel domain adaptation method for unsupervised detection of malignant and unseen Whole Slide Images (WSIs). DAUD effectively addresses challenges in recognizing rare diseases and generalizing across healthcare institutions, crucial for clinical deployment.

    2. Comprehensive Experimental Validation: The study conducts experiments on real-world datasets from two hospitals, offering robust evaluation. External validation using the SOPHIE dataset demonstrates DAUD’s generalization capability.

    3. Performance: DAUD performs well with respect to the state-of-the-art methods for anomaly detection, particularly in malignant and out-of-distribution detection. Quantitative results, including AUC metrics, support this.

    4. Insightful Ablation Study and Visualizations: The ablation study and visualizations provide valuable insights into DAUD’s effectiveness and behavior, showcasing its robustness to covariate shift.

    Overall, the paper presents a novel approach to domain adaptation in cancer detection, supported by a rigorous assessment of performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Further clarification on how DAUD distinguishes between benign, malignant, and out-of-distribution samples would enhance understanding.

    A more detailed exploration of potential limitations and future directions could strengthen the paper’s impact.

    A clearer description of the data preprocessing steps and model hyperparameters would enhance reproducibility.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See above

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See above

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We deeply thank the reviewers for their insightful comments. It is truly gratifying to see that they find this work rigorous (R1), extensive (R2), interesting and well-suited to the problem (R5). Below is our detailed response to the comments.

Experiment details (R1, R5). Due to space constraints, we could not include all the requested information in the paper (resolution, scanner, stains, prep steps, etc.). However, to provide a more detailed description, we will clarify that each institution scanned H&E WSIs using a standardized protocol and a single scanner: a Roche Ventana iScan HT (Hosp. A) and a Philips Ultra Fast Scanner (Hosp. B). For further clarification, we will include a citation to a paper that details the datasets used in this study, which was initially omitted due to anonymization requirements.

Latent variables (R2, R5). Because a single scanner was used per hospital, similar intra-hospital characteristics were observed. For this reason, we tied the latent variables to the hospitals and assumed that they follow a normal distribution. This distribution can model the characteristics and variability of the specific institution’s scanner.

Model details (R2). We utilized the reparameterization trick for our model, which is widely used in probabilistic deep learning models, e.g., VAEs and BNNs. This trick enables backpropagation and the use of gradient-based optimizers, which excel in deep learning. We will include this information in the final manuscript. We also share the code (and will release it on GitHub) so that the model can be understood in depth and easily reproduced.
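For readers unfamiliar with the reparameterization trick, a minimal standard-library sketch follows. It only illustrates the sampling rule z = mu + sigma * eps for a diagonal Gaussian; the log-variance parameterization and the function name `reparameterize` are assumptions, and the sketch is not the released DAUD code. In an automatic-differentiation framework, the point of the trick is that the noise eps carries all the randomness, so gradients can flow through mu and sigma.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, I) and diagonal covariance."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)  # sigma = exp(log_var / 2)
        for m, lv in zip(mu, log_var)
    ]


rng = random.Random(0)
mu = [0.5, -1.0]                   # hypothetical latent means
log_var = [0.0, math.log(0.25)]    # i.e. sigma = 1.0 and sigma = 0.5

samples = [reparameterize(mu, log_var, rng) for _ in range(20000)]
empirical_mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
```

With enough draws, `empirical_mean` approaches `mu`, and the spread of each coordinate approaches the corresponding sigma.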

Anomaly decision (R1, R2). As explained in the last paragraph of section 2, we estimate the likelihood of a new test sample with Monte Carlo estimation. This likelihood is defined by eq. 2 and is used as the anomaly score. Note that this likelihood can be understood as the reconstruction error or probability. We will further clarify this in the manuscript.
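Eq. 2 is not reproduced on this page, so the following is only a generic sketch of a Monte Carlo likelihood used as an anomaly score, under stated assumptions: an isotropic Gaussian observation model, and a hypothetical `toy_reconstruct` function standing in for the encoder, the sampled latent variables, and the decoder. It should not be read as the authors' exact formulation.

```python
import math
import random
from typing import Callable, List, Sequence


def gaussian_log_likelihood(x: Sequence[float], x_hat: Sequence[float],
                            sigma: float = 1.0) -> float:
    """log N(x | x_hat, sigma^2 I); an assumed observation model."""
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return -0.5 * sq_err / sigma ** 2 - 0.5 * len(x) * math.log(2 * math.pi * sigma ** 2)


def mc_anomaly_score(x: Sequence[float],
                     reconstruct: Callable[[Sequence[float], random.Random], List[float]],
                     n_samples: int, rng: random.Random) -> float:
    """Negative log of the Monte Carlo average likelihood (higher = more anomalous)."""
    log_liks = [gaussian_log_likelihood(x, reconstruct(x, rng)) for _ in range(n_samples)]
    m = max(log_liks)  # log-sum-exp shift for numerical stability
    log_mean = m + math.log(sum(math.exp(l - m) for l in log_liks) / n_samples)
    return -log_mean


def toy_reconstruct(x: Sequence[float], rng: random.Random) -> List[float]:
    # Hypothetical stand-in for a model trained on benign data only:
    # every stochastic pass reconstructs something near a benign prototype.
    prototype = [0.0, 0.0, 0.0]
    return [p + 0.1 * rng.gauss(0.0, 1.0) for p in prototype]


benign_score = mc_anomaly_score([0.1, -0.1, 0.0], toy_reconstruct, 100, random.Random(0))
anomalous_score = mc_anomaly_score([3.0, 3.0, 3.0], toy_reconstruct, 100, random.Random(0))
```

In this toy setting the sample far from the benign prototype receives the higher score. Turning a score into a benign/anomalous decision still requires a threshold, which the rebuttal does not specify; a threshold-free metric such as AUROC avoids that choice.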

Evaluation (R5). It is pointed out that we only evaluate our method on a single experiment. Note that skin cancer is one of the most prevalent cancers worldwide, and CSC is particularly challenging to detect. Thus, we address CSC neoplasm malignancy detection, which is of special interest. We consider that the proposed model suits this clinical problem well. To show the efficacy of DAUD, we leveraged real-world datasets from two different hospitals. We also conducted an additional assessment of the capability for OoD detection on an external dataset of a different skin neoplasm. We also report figures (R2, R5) to better justify the model performance. We can see in the t-SNE plots that while AE distributes test points uniformly, DAUD mixes the spaces. This means that our model disentangles the domain features from the semantic ones. We acknowledge that studying the model in other CPATH scenarios would be interesting. We are grateful for the suggestion and, in future work, will extend the model to other tumors, adapting the latent variables accordingly (R1).

Other clarifications (R5). Spitzoid tumors and CSC neoplasms are both characterized by spindle-shaped cells. Although they have different origins, both manifest in the skin and are typically diagnosed by dermatopathologists through skin biopsies, such as those used in this study. We believe it is difficult to determine the magnitude of this semantic shift. Average pooling is applied to the features because of PLIP’s strong feature-extraction capability; however, we will consider attention-based aggregation methods in future work. To report the 95% CI, we used a standard statistical procedure based on the mean and standard deviation, which is widely used when several replicates are run.
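The reply does not state the exact formula behind the 95% CI, so the sketch below shows only one common reading: the mean plus or minus a critical value times the standard error over replicates. The AUROC values are illustrative placeholders, not results from the paper, and `mean_ci` is a hypothetical helper name. With 10 replicates, the Student t critical value for 9 degrees of freedom (about 2.262) gives a wider interval than the normal value 1.96.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_ci(values: Sequence[float], critical: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for mean +/- critical * std / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample std (n - 1 denominator)
    half_width = critical * sem
    return mean, mean - half_width, mean + half_width


# Illustrative placeholder AUROCs for 10 replicates (NOT values from the paper).
aurocs = [0.80, 0.82, 0.78, 0.81, 0.79, 0.83, 0.77, 0.80, 0.82, 0.78]

normal_ci = mean_ci(aurocs)                # normal approximation
t_ci = mean_ci(aurocs, critical=2.262)     # Student t, 9 degrees of freedom
value_range = (min(aurocs), max(aurocs))   # the min-max alternative a reviewer asks about
```

The min-max range describes the spread of individual runs, while the CI describes uncertainty about the mean, so the two answer different questions and can reasonably be reported together.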

We will thoroughly proofread the text to improve the writing. Thank you again for your valuable input and consideration, which have strengthened our manuscript and inspired us to pursue future extensions.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Before the rebuttal, the reviewers highlighted as positives the fit of the proposed approach to the problem, the validation of the study, the performance of the approach, and the ablation experiments. They criticized that the motivation, some method details, and the hyperparameters used were not fully clear, and that the approach was evaluated in a single experimental setting.

    After the rebuttal, all reviewers rated the paper between weak accept and accept, and found their comments mostly addressed. While some concerns regarding the claims persist, there seems to be a general understanding that discussing the approach with the MICCAI audience has sufficient merit.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper is recommended for acceptance due to its novel approach, robust evaluation, and positive overall assessment from reviewers.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



