Abstract

Deep Learning (DL) can predict biomarkers directly from digitized cancer histology in a weakly-supervised setting. Recently, the prediction of continuous biomarkers through regression-based DL has seen increasing interest. Nonetheless, clinical decision making often requires a categorical outcome. Consequently, we developed a weakly-supervised joint multi-task Transformer architecture, trained and evaluated on four public patient cohorts, for the prediction of two key predictive biomarkers, microsatellite instability (MSI) and homologous recombination deficiency (HRD), using auxiliary regression tasks related to the tumor microenvironment. Moreover, we perform a comprehensive benchmark of 16 task-balancing approaches for weakly-supervised joint multi-task learning in computational pathology. Using our novel approach, we outperform the state of the art by +7.7% and +4.1% as measured by the area under the receiver operating characteristic curve, and enhance clustering of latent embeddings by +8% and +5%, for the prediction of MSI and HRD in external cohorts, respectively.
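As a rough illustration of the joint objective described in the abstract, a weakly-supervised joint multi-task loss can be sketched as a weighted sum of a classification term and an auxiliary regression term. The fixed weights and tensor shapes below are purely illustrative assumptions; the paper benchmarks 16 task-balancing approaches rather than hand-picked fixed weights.

```python
import torch
import torch.nn.functional as F

def joint_multitask_loss(cls_logits, cls_target, reg_pred, reg_target,
                         w_cls=1.0, w_reg=0.5):
    """Weighted sum of a categorical biomarker loss (e.g. MSI or HRD status)
    and an auxiliary regression loss (e.g. a TME-related score).
    The fixed weights w_cls/w_reg are illustrative stand-ins for a
    task-balancing method."""
    loss_cls = F.cross_entropy(cls_logits, cls_target)   # classification task
    loss_reg = F.mse_loss(reg_pred, reg_target)          # auxiliary regression task
    return w_cls * loss_cls + w_reg * loss_reg

# Toy example: 4 slides, 2 classes, 1 auxiliary regression target each.
logits = torch.randn(4, 2)
labels = torch.randint(0, 2, (4,))
pred = torch.randn(4, 1)
target = torch.randn(4, 1)
loss = joint_multitask_loss(logits, labels, pred, target)
```

In practice, task-balancing methods replace the static `w_cls`/`w_reg` with weights that adapt during training, which is precisely what the benchmark in the paper compares.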

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1368_paper.pdf

SharedIt Link: https://rdcu.be/dY6iB

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72083-3_24

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1368_supp.pdf

Link to the Code Repository

https://github.com/KatherLab/joint-mtl-cpath

Link to the Dataset(s)

The slides for TCGA are available at https://portal.gdc.cancer.gov
The slides for CPTAC are available at https://proteomics.cancer.gov/data-portal

BibTex

@InProceedings{El_Joint_MICCAI2024,
        author = { El Nahhas, Omar S. M. and Wölflein, Georg and Ligero, Marta and Lenz, Tim and van Treeck, Marko and Khader, Firas and Truhn, Daniel and Kather, Jakob Nikolas},
        title = { { Joint multi-task learning improves weakly-supervised biomarker prediction in computational pathology } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        pages = {254 -- 262}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a weakly-supervised joint multi-task learning framework, which allows for additional biological information about the tumor microenvironment to be learned to improve the main biomarker prediction objective. Extensive experiments on multiple datasets demonstrate the effectiveness of the proposed framework.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The research topic is interesting.
    2. Extensive experiments.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The contribution is very small.
    2. Figure 1 is not clear; for example, how are the class and regression tokens obtained?
    3. The architecture is not clear; it would be better to provide more details.
    4. The experiments include few related comparison methods, and these methods are not introduced.
    5. The conclusion is too long.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Figure 1 is not clear. For example, how are the class and regression tokens obtained?
    2. The architecture is not clear. It would be better to provide more details.
    3. The experiments include few related comparison methods. It would be better to introduce the comparison methods and show their results.
    4. It would be better to discuss future work.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The details of the proposed framework are not clear.
    2. Experimental results are not convincing.
    3. The paper writing is bad.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    After reading all comments and the rebuttal, the authors have addressed many of my concerns.



Review #2

  • Please describe the contribution of the paper

    Adding a second task, i.e. regressing on properties of the tumor microenvironment, helps to predict biomarkers from histological slides using weakly supervised learning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Adding another, meaningful task to the classification problem is a novel idea
    • The approach seems to increase performance beyond benchmarks
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Methods could be more formal
    • Interpretation about the different performance of used balancing methods is missing
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Why upper case ‘Transformer’?
    • Chapter 3 could be a bit more formal: e.g. are we optimizing two loss terms with weights between them?
    • Table 1 & 2: Would be great to have standard deviations to assess significance of differences.
    • Table 1 caption: please explain the different weight balancing methods
    • It would be very instructive to show the embedding with colored classification and regression scores.
    • Can you explain why the different balancing methods are leading to vastly different results? Which one to use in practice?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Good, novel idea
    • Paper very well written
    • Convincing results
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Authors addressed my issues.



Review #3

  • Please describe the contribution of the paper

    The authors propose a transformer-based weakly supervised multitask learning model to enhance prediction of MSI and HRD status from standard H&E slides. The main contribution of this work is the formulation of weakly supervised biomarker prediction as a multitask problem involving classification and regression and demonstration of improvement over SOTA single task models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strengths

    • The paper is well organized and easy to read.
    • The idea of leveraging the TME to boost biomarker prediction performance is innovative and biologically well motivated given known functional effects of MSI and HRD on the tumor microenvironment of patients
    • The authors demonstrate generalizability on independent data derived from CPTAC
    • The benchmarking results reported by the authors demonstrate marked improvement of ~11% over the state of the art for MSI prediction in cross-validation and ~7% on independent held out test.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This work does not report bootstrapped confidence intervals to assess significance of improvements.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • It would help if the authors further elaborate in the supplementary how specific TME-relevant tasks were chosen for multi task training of MSI and HRD prediction models.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • It is not clear why the authors chose lung adenocarcinoma to evaluate efficacy of HRD status, given the higher prevalence and need for effective testing of HRD in other cancer types such as high grade ovarian cancer (https://doi.org/10.1200/PO.17.00286). There are by now a few additional public datasets (beyond TCGA/CPTAC) with whole slide image and clinical metadata available for independent testing of HRD biomarkers in ovarian cancer (See for eg: https://www.cancerimagingarchive.net/collection/ptrc-hgsoc/)
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, this paper presents an elegant and biologically well motivated computational solution to tackle the important clinical problem of MSI and HRD prediction from readily available histopathology images. The authors perform comprehensive benchmarking of different multitask training regimes to demonstrate improvement over state of the art. Given the important role of the TME in tumor progression, the techniques introduced in this work have the potential to enhance performance of several other clinically relevant computational pathology tasks. Hence, I recommend accepting this paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their insightful feedback. We are pleased they appreciate the “novel” (R1) and “innovative and biologically well motivated” (R4) concepts of our paper which “have the potential to enhance performance of several other clinically relevant computational pathology tasks” (R4). We are encouraged by positive feedback on our extensive experiments (R3, R4) with “convincing results” that “increase performance beyond benchmarks” (R1).

@R3 “The paper writing is bad”: We were surprised by R3’s sentiment given that the other reviewers described the paper as “very well written” (R1) and “easy to read” (R4). Nonetheless, we would appreciate specific feedback on which parts were hard to follow, so we can improve the clarity of our paper. Further, in response to R3’s incorrect claim that no open access source code is available, we kindly point to the bottom of page 3 which contains the link to an anonymised version of our repository.

@R3 “The contribution is very small”: We disagree. Our contribution of weakly-supervised joint multi-task learning including tumor microenvironment (TME) tasks shows “improvement over SOTA single task models” (R4) for “the important clinical problem of MSI and HRD prediction” (R4). We are pleased that R1 and R4 recognize the significance of our contribution, describing our idea as “good, novel” and “innovative and biologically well motivated”.

@R3 “results are not convincing” and “few related comparison methods”: On the contrary, we compared with 16 different methods across 4 datasets. In our opinion, echoed by the other two reviewers, this constitutes a “comprehensive benchmarking” (R4) with “convincing results” (R1). The results “increase performance beyond benchmarks” (R1), “demonstrate marked improvement” and “generalizability” (R4). R3’s initial statement that our “extensive experiments on multiple datasets demonstrate the effectiveness of the proposed framework” conflicts with the sentiment of the rest of their review. We would appreciate specific feedback on what led R3 to find the results unconvincing.

@R3 “Architecture is not clear” and “Figure 1 is not clear”: The architecture that we used is based on the widely-known standard Vision Transformer (ViT). All deviations and extensions from the standard ViT are explained in the methods, and visualized in Figure 1. Moreover, the class tokens (and, by extension, the regression tokens) are randomly initialized learnable vectors, as is standard in ViT-style architectures. We thank R1 and R3 for their suggestions for improving the clarity of Figure 1 (adding colored classification/regression scores and expanding the caption to explain the classification/regression tokens), which we will incorporate in the camera-ready version.
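As a minimal sketch of this token mechanism (illustrative dimensions, layer counts, and head names; not the paper's exact architecture), randomly initialized learnable class and regression tokens can be prepended to the patch-token sequence like so:

```python
import torch
import torch.nn as nn

class TokenViT(nn.Module):
    """Minimal ViT-style encoder with a class token and a regression token.

    Both tokens are randomly initialized learnable vectors prepended to
    the patch-token sequence, as in standard ViT variants. All sizes and
    head definitions here are illustrative assumptions."""

    def __init__(self, dim=64, depth=2, heads=4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))  # classification token
        self.reg_token = nn.Parameter(torch.randn(1, 1, dim))  # regression token
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.cls_head = nn.Linear(dim, 2)   # e.g. a binary biomarker status
        self.reg_head = nn.Linear(dim, 1)   # e.g. one auxiliary TME score

    def forward(self, patch_tokens):        # patch_tokens: (batch, n_patches, dim)
        b = patch_tokens.size(0)
        cls = self.cls_token.expand(b, -1, -1)
        reg = self.reg_token.expand(b, -1, -1)
        x = torch.cat([cls, reg, patch_tokens], dim=1)  # prepend both tokens
        x = self.encoder(x)
        # Read the task outputs off the two prepended token positions.
        return self.cls_head(x[:, 0]), self.reg_head(x[:, 1])

model = TokenViT()
logits, score = model(torch.randn(3, 10, 64))  # 3 slides, 10 patch tokens each
```

After self-attention mixes the tokens, each learnable token position acts as a task-specific summary of the patch sequence, which is why one classification and one regression output can be read off the first two positions.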

@R1,R4 confidence intervals: We will include 95% CIs for Table 1 and 2 from the 5-fold experiments in the camera-ready version, which were previously removed due to table size and page limitations.

@R1,R3 explain balancing methods: We will further formalize the type of task balancing optimization that is performed, how the approaches differentiate from each other, and which layers of the architecture are affected in the methods and the caption of Table 1 (R1) and Figure 1 (R3).

@R4 choice of cancer types and targets: We thank R4 for suggesting a dataset for evaluation of HRD in ovarian cancer (OV). We compared to El Nahhas et al. (2024) for HRD, which did not include OV experiments. HRD prediction in OV from histology is still not convincing (AUCs 0.51-0.56; Ahn et al. (2024)). The evaluation of our framework on OV will be mentioned as future work in the conclusion (R1). We will also expand on our choice of cancer type and TME-relevant tasks (R4).

@R1,R3,R4 conclusion: We will reduce the length of the current conclusion (R3), add more information regarding the interpretation of the task-balancing outcomes (R1), and extend the perspective to future work from a technical and clinical perspective (R1, R3, R4).




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    NA

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NA



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


