Abstract

In the elderly, degenerative diseases often develop differently over time for individual patients. For optimal treatment, physicians and patients would like to know how much time is left for them until symptoms reach a certain stage. However, compared to simple disease detection tasks, disease progression modeling has received much less attention. In addition, most existing models are black-box models which provide little insight into the mechanisms driving the prediction. Here, we introduce an interpretable-by-design survival model to predict the progression of age-related macular degeneration (AMD) from fundus images. Our model not only achieves state-of-the-art prediction performance compared to black-box models but also provides a sparse map of local evidence of AMD progression for individual patients. Our evidence map faithfully reflects the decision-making process of the model in contrast to widely used post-hoc saliency methods. Furthermore, we show that the identified regions mostly align with established clinical AMD progression markers. We believe that our method may help to inform treatment decisions and may lead to better insights into imaging biomarkers indicative of disease progression. The project’s code is available at github.com/berenslab/interpretable-deep-survival-analysis.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1325_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1325_supp.pdf

Link to the Code Repository

https://github.com/berenslab/interpretable-deep-survival-analysis

Link to the Dataset(s)

https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000001.v3.p1

BibTex

@InProceedings{Ger_Interpretablebydesign_MICCAI2024,
        author = { Gervelmeyer, Julius and Müller, Sarah and Djoumessi, Kerol and Merle, David and Clark, Simon J. and Koch, Lisa and Berens, Philipp},
        title = { { Interpretable-by-design Deep Survival Analysis for Disease Progression Modeling } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper designs an image-based deep survival model for the progression of age-related macular degeneration using fundus images. The main contribution is the combination of a more advanced survival model with a deep learning model, which provides interpretability that previous black-box models lack.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The topic of this paper is fascinating. Image-based disease progression modeling is a relatively hot topic.
    2. The technology adopted in this paper is reasonable. The survival model is used to realize time-to-event modeling, while the deep learning model is used to improve the feature capture ability and provide model interpretation ability. The combination of the two technologies is suitable.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The two core technologies used in this article are outdated, especially BagNet, the deep learning model. With the rapid development of deep learning, the authors could consider replacing BagNet with more powerful models based on the Transformer architecture.
    2. In the experimental part, as shown in Table 1, the proposed model has no obvious performance advantage over the baseline methods. Can the authors further improve the performance of the model?
    3. In the explainability experiments, the explanations provided by the model remain at the image level, i.e., high- and low-risk areas are given as a hint for the doctor’s diagnosis. The authors mention in Section 3.3 that they evaluated the features or indications contained in each patch. Could the model also output the features underlying the prediction for each patch when producing its interpretation? Such an interpretation would have higher credibility.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See main weaknesses. Fig. 1 and Fig. 2 lack legends: it is unclear whether solid and dotted lines in Fig. 1 differ in meaning, the abbreviation AP (average pooling) should be written out, and the function S(t) does not appear before Fig. 1 and should be explained in advance.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As mentioned in Weakness, I think the core problem of this paper is that the technology used is outdated and the performance is not advantageous. The two issues may be related. If these two problems can be solved, I would be willing to upgrade my score dependent on the rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The authors’ reply has answered most of my questions, but the following problems remain:

    [Trade-off between performance and interpretability]: I accept the authors’ point of view. Achieving performance close to that of non-interpretable SOTA models with an interpretable method illustrates the value of the method.

    However, if the authors regard interpretability as the core contribution, the experimental design should include comparisons with other existing interpretable methods in terms of performance, quantitative indicators of interpretability, or the quality of the interpretations. The current version only compares performance against non-interpretable methods, which is insufficient.

    In conclusion, I have decided to keep my original score unchanged.



Review #2

  • Please describe the contribution of the paper

    The authors propose a deep survival model whose reasoning is interpretable by design to predict, from color fundus (CF) images, the progression (or risk of progression) of age-related macular degeneration (AMD) from the intermediate to the late stage. They adapted a sparse BagNet, which provides high-resolution evidence maps without the need to compute saliency maps, from a classifier to a risk or time-to-event model based on Cox proportional hazards (PH).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Working with time-to-event data and models is trickier than with classification models, also in terms of evaluation. The authors chose suitable metrics and suitable ways to interpret model results.

    • Interpretable time-to-event deep-learning models with comparable performance are of high interest in the medical imaging domain in general, not only in retinal imaging. From that perspective, this paper is highly relevant.

    • The model seems to pick up the disease-relevant regions in the CF image.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I struggle a bit with the amount of novelty. The sparse BagNet has been introduced and discussed in detail already for a classification task in the same domain. So the novelty is mainly using a CoxPH loss and the time-domain setting for training and evaluation.

    The inherent interpretability of the model comes at the cost of reduced performance.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Dataset is publicly available. Source code is on (anonymized) github.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Minor: Results: The first paragraph is largely redundant with the introduction and methods and is not necessary for understanding. Fig. 2: At first glance it is not obvious that these scans show follow-up images of a patient. An arrow with a “time” caption below the CF images, or something similar, may help to indicate that the scans are follow-up scans.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and does not leave many questions open. The methodological contribution is limited to adding CoxPH to an already known interpretable network (sparse BagNet) to obtain a time-to-event model. However, the topic is clinically relevant, and the authors describe well that the model provides meaningful results that can be interpreted in a clinical context. Furthermore, the method is not limited to the ophthalmic domain and can be applied in other imaging domains as well. Interpretable models providing risk estimates or time-to-event predictions are clinically relevant. The evaluation is not that extensive but sufficient.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes an inherently interpretable design for disease progression modeling. This is a clinically meaningful task that often receives less attention, and the results show that the method achieves performance similar to the state of the art while also providing interpretations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this work is the reformulation of an inherently interpretable model design for a disease progression modeling task. The method is not completely novel, but applying the sparse BagNet to AMD progression with an inherently interpretable design is an important contribution.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Here are some critical comments on the paper that need to be addressed before any possible publication:

    1. In the results, the authors mention that the accuracy given by their interpretable design is very similar to the black-box models. There are a few related publications that recently discussed that there might be some trade-off between inherent interpretability vs accuracy. It is certainly not observed for all methods, but it is a commonly seen phenomenon (See the first couple of sections of: Sengupta, S., Revisiting model self-interpretability in a decision-theoretic way for binary medical image classification). Can the authors provide any intuitive understanding about how their design was able to perform very similarly to black-box models?
    2. The experiments are limited in the paper to a single problem. Is the method generalizable for other disease progression problems?
    3. The authors mention that the annotation was done by an author (“one of the authors annotated the six patches with highest predicted risk from 20 fundus images”). It is well known that annotating medical images is extremely complicated and may require clinical expertise. Can the authors comment on the clinical expertise of the annotator?
    4. In the method description, it will be good to have some clear sentences to describe how this method is different than just studying CNN filters/feature maps.
    5. How is this method conceptually similar/dissimilar to other inherently interpretable design ideas (e.g., Chen et al., “This looks like that: deep learning for interpretable image recognition”; Agarwal et al., “Neural additive models: Interpretable machine learning with neural nets”)?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the previous strength and weakness section for my detailed comments.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper can be accepted depending on the authors’ rebuttal addressing the comments. There are a few weaknesses and missing clarifications in the paper, but overall the paper addresses a clinically important problem. Hence, dependent on a proper rebuttal of the comments in the weakness section, the paper could be accepted.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their constructive feedback. They appreciated our interpretable time-to-event model as highly clinically relevant and liked that it captured clinically relevant changes. Regarding the critical feedback:

Trade-off between performance and interpretability [R1, R3, R5]: Importantly, the goal of our study was not to introduce a model with new SOTA performance, but an interpretable model with near SOTA performance. Given that there is typically a clear trade-off between performance and interpretability (as also pointed out by R3), our model performed surprisingly well, almost reaching the performance of non-interpretable SOTA models. The reason for this is likely that its inductive bias fits the task structure very well: AMD lesions are localised, so the localised processing of BagNets did not incur large performance loss.

Clinical expertise of annotator [R3]: We apologize for this omission. The annotator is a senior resident in ophthalmology with four years of experience and research experience in AMD. We added this information.

Generalizability to other problems [R3]: The model will generalise well to other clinical time-to-event problems for which image data is relevant, if the disease-related lesions are small, including progression modeling for diabetic retinopathy. We added a sentence to the discussion.

Novelty [R1, R5]: The survival setting has received much less attention than disease detection. Therefore, using sparse BagNets in this setting with the CoxPH model contributes two aspects: (a) we obtain an inherently interpretable model without the need for post-hoc methods and (b) we can predict risk over time from a single image. In contrast, most other models for survival prediction are not interpretable without post-hoc methods and treat time-to-event prediction as independent classification problems for each time point at the discretization available in the data. We clarified this in the text.
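As an illustration of point (b), the following is a minimal NumPy sketch of a Cox proportional hazards objective and of how a single scalar risk score yields a full survival curve. It is not the authors' implementation: the function names, the Breslow-style handling of tied event times, and the externally supplied baseline survival values are assumptions made for this example.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : predicted log-risk score per sample (hypothetical model output)
    time  : observed time (event or censoring) per sample
    event : 1/True if the event was observed, 0/False if censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when sample j is still at risk at sample i's time
    at_risk = time[None, :] >= time[:, None]
    # log of the summed exp(risk) over each risk set, shifted for stability
    shift = risk.max()
    risk_set_sum = (np.exp(risk - shift)[None, :] * at_risk).sum(axis=1)
    log_risk_set = np.log(risk_set_sum) + shift
    # only samples with an observed event contribute a likelihood term
    return -np.sum((risk - log_risk_set)[event]) / max(event.sum(), 1)

def survival_curve(risk_score, baseline_survival):
    """Proportional hazards: S(t | x) = S0(t) ** exp(risk_score)."""
    return np.asarray(baseline_survival, dtype=float) ** np.exp(risk_score)
```

Because the risk score enters only through the exponent, one prediction per image is enough to evaluate the survival probability at any time point for which a baseline value is available, rather than training a separate classifier per time point.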

Conceptual similarities to other approaches [R3]: Few deep learning models provide explanations of their inner workings. (a) Prototype models learn prototypical image parts and provide them for interpretability. Their main drawback is that the explanations are coarse and spatially imprecise, which is problematic for diseases characterized by multiple small lesions, such as AMD. (b) NAMs are conceptually similar to BagNets but have, to our knowledge, mostly been used for tabular data, generalizing spline-based GAMs. NAMs for images, like the EPU-CNN, would not yield contributions of pixels, as the BagNet does, but contributions of previously defined concepts/features. None of these methods have been used in image-based survival settings. We added references to these alternative approaches.

Other architectures [R5]: R5 recommended using transformers in this framework. Currently, we are neither aware of any works combining vision transformers and survival models, nor is it immediately clear how this would yield inherent interpretability. Also due to rebuttal rules, we think it is beyond the scope of the current paper.

Interpretability [R3, R5]: To distinguish the BagNets’ interpretability from analysing CNN filters (R3), we clarified in Methods: “Standard ResNets learn potentially global features and their interactions, while the BagNet learns the local evidence in an image patch. This eliminates the need for post-hoc saliency maps or the post-hoc analysis of convolutional filters”. R5 pointed out that the explainability “stays at the image level” and “whether the model can output the features contained in each patch”. Unfortunately, we are not sure what exactly is meant. We want to stress that we clearly show that the identified local patches contain clinically meaningful tissue lesions.
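To make the notion of local evidence concrete, here is a small sketch of how a patch-wise evidence map can be reduced to one image-level score by average pooling, and how the highest-risk patches can be read off directly from the map. The function names and the toy evidence map are hypothetical; the sketch does not reproduce the network itself, only the aggregation step that makes the map a faithful account of the prediction.

```python
import numpy as np

def aggregate_evidence(evidence_map):
    """Average per-patch evidence into one image-level risk score."""
    return float(np.mean(evidence_map))

def top_patches(evidence_map, k=6):
    """Return (row, col) positions of the k patches with highest evidence."""
    evidence_map = np.asarray(evidence_map)
    # sort all patches by evidence, highest first, and keep the first k
    flat_order = np.argsort(evidence_map, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, evidence_map.shape)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy 3x3 evidence map (made-up values): two patches carry all the evidence.
toy_map = np.array([[0.0, 0.0, 0.0],
                    [0.0, 3.0, 0.0],
                    [0.0, 0.0, 6.0]])
score = aggregate_evidence(toy_map)
strongest = top_patches(toy_map, k=2)
```

Since the image-level score is a plain average of the patch values, each patch's contribution to the prediction can be read from the map without any post-hoc attribution step.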

Minor points: We thank the reviewers for pointing out redundancies and small issues with the figures. We will address these in the camera-ready version.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While the paper is well written, a major drawback is the lack of experiments comparing it with other existing interpretable methods in terms of performance, quantitative indicators of interpretability, or interpretation quality; the experimental design only compares performance with non-interpretable methods.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces an interpretable deep survival analysis for disease progression modeling. The technical contribution is limited to adding CoxPH to an existing interpretable network (sparse BagNet) to obtain a time-to-event model. However, the paper is clinically well motivated and the results look meaningful. I would suggest acceptance of the paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper provides an interesting research direction for survival analysis, and the topic itself is novel in the field. Based on all the comments, I would recommend acceptance of this work.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



