Abstract

There have been significant advancements in analyzing retinal images for the diagnosis of eye diseases and other systemic conditions. However, multi-disease detection remains a key challenge, and real-world applications demand it because a patient may have more than one condition. To address this challenge, this study introduces a novel end-to-end approach to multi-disease detection in retinal images, guided by disease causal estimation. The model learns disease-specific features by integrating causal relationships among diseases and interactions between image features and disease conditions. Specifically: 1) the interactions between diseases and image features are captured by cross-attention in a transformer decoder; 2) the causal relationships among diseases are automatically estimated from the dataset as a directed acyclic graph (DAG) and are used to regularize disease-specific feature learning through disease causal interaction; and 3) a novel retinal multi-disease dataset of 500 patients with six lesion labels was created for evaluation. Compared with other methods, the proposed approach not only achieves high-performance multi-disease diagnosis but also provides a way to estimate the causal relationships among diseases. We evaluated our method on two retinal datasets: a public color fundus photography dataset and an in-house fundus fluorescein angiography (FFA) dataset. The results show that the proposed method outperforms other state-of-the-art multi-label models. Our FFA database and code have been released at https://github.com/davelailai/multi-disease-detection-guided-by-causal-estimation.git.
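Point 1) above, cross-attention between disease queries and image features, can be illustrated with a minimal NumPy sketch. It assumes one query vector per disease (analogous to the $Q_0$ discussed in Review #1) attending over flattened backbone feature tokens. The single-head form, the dimensions, and the names `cross_attention` and `disease_probs` are illustrative assumptions. This is not the released implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, tokens, wq, wk, wv):
    """Single-head cross-attention: each disease query attends over image tokens.

    queries: (N, C), one row per disease (a hypothetical stand-in for Q_0)
    tokens:  (T, C), flattened spatial features from a CNN backbone
    wq, wk, wv: (C, C) projection matrices
    Returns (N, C) disease-specific features and the (N, T) attention map.
    """
    q, k, v = queries @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v, attn


def disease_probs(features, w_head, b_head):
    # Hypothetical per-disease linear head: disease i is scored from its own feature row.
    logits = np.einsum("nc,nc->n", features, w_head) + b_head
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
N, T, C = 6, 49, 32  # six labels, a 7x7 feature map, 32 channels (all illustrative)
q0 = rng.normal(size=(N, C))
tokens = rng.normal(size=(T, C))
wq, wk, wv = (rng.normal(size=(C, C)) / np.sqrt(C) for _ in range(3))
feats, attn = cross_attention(q0, tokens, wq, wk, wv)
probs = disease_probs(feats, rng.normal(size=(N, C)), np.zeros(N))
print(feats.shape, attn.shape, probs.shape)  # (6, 32) (6, 49) (6,)
```

Each row of `attn` is a distribution over spatial positions, so every disease can attend to a different image region. The per-row head keeps the final score disease-specific.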

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0973_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0973_supp.pdf

Link to the Code Repository

https://github.com/davelailai/multi-disease-detection-guided-by-causal-estimation.git

Link to the Dataset(s)

https://github.com/davelailai/multi-disease-detection-guided-by-causal-estimation.git

BibTex

@InProceedings{Xie_Multidisease_MICCAI2024,
        author = { Xie, Jianyang and Chen, Xiuju and Zhao, Yitian and Meng, Yanda and Zhao, He and Nguyen, Anh and Li, Xiaoxin and Zheng, Yalin},
        title = { { Multi-disease Detection in Retinal Images Guided by Disease Causal Estimation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose a method for detecting multiple diseases in retinal images. The method builds on a Transformer-based multi-label classification approach that exploits the relationship between image features and disease labels. It adds a module that uses prior knowledge of the causal relationships between labels, encoded as a graph structure. The effectiveness of the method is demonstrated through experiments on OIA-ODIR and a private dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of this paper are as follows.

    • Detection accuracy is improved by adding a module to a Transformer-based multi-label classification method. This module uses a directed acyclic graph to exploit causal relationships among labels as prior knowledge.
    • The module exploits the interaction between image features and labels, and also uses the causal relationships between labels as prior knowledge.
    • Conventional methods [3,11,22] use undirected graphs, which can learn the relevance between labels but not their causal relationships. In contrast, the proposed method uses directed acyclic graphs and can therefore learn causal relationships among labels.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The weaknesses of this paper are as follows.

    • In Sect. 3.3, you mention that ResNet50 is used as the backbone for the conventional methods. I wonder whether this allows the performance of those methods to be properly evaluated. For example, GCN [11] uses SENet50 as the encoder, and dyGCN [22] uses ResNet101. Replacing these encoders with ResNet50 may not reflect the true performance of the conventional methods.
    • In Table 2, the proposed method generally achieves higher accuracy on the private dataset. On the public dataset, however, it achieves the highest mAUC, while the conventional methods perform better on the other evaluation metrics.
    • Since the proposed method learns causal relationships among labels, multi-label classification accuracy may decrease for unlearned labels. On page 2, you mention that label co-occurrence methods may overfit due to bias when the dataset is small. The same problem may also affect the proposed method, which learns causal relationships among labels.
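For comparison with the co-occurrence-based graphs discussed above, the following sketch builds a conditional co-occurrence matrix $P(L_j \mid L_i)$ from a binary label matrix, in the style used by GCN-based multi-label methods. The toy labels and the function name `cooccurrence_conditional` are hypothetical. Every entry is a count-based estimate, so rare labels yield noisy rows. This is where the small-dataset bias concern comes from.

```python
import numpy as np


def cooccurrence_conditional(labels):
    """Estimate P(label j present | label i present) from a binary (samples, labels) matrix."""
    labels = np.asarray(labels, dtype=float)
    counts = labels.T @ labels              # counts[i, j] = #samples with both i and j
    occurrences = np.diag(counts).copy()    # counts[i, i] = #samples with label i
    # Guard against labels that never occur to avoid division by zero.
    denom = np.where(occurrences > 0, occurrences, 1.0)
    return counts / denom[:, None]


y = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
])
P = cooccurrence_conditional(y)
print(P.round(2))
```

Here `P[0, 1]` is 2/3 while `P[1, 0]` is 1. The conditional frequencies are asymmetric, but they still reflect only how often labels appear together, not any causal direction.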
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • There are inconsistencies in the notation of DAGs, such as directed acyclic graph (DAG), causal-directed acyclic graph (DAG), causal DAG, and causality DAG.

    • On page 5, line 7, $Q = Sigm(f(Q_1,w))$ should read $\tilde{Q}_1 = Sigm(f(Q_1,w))$.

    • On page 5, line 13, $L_{SEM} = \lVert Q_1 - Q_1 W \rVert_2^2$ can be rewritten as $L_{SEM} = \lVert Q_1 - Relu(f(Q_1,w)) \rVert_2^2$. It is not clear why it is rewritten in this way. Should it not be $Q = Conv1D(\tilde{Q}_1) = Conv1D(Sigm(f(Q_1,w)))$?
    • In Table 1, the positive and negative counts of several labels in each row add up to more than the total. For example, in the Train row, column L gives 3,373 + 418 = 3,791, which exceeds the total of 3,318.

    • “Explorision” in Table 3 is a mistake for “Exploration”.

    • The feature vector $Q_0$ for each disease is initialized with random values. I suspect training would be more stable if CLIP features were used.
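To make the notation questions above concrete, here is a small NumPy sketch of the two forms of $L_{SEM}$ the reviewer compares, plus $\tilde{Q}_1 = Sigm(f(Q_1,w))$. It assumes $Q_1$ is stored as channels × diseases, so that $Q_1 W$ is defined for an $N \times N$ matrix $W$. The function `f` is a hypothetical two-layer map, and the shapes are also assumptions rather than the paper's definitions. The point is only that the linear, ReLU, and sigmoid forms produce different reconstructions, which is why the notation matters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    return np.maximum(x, 0.0)


def f(q1, w1, w2):
    # Hypothetical two-layer map acting along the disease axis: (C, N) -> (C, N).
    # Note: no diagonal masking here, so a disease may reconstruct itself.
    return relu(q1 @ w1) @ w2


def sem_loss(q1, q1_hat):
    # Squared Frobenius norm ||Q_1 - Q_hat||_2^2 (no averaging factor).
    return float(np.sum((q1 - q1_hat) ** 2))


rng = np.random.default_rng(0)
C, N, H = 32, 6, 16            # channels, diseases, hidden width (all assumed)
q1 = rng.normal(size=(C, N))   # assumed layout: one column per disease
W = rng.normal(scale=0.1, size=(N, N))
np.fill_diagonal(W, 0.0)       # no self-loops in the linear SEM
w1 = rng.normal(scale=0.1, size=(N, H))
w2 = rng.normal(scale=0.1, size=(H, N))

loss_linear = sem_loss(q1, q1 @ W)              # ||Q_1 - Q_1 W||^2
loss_relu = sem_loss(q1, relu(f(q1, w1, w2)))   # ||Q_1 - ReLU(f(Q_1, w))||^2
q1_tilde = sigmoid(f(q1, w1, w2))               # Q~_1 = Sigm(f(Q_1, w)), values in (0, 1)
print(loss_linear, loss_relu, q1_tilde.shape)
```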
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I rated this paper a weak reject for three reasons: parts of the formula explanations are unclear, some experimental conditions are unclear, and I have questions about the accuracy of the proposed method.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The supplemental explanations by the authors have answered some of my questions, though not all. Since the novelty of this method is acceptable, I have decided that this paper is weak accept.



Review #2

  • Please describe the contribution of the paper

    The paper investigates multi-disease detection in retinal images, introducing a method based on directed acyclic graphs (DAG) to evaluate the causal relationships among diseases. It demonstrates improved results on both a public dataset and an in-house dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The authors state that they will open-source their code and dataset, which would significantly advance research in the relevant fields. (2) A clear analysis of the background conveys the feasibility of incorporating causal estimation into multi-disease detection, offering insights for similar multi-label problems. (3) The validation with two baseline methods on datasets from two imaging modalities, together with a discussion of the relevant parameters, provides ample experimental results that credibly support the proposed approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The experimental results lack in-depth analysis. For example, the paper does not discuss potential reasons for the decrease in mAP compared to the baseline in Table 2. Nor does it explain why the results with fewer or more channels are inferior to those with 30 channels in Fig. 3. (2) The necessity of introducing the "weighted adjacency matrix A(W)" is not thoroughly analyzed. In my view, the global interaction among Q0/Q1 features during the self-attention step in the Transformer Decoder already encompasses the functionality of a DAG. The authors need to further clarify the distinctions and theoretical advantages of their proposed method. (3) Section 3.5 mentions "causal relationship", yet neither the manuscript nor the supplementary materials explain it in detail. It is essential to conduct a trustworthy analysis of the DAG-learned parameters in terms of their clinical significance. Interpretable examples illustrating the successes and failures of the parameterized DAG in learning causal relationships would be beneficial.
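One way to see the distinction the reviewer asks about is to treat a softmax self-attention matrix over the disease queries as a weighted graph. That matrix is strictly positive, so it contains self-loops and cycles, whereas a DAG adjacency does not. The sketch below scores both matrices with a polynomial variant of the NOTEARS acyclicity measure, $h(A) = \mathrm{tr}\big((I + A \circ A / d)^d\big) - d$. For nonnegative weights this measure is zero exactly when the graph has no directed cycle. The paper's $R_{DAG}(W)$ may use the exponential form or another variant, so the choice here is an assumption.

```python
import numpy as np


def acyclicity(adj):
    """Polynomial acyclicity measure: 0 iff the graph with weights A*A has no directed cycle."""
    d = adj.shape[0]
    m = np.eye(d) + (adj * adj) / d
    return float(np.trace(np.linalg.matrix_power(m, d)) - d)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
d, c = 6, 32
queries = rng.normal(size=(d, c))
attention = softmax(queries @ queries.T / np.sqrt(c))   # dense, every entry > 0

dag = np.triu(rng.uniform(0.1, 1.0, size=(d, d)), k=1)  # strictly upper triangular => acyclic

print(acyclicity(attention))  # positive: self-loops and cycles are present
print(acyclicity(dag))        # 0.0: no directed cycles
```

Attention weights are data-dependent and cyclic by construction. A learned $W$ penalized by such a measure is pushed toward an acyclic graph, which is what allows its edges to be read as directed relationships.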

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) Are there any constraints on the values of A(W), such as value mapping or clipping? An adjacency matrix often poses challenges for robust optimization, and the authors should provide detailed implementation specifics. (2) Re-examine the correctness of Equation 3. Should it be written as $Q = WQ_1$, with $W$ acting as a left-multiplying transition matrix?
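Whether $W$ should left- or right-multiply depends on how $Q_1$ is laid out, which the reviews do not pin down. Assume $W_{ij} \neq 0$ encodes an edge from disease $i$ to disease $j$ (the NOTEARS convention). The linear SEM $X = XW$ then treats columns as variables. If $Q_1$ instead stores one row per disease ($N \times C$), the same model reads $Q = W^\top Q_1$. The sketch checks this equivalence numerically. It also shows one common way to keep the adjacency nonnegative without clipping, $A(W) = W \circ W$. Both the layouts and that choice of $A(W)$ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 6, 32                           # diseases, feature channels (assumed)
W = rng.normal(size=(N, N))
np.fill_diagonal(W, 0.0)               # no self-loops

q1_rows = rng.normal(size=(N, C))      # layout A: one row per disease
q1_cols = q1_rows.T                    # layout B: one column per disease (NOTEARS-style)

# Column layout: each disease column is a weighted sum of its parents' columns.
pred_cols = q1_cols @ W                # (C, N)
# Row layout: the same model needs W transposed and applied on the left.
pred_rows = W.T @ q1_rows              # (N, C)
assert np.allclose(pred_cols.T, pred_rows)

# Right-multiplying the row layout by an N x N matrix is only defined when C == N.
try:
    q1_rows @ W
except ValueError as err:
    print("shape mismatch:", err)

# A nonnegative weighted adjacency: A(W) = W * W (element-wise), so no clipping is needed.
A = W * W
print(bool(A.min() >= 0), bool(np.allclose(np.diag(A), 0)))
```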

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    With its insightful analysis and the provision of an open-source dataset, I advocate for acceptance provided the authors can address my concerns.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal answered some of my concerns. I would like to keep the decision of weak accept.



Review #3

  • Please describe the contribution of the paper

    The study presents a method for multi-disease/pathology feature classification on fundus fluorescein angiography. The application of causal representation learning in fundus imaging is novel, and multi-pathology classification is very relevant to clinical reality, as many patients present with multiple conditions. The authors condition a transformer model on encoded lesion information to extract lesion-specific features. This is combined, in a common training workflow, with a method for learning DAGs that estimate the causal relationships between lesions. The authors model the causality by means of logistic regression, using a 2-layer feed-forward network. Besides learning the causal relationships, the proposed approach acts as a regularizer for lesion classification. The method was tested on two relatively large datasets and showed improved performance against existing works. The addition of the causal DAG further improves lesion classification.
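The description of "logistic regression via a 2-layer feed-forward network" resembles the nonparametric NOTEARS-MLP formulation. In that formulation, each variable is predicted from all others by its own small network. The edge strength $i \to j$ is read off the first-layer weights attached to input $i$. The sketch below follows that pattern under stated assumptions: the hidden width `m`, the sigmoid output, and the class name `PerDiseaseMLP` are hypothetical. The paper may derive $A(W)$ differently, and training (including re-applying the self-loop mask after each update) is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class PerDiseaseMLP:
    """d small two-layer networks; network j predicts disease j from all d disease inputs."""

    def __init__(self, d, m, rng):
        self.d = d
        self.w1 = rng.normal(scale=0.1, size=(d, m, d))  # w1[j]: (m, d) first layer of net j
        self.w2 = rng.normal(scale=0.1, size=(d, m))     # w2[j]: (m,) output layer of net j
        for j in range(d):
            self.w1[j, :, j] = 0.0                       # disease j may not predict itself

    def forward(self, x):
        # x: (n, d) samples x diseases -> (n, d) logistic outputs, one per disease.
        hidden = sigmoid(np.einsum("jmi,ni->njm", self.w1, x))
        return sigmoid(np.einsum("njm,jm->nj", hidden, self.w2))

    def adjacency(self):
        # A[i, j] = L2 norm of the first-layer weights linking input i to network j.
        return np.linalg.norm(self.w1, axis=1).T


rng = np.random.default_rng(0)
model = PerDiseaseMLP(d=6, m=16, rng=rng)
x = rng.integers(0, 2, size=(8, 6)).astype(float)
print(model.forward(x).shape, model.adjacency().shape)  # (8, 6) (6, 6)
```

Because $A_{ij}$ is a norm, it is nonnegative by construction. If every weight leaving input $i$ toward network $j$ is zero, the edge $i \to j$ vanishes, which is what an $\ell_1$ sparsity penalty on `w1` would encourage.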

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Methodology: The authors introduce causal estimation into a multi-label pathology classification problem. Their method is modular and successfully learns the causal relationships between the lesions as the classification model is trained. At the same time, the learned DAG can act as a regularizer for the feature learning process. The proposed method is clinically significant in scenarios such as clinical trials, where patients might be excluded if they have multiple conditions. The method can also provide insights into the causal relationships between pathologies, and including causal estimation can improve the performance of classification algorithms. Overall, the method promotes the principles of responsible AI.

    Performance: The classification performance was improved when the causal relationships, extracted by the proposed approach, were included in other methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Tables 2 and 3 do not report standard deviations. Additionally, the authors do not conduct a statistical significance analysis of their results.

    2) The authors do not discuss the limitations of their method or future directions.

    3) The paper uses "disease" and "lesion" inconsistently. The chosen lesions (called diseases in the text) present in the dataset can result from different diseases. For example, both Diabetic Retinopathy and Age-related Macular Degeneration can show retinal/choroidal vascular abnormalities that could present with occlusions and hemorrhages in FFA images. FFA mainly captures functional characteristics of the retina that might be caused by different diseases. Also, the main assumption that the features of a given disease can be modeled from its parent diseases is not entirely clear. The labels in the paper define the type of lesion (e.g., leakage) without specifying its origin in terms of disease, such as whether the leakage originates from diabetic retinopathy or another disease. It is therefore not clear how this information is applied in the paper.
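As one way to address the missing standard deviations, the sketch below estimates a bootstrap standard deviation for mAUC by resampling test cases with replacement. AUC is computed with the rank-based (Mann-Whitney) formula. The toy data, the number of resamples, and the helper names are illustrative assumptions, not the paper's evaluation protocol.

```python
import numpy as np


def auc(y_true, score):
    """Rank-based ROC AUC (Mann-Whitney U); ties between distinct scores are not averaged."""
    y_true = np.asarray(y_true, dtype=bool)
    score = np.asarray(score, dtype=float)
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mauc(Y, S):
    # Unweighted mean of per-label AUCs; Y and S have shape (samples, labels).
    return float(np.mean([auc(Y[:, k], S[:, k]) for k in range(Y.shape[1])]))


def bootstrap_sd(Y, S, n_boot=1000, seed=0):
    """Bootstrap standard deviation of mAUC, resampling test cases with replacement."""
    rng = np.random.default_rng(seed)
    n, stats = len(Y), []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        pos = Y[idx].sum(axis=0)
        if np.all((pos > 0) & (pos < n)):   # AUC needs both classes for every label
            # Duplicated cases share a label, so their tied scores do not bias the AUC.
            stats.append(mauc(Y[idx], S[idx]))
    return float(np.std(stats, ddof=1))


rng = np.random.default_rng(1)
Y = rng.integers(0, 2, size=(200, 6))
S = Y + rng.normal(size=Y.shape)            # toy scores correlated with the labels
print(f"mAUC = {mauc(Y, S):.3f} ± {bootstrap_sd(Y, S, n_boot=200):.3f}")
```

For comparing two methods, the same resampled indices can be applied to both score matrices. The spread of the paired mAUC differences then gives a simple significance check.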

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) Why do the authors not consider the dyGCN [22] method in the analysis of causal estimation (Table 3)? From Table 2, it appears to be the most competitive against the authors' method.

    2) Are the differences in the metrics in Tables 2 and 3 statistically significant? I recommend that the authors include the standard deviation next to the metrics.

    3) What is the inference time of the proposed methods compared to the existing methods in the literature?

    4) Please clarify the difference between lesion and disease in the experimental setting.

    5) How much does each lesion/disease prediction contribute to the mAUC metric in Table 2? From Figure 2, the models seem generally very good at recognizing cataract, whereas performance drops significantly for hypertension. In the OIA-ODIR dataset, hypertension has the lowest number of cases. How does the proposed method perform when the classes are imbalanced?
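Reporting each label's AUC next to its prevalence is one simple way to show how much each class contributes to mAUC and how a rare class fares. In the sketch, mAUC is the unweighted mean of per-label AUCs, so a rare label counts as much as a common one. The synthetic data is built so that the rare label is harder to separate. The data, label names, and helper names are hypothetical.

```python
import numpy as np


def auc(y_true, score):
    """Rank-based ROC AUC (Mann-Whitney U); assumes no tied scores across classes."""
    y_true = np.asarray(y_true, dtype=bool)
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def per_label_report(Y, S, names):
    # One (name, prevalence, AUC) row per label, plus the unweighted (macro) mean AUC.
    rows = [(name, float(Y[:, k].mean()), float(auc(Y[:, k], S[:, k])))
            for k, name in enumerate(names)]
    return rows, float(np.mean([r[2] for r in rows]))


rng = np.random.default_rng(0)
names = ["common", "medium", "rare"]
prevalence = np.array([0.5, 0.2, 0.03])
Y = (rng.random((1000, 3)) < prevalence).astype(int)
S = Y * np.array([2.0, 1.5, 0.5]) + rng.normal(size=Y.shape)  # synthetic: rare label is harder
rows, mauc_value = per_label_report(Y, S, names)
for name, prev, a in rows:
    print(f"{name:>7}: prevalence {prev:.3f}, AUC {a:.3f}")
print(f"mAUC {mauc_value:.3f}")
```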

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a novel approach for the classification of images with multiple pathologies. The proposed approach could be applied to other applications and imaging modalities. It addresses the relevant clinical problem of patients presenting with multiple diseases.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    After reviewing the author’s rebuttal, I would like to keep it as accepted.




Author Feedback

We thank the reviewers for their invaluable comments and recognition of our method’s novelty and its contributions to clinical applications. As stated in the abstract, the in-house FFA database and the source code will be released. Below, we respond to the major comments one by one, whilst the minor ones will be included in the camera-ready paper:

  1. The backbone selection in Sect. 3.3 (R1): Reply: We agree that the backbone plays a role in the model's performance. However, we kept the backbone consistent to establish a standardized baseline for fair comparison across approaches. ResNet50 was chosen for practical reasons such as computational efficiency. This does not rule out better performance with other architectures such as SENet50 and ResNet101, but adopting a consistent experimental setup ensures a fair comparison.
  2. The performance of the proposed method is higher than other methods in mAUC, but not in all of the evaluation metrics in Table 2 (R1, R3): Reply: Our method achieved the best mAUC scores on both in-house and public datasets. This underscores its overall proficiency in accurately ranking multiple labels concurrently, which is crucial in multi-label classification tasks. Unlike other evaluation metrics like MAP, MAR, and MAF1, which focus on single aspects and can be influenced by dataset imbalance, mAUC provides a comprehensive evaluation across all labels simultaneously, further highlighting our method’s superiority. Despite not consistently outperforming others across all metrics, our approach demonstrated superiority in most evaluation metrics, emphasizing its overall effectiveness in multi-disease detection tasks.
  3. Concern about the performance decrease for unlearned labels, and the overfitting problem due to bias when the dataset is small (R1). Reply: Regarding unlearned labels, our method is not designed to predict diseases that have not been encountered during training, so we may not fully address your specific concern; further details would help us address it effectively. Regarding overfitting, our analysis in Section 3.5 showed that using the learned causal matrix significantly improves performance compared to the label co-occurrence matrix, suggesting that overfitting is effectively mitigated.
  4. The concern about the causal relationship: explanations, the necessity of A(W), constraints of A(W) (R3). Reply: (1) The concept of causal relationships was introduced in the introduction (page 2) and further elaborated in the method section (lines 2-3 on page 5). (2) Necessity of A(W): we acknowledge that disease relationships are partially captured within the transformer decoder. However, they are not causal relationships, as they are computed by a non-local mechanism. Additionally, Table 2 shows that Q2L Causal and ML Causal outperform their respective baselines (Q2L and MLDecoder, which contain only a transformer decoder) after the causal estimation module is integrated. This emphasizes the necessity and significance of A(W) in multi-disease detection. (3) The values of A(W) are constrained: Equation 5 regularizes W through $\lVert w \rVert_1$ to promote sparsity, while $R_{DAG}(W)$ ensures a directed acyclic graph structure.
  5. Explanation for the number of channels in Fig. 3 (R3). Reply: The number of channels affects the depth of A(W): too few channels may hinder efficient capture of causal relationships, while too many may lead to overfitting.
  6. The difference between lesion and disease (R4). Reply: Thank you for raising this important point. In the OIA-ODIR dataset, images were categorized according to disease definitions, whereas in our LID-FFA dataset, annotations were based on lesion categories. However, the causal relationship persists in LID-FFA. For instance, a case with leakage in the early phase may show pooling in the later phases. We will ensure a clear distinction in the final paper to avoid any confusion.
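The rebuttal's contrast between mAUC and thresholded metrics such as macro precision, recall, and F1 can be illustrated directly. With the same scores, the thresholded metrics move as the decision threshold moves. AUC depends only on the ranking of the scores, so it does not. The thresholds and toy data below are assumptions chosen for illustration.

```python
import numpy as np


def auc(y_true, score):
    """Rank-based ROC AUC (Mann-Whitney U); assumes no tied scores across classes."""
    y_true = np.asarray(y_true, dtype=bool)
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def macro_prf(Y, S, threshold):
    """Macro-averaged precision, recall, and F1 after thresholding the scores."""
    P, Yb = S >= threshold, Y.astype(bool)
    tp = (P & Yb).sum(axis=0)
    fp = (P & ~Yb).sum(axis=0)
    fn = (~P & Yb).sum(axis=0)
    zeros = np.zeros(len(tp))
    prec = np.divide(tp, tp + fp, out=zeros.copy(), where=(tp + fp) > 0)
    rec = np.divide(tp, tp + fn, out=zeros.copy(), where=(tp + fn) > 0)
    f1 = np.divide(2 * prec * rec, prec + rec, out=zeros.copy(), where=(prec + rec) > 0)
    return float(prec.mean()), float(rec.mean()), float(f1.mean())


rng = np.random.default_rng(0)
Y = (rng.random((500, 4)) < np.array([0.4, 0.2, 0.1, 0.05])).astype(int)
S = 1 / (1 + np.exp(-(2.0 * Y - 1.0 + rng.normal(size=Y.shape))))  # toy probabilities
mauc_value = float(np.mean([auc(Y[:, k], S[:, k]) for k in range(Y.shape[1])]))
for t in (0.3, 0.5, 0.7):
    p, r, f = macro_prf(Y, S, t)
    print(f"threshold {t}: macro P {p:.3f}  R {r:.3f}  F1 {f:.3f}  |  mAUC {mauc_value:.3f}")
```

Threshold independence does not make mAUC immune to class imbalance. It only removes the dependence on a particular operating point.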




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I recommend acceptance of the paper due to its innovative approach to multi-disease detection using causal relationships, which effectively enhances diagnostic methodologies. The authors have addressed the reviewers' concerns in their rebuttal, promising further clarifications and improvements in the final manuscript. With its proven effectiveness and potential clinical impact, the paper meets the MICCAI standards.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



