Abstract

Graph neural networks (GNNs) represent a cutting-edge methodology in diagnosing brain disorders via fMRI data. Explainability and generalizability are two critical issues of GNNs for fMRI-based diagnoses, considering the high complexity of functional brain networks and the strong variations in fMRI data across different clinical centers. Although there have been many studies on GNNs’ explainability and generalizability, few have addressed both aspects simultaneously. In this paper, we unify these two issues and revisit the domain generalization (DG) of fMRI-based diagnoses from the view of explainability. That is, we aim to learn domain-generalizable explanation factors to enhance center-agnostic graph representation learning and, in turn, brain disorder diagnosis. To this end, a specialized meta-learning framework coupled with explainability-generalizable (XG) regularizations is designed to learn diagnostic GNN models (termed XG-GNN) from fMRI BOLD signals. Our XG-GNN can build nonlinear functional networks in a task-oriented fashion. More importantly, the group-wise differences of such learned individual networks can be stably captured and carried over to unseen fMRI centers to jointly boost the DG of diagnostic explainability and accuracy. Experimental results on the ABIDE dataset demonstrate the effectiveness of our XG-GNN. Our source code will be publicly released.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1756_paper.pdf

SharedIt Link: https://rdcu.be/dV1PP

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72069-7_43

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1756_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Qiu_Towards_MICCAI2024,
        author = { Qiu, Xinmei and Wang, Fan and Sun, Yongheng and Lian, Chunfeng and Ma, Jianhua},
        title = { { Towards Graph Neural Networks with Domain-Generalizable Explainability for fMRI-Based Brain Disorder Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        pages = {454 -- 464}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper leverages a meta-learning framework that boosts generalizability of a GNN model to unseen domains while manifesting its explainability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The key idea is well motivated and the paper is clearly written and easy to follow.
    2. Exploring explainability under domain shift is an interesting problem to solve in both medical imaging and for a broader scientific audience.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The level of novelty with respect to generic explainable models under domain shifts may be better justified from a “Related work” section.
    2. The domain shift is hypothesized for a single target domain
    3. The rationale behind a few choices is missing
    4. Questionable claim of novelty. The paper seems to specifically tackle a domain adaptation problem endowed with explainability properties. Generalizability is a more comprehensive term to use.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    There are a few issues in the paper that could benefit from further clarification(s):

    1. Existing and related work. Did the authors search all DL/GNN frameworks that aimed to combine explainability and domain adaptation? How does the proposed method stand with respect to the following references?

    a) Schmidt, Robin M. “Explainability-aided domain generalization for image classification.” arXiv preprint arXiv:2104.01742 (2021).

    b) Bobek, Szymon, et al. “Towards Explainable Deep Domain Adaptation.” European Conference on Artificial Intelligence. Cham: Springer Nature Switzerland, 2023.

    The authors seem to have missed landmark works investigating this problem using more focused research terms such as “domain adaptation”, which seem to better articulate the research question addressed in this paper.

    2. The strong claim of unprecedentedly introducing XG (explainable generalizability) needs further evidence. The paper seems to tackle explainability and domain adaptation using meta-learning. Perhaps it may be more reasonable to revise the paper and replace “generalizability” with “domain adaptation”.

    3. Results reproducibility. How reproducible are the explainable results under train/test data distribution perturbation? How sensitive is the proposed method to hyper-parameter tuning?

    4. Benchmarks are limited. The paper lacks comparison against SOTA methods, e.g., a) and b).

    5. Several methodological choices presented in Section 2.1 are not motivated or justified (e.g., concluding the BOLD mapping learner with an MHSA). What if this layer is removed? How would the results change?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall a good paper with clear writing, borderline novelty (in some parts questionable wrt literature) and satisfactory experimental design.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal addressed my fundamental concerns. The authors will integrate and clarify these in the final version of the paper.



Review #2

  • Please describe the contribution of the paper

    The authors describe a strategy to perform classification of subjects into controls or abnormal (i.e. those with neurological disorders with Autism as the key example) using graph neural networks (GNN) that are focused on both domain generalization and explainability simultaneously. The approach/ architecture design includes a meta-level learning framework using an outer loop that focuses on group level differences combined with a more detailed inner loop that enhances the learning of more fine-grained, discriminative representations. This framework thus incorporates explainability-generalizable (XG) regularization and is designed to learn diagnostic GNN models (termed XG-GNN) from fMRI BOLD signals. Results are shown using the ABIDE public database.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The description of the problem and challenges for both explainability and generalizability are reasonable, with good referencing to key papers.

    The multiheaded self-attention-based graph network learner combined with the graph convolutional “diagnose” network is a reasonable architectural design for subject classification, although a relatively straightforward approach.

    Incorporating the notion of inter-subject group functional connectivity differences is interesting and may be helpful. It is incorporated during training/meta-learning via the loss functions outlined in section 2.2 with helpful details about the procedure in the Supplementary material. The efficacy of all of this obviously depends heavily on how representative and robust the training data are.

    Testing on the ABIDE dataset is promising with XG-GNN showing several percent improvements in ACC and AUC versus other SOTA methods (including some not specifically developed for rs-fMRI).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the classification results are better with the new method, some improvements are incremental over some of the comparison methods.

    The notion that true explainability can be captured by using training data about cross-group and cross-acquisition-site different distributions seems questionable. The authors state that they “design complementary XG regularizations by leveraging fundamental prior regarding the early status of neuropsychiatric disorders (e.g. ASD).” This is mentioned once without any detail or references, and the lack of insight about this is a major weakness of the rationale for this work.

    The “explanation” results are rather weak and more or less reduced to a subjective interpretation of Figure 2c (and 2b). Given that this is a stated theme of this paper, these results are somewhat disappointing. The notion of a robust network that integrated explainability and generalizability is rather insufficiently supported.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Testing from data across a number of clinical centers is incorporated as part of the approach, but no clear statements are made about robustness.

    The algorithm description is helpful and could allow another researcher to reproduce the method (although may require more detail)

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    More detail and insight as to what the authors mean by “leveraging fundamental prior regarding the early status of neuropsychiatric disorders” would be very helpful, and important to better describe this work.

    Additional text to help better understand the explanation results with respect to the brain regions involved with classification in Figures 2 b and c would be helpful. Perhaps highlight the limbic system regions that stand out.

    Performing statistical significance tests of the new results versus other methods might highlight whether the classifications improvements are meaningful or not.
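
As one way to act on the suggestion above, a paired permutation test over per-site scores could be used. The sketch below is a minimal illustration in Python, not the authors' analysis: the function name `paired_sign_flip_pvalue` is hypothetical, and the accuracy values are placeholders invented for the example, not results from the paper.

```python
import itertools

import numpy as np


def paired_sign_flip_pvalue(a, b):
    """Exact two-sided sign-flip permutation test on paired scores.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so every one of the 2**n sign assignments is enumerated.
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(d.mean())
    # All 2**n combinations of +1/-1, one row per assignment.
    signs = np.array(list(itertools.product([1.0, -1.0], repeat=d.size)))
    null = np.abs((signs * d).mean(axis=1))
    # Small tolerance guards against floating-point ties.
    return float((null >= observed - 1e-12).mean())


# Placeholder per-site accuracies (hypothetical, NOT taken from the paper).
acc_new = [0.70, 0.68, 0.72, 0.69, 0.71]
acc_base = [0.66, 0.65, 0.70, 0.66, 0.69]

p = paired_sign_flip_pvalue(acc_new, acc_base)
print(f"two-sided p-value: {p:.4f}")
```

With only five paired scores, the smallest two-sided p-value this exact test can return is 2/32 = 0.0625, so a comparison of this size would need additional repetitions or subject-level tests to reach conventional significance thresholds.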

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall ideas presented are interesting, including the meta-learning design of the graph nets and the incorporation of the FC differences in the XG regularization module. However, the rationale regarding the incorporation of prior information related to ASD / neurodisorders and the final discussion regarding explainability in general are not well focused and are difficult to follow clearly. Also, while the results are somewhat promising in terms of classification results in comparison to other SOTA methods, they remain somewhat incremental (perhaps statistical significance testing would help here).

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a new framework, called XG-GNN, to improve explainability and generalizability of Graph Neural Networks (GNN) in order to improve disease diagnosis from fMRI data. The study is justified by the need for more explainability and generalizability of predictions due to the high complexity of functional brain networks and the strong variations in fMRI data across different clinical centers. To do so, they developed a bi-level meta-learning framework in which they introduce two regularizations on the sparsity and cross-site differences of inter-group FC differences. They evaluate the performance of XG-GNN in a diagnostic task of Autism Spectrum Disorder (ASD) on the ABIDE dataset and compare the performance using 5 competitive models. XG-GNN outperformed the 5 models in terms of classification performance and identified domain-generalizable explanation factors. An ablation study was also performed to assess the contribution of each component.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The topic is of high interest: improving the generalizability and explainability of deep learning models is of high importance to extend their use to clinical practice.
    • The idea seems novel: other studies concentrate on improving the generalization of outcomes rather than the explanatory power, whereas XG-GNN attempts to unify both. Good description of state of the art on explainability and generalisability of GNN.
    • The use of meta-learning and regularizations on inter-group FCs also seems novel and is well described in the paper.
    • Proper comparisons were performed with several competitive models (with and without domain generalizability constraints) and XG-GNN outperformed these models.
    • Authors also justified the performance of their framework in identifying domain-generalizable explanation factors, which was the main goal of the improvements in the framework. Overall, the paper is well organized and written, very easy to read.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Some details on the evaluation procedure remain unclear: preprocessing of fMRI data (registration, motion correction, etc.), number of repetitions of the procedure (3 in the method section, 5 in the results).
    • Authors compared with 5 competitive models but it is hard to estimate what are the main variations between the proposed XG-GNN and the competitive models.
    • Explanation factors identified by the model were computed using group-wise difference of the learned functional connectomes and compared to the differences between functional connectomes (FC) computed using Pearson’s correlation. However, as stated by authors in the introduction, “It is important to note that linear FCs, such as those based on Pearson correlations, overlook the temporal order and struggle to fully capture the complexities of brain networks”. Thus, we might wonder why authors compared with these FCs and not with non-linear FCs learned without domain-generalizability and explainability regularization (such as in the ablation study).
    • Use of ABIDE dataset must be acknowledged in the Acknowledgement section, see https://fcon_1000.projects.nitrc.org/indi/abide/abide_I.html.
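
The comparison discussed above can be made concrete with a small sketch. Assuming BOLD signals arrive as a (timepoints, ROIs) array per subject, the Python code below computes Pearson-correlation FCs and their group-wise difference on synthetic data, and checks that shuffling timepoints leaves the FC unchanged, which is the sense in which Pearson FCs overlook temporal order. The helper names and the random data are illustrative assumptions, not part of XG-GNN.

```python
import numpy as np


def pearson_fc(bold):
    """Pearson-correlation FC from a (timepoints, rois) BOLD array."""
    return np.corrcoef(bold, rowvar=False)


def group_difference(fcs, labels):
    """Mean FC of group 1 minus mean FC of group 0.

    fcs has shape (subjects, rois, rois); labels has shape (subjects,).
    """
    return fcs[labels == 1].mean(axis=0) - fcs[labels == 0].mean(axis=0)


rng = np.random.default_rng(0)
# Synthetic stand-in data: 8 subjects, 120 timepoints, 10 ROIs.
bold = rng.standard_normal((8, 120, 10))
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # hypothetical group labels

fcs = np.stack([pearson_fc(b) for b in bold])
diff = group_difference(fcs, labels)

# Shuffle the timepoints of one subject (same permutation for every ROI):
# the Pearson FC is unchanged, so temporal order plays no role in it.
perm = rng.permutation(bold.shape[1])
fc_shuffled = pearson_fc(bold[0][perm])
print("FC invariant to time shuffling:", np.allclose(fcs[0], fc_shuffled))
```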
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Authors stated that code would be released on GitHub upon publication. Details on the computation were provided, but there is insufficient information on the model architecture to fully reproduce the paper. The participants used in the evaluation section are not available either.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is great. Further improvements could be made by adding some details to facilitate reproducibility (sharing the code, detailed information on preprocessing strategies, etc.). Authors could also emphasize the main advantages of the proposed method, by comparing its architecture and regularization strategies with those of the competitive models more explicitly. The choice of comparative explanation features could also be improved by using non-linear FCs (not necessary for rebuttal).

    Some typos:

    • Section 2.1 - “as the input the BOLD” –> “as input the BOLD”
    • Section 3.1 3) Implementation details - “domians” & confusion with Table 1 –> 3 repetitions vs 5 target domains ?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel framework on a high-interest topic. The evaluation procedure is well performed and shows satisfying performance. For these reasons, I recommend acceptance of the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their in-depth reviews and for affirming our contributions. The main concerns are addressed below.

[R1]-Claim of novelty, fundamentally different from domain adaptation
We believe a misunderstanding underlies the reviewer’s concerns regarding our method’s novelty. To clarify, our XG-GNN is a domain-generalization (DG) method rather than a domain-adaptation (DA) one, which is why our experiments compare with existing DG methods rather than DA methods. As described in Sec. 2.2, our meta-learning process operates exclusively on the source datasets; on the target domain, we conduct inference without any model fine-tuning. XG-GNN is therefore strictly aligned with the DG setting, which is practically more challenging than DA. In the context of DG, the key novelty of XG-GNN is to learn explainability to boost diagnosis, leading to significant improvements in both aspects. We’ll revise the manuscript to describe more clearly the differences between our method and other related works.

[R3&R4]-Reproducibility & significance of diagnosis results
In Sec. 3, we conducted three consecutive reproducibility experiments, each evaluated on different unseen centers (i.e., different test datasets with varying distributions). The variance of the evaluation metrics obtained by our XG-GNN is much smaller than that of other methods, indicating its reproducibility and robustness. Regarding the significance of the improvements, the margins between XG-GNN and other methods are relatively large in most cases. We are confident that such improvements are statistically significant, which will be further analyzed in the future.

[R1&R3]-Reproducibility & analysis of explainable results
In Fig. 2, we segmented ROIs into seven brain network regions, and the consistently highlighted areas across reproducibility experiments revealed stably captured connectivity patterns tied to ASD. This observation aligns with previous findings in neuroscience research and offers valuable insights for brain development studies. Following the reviewers’ great suggestion, we’ll update Fig. 2 and the corresponding descriptions to showcase such reproducible explanation results from multiple centers, enhancing the persuasiveness of DG from the explainability perspective.

[R1&R3]-Description of XG regularizations
To enhance group-wise explanations and their generalization across heterogeneous sites, we design dedicated XG regularizations under a very fundamental assumption. That is, independent of domain shifts, the group-wise connectome differences between ASD and TD are partially stable, and such differences are not diffused across the whole brain (i.e., they are relatively sparse). Such an assumption is consistent with neuroscientific findings and, according to our ablation studies, significantly improved the diagnosis performance. To support this design decision, we’ll update the paper to provide more relevant references.

[R1&R4]-Competing methods
In response to the request for comparison with methods based on non-linear functional connectivity, we have trained two models based on non-linear functional connectivity scores for generalization (FcNet, FBNetGen), whose performance is lower than that of our XG-GNN. We’ll explain this in detail.

[R1, R3&R4]-Contribution of each key component and ablation study
The reviewers requested a more detailed discussion of each key component, i.e., the MHSA-Based Graph Learner, GCN-Based Diagnoser, and XG regularization. These components are designed to play distinct roles: 1) the MHSA-Based Graph Learner ensures non-local representation learning, which is intuitive for connectome analysis; 2) the GCN-Based Diagnoser leverages specialized mechanisms of graph CNNs for graph-structured data; and 3) the XG regularization enhances group-wise explanation and its generalization to heterogeneous sites. We didn’t include all experiments due to space limitations. We’ll further describe the design rationale, citing relevant literature.
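
The assumption stated under the XG regularizations (group-wise connectome differences that are sparse and partially stable across sites) can be sketched as two penalties. This is a hedged illustration only: the actual loss functions of Sec. 2.2 are not reproduced on this page, so the L1-style sparsity term, the variance-style stability term, the weights `lam_sparse` and `lam_stable`, and all function names below are assumptions, and the connectomes are random placeholders.

```python
import numpy as np


def site_group_difference(fcs, labels):
    """Group-wise connectome difference (group 1 minus group 0) for one site."""
    return fcs[labels == 1].mean(axis=0) - fcs[labels == 0].mean(axis=0)


def xg_penalties(diffs):
    """Return (sparsity, cross_site_inconsistency) for stacked site differences.

    diffs: array-like of shape (sites, rois, rois).
    sparsity: mean absolute value, small when few connections differ.
    inconsistency: mean squared deviation from the cross-site average,
    zero when every site shows the same group difference.
    """
    diffs = np.asarray(diffs, dtype=float)
    sparsity = np.abs(diffs).mean()
    inconsistency = ((diffs - diffs.mean(axis=0, keepdims=True)) ** 2).mean()
    return float(sparsity), float(inconsistency)


rng = np.random.default_rng(0)
n_sites, n_subjects, n_rois = 3, 20, 6  # hypothetical sizes

diffs = []
for _ in range(n_sites):
    # Random symmetric matrices stand in for learned individual connectomes.
    fcs = rng.standard_normal((n_subjects, n_rois, n_rois))
    fcs = (fcs + fcs.transpose(0, 2, 1)) / 2
    labels = np.arange(n_subjects) % 2  # alternating placeholder labels
    diffs.append(site_group_difference(fcs, labels))

sparsity, inconsistency = xg_penalties(diffs)
lam_sparse, lam_stable = 0.1, 0.1  # hypothetical regularization weights
reg = lam_sparse * sparsity + lam_stable * inconsistency
print(f"sparsity={sparsity:.4f} inconsistency={inconsistency:.4f} reg={reg:.4f}")
```

In a training loop, a term like `reg` would be added to the diagnostic loss computed on source sites only, consistent with the DG setting described in the rebuttal, where the target site is never used for fitting.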




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Reviewers were generally positive, noting strengths of interest in domain generalization work, novel proposed methods, clear paper, and experimental results/design. Many concerns were resolved with rebuttal, though some remain/could not be addressed in rebuttal, such as missing analysis of the consistency of the explanations across domains, which based on the goals/motivation of the work, is an important missing task. Still, based on the strengths and the trust that the authors will make the promised updates (especially any clarifications), I follow the majority of the reviewers and recommend accept.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Upon reading the reviews, rebuttal and paper, I believe that all major concerns have been adequately addressed in the rebuttal process and would like to recommend acceptance.

    Two out of three reviewers are leaning towards an accept with reviewer 3’s score being a weak reject. Nevertheless, the reviewers seem to generally agree on the quality of the experiments and paper and have generally praised the premise, evaluation, and methodology.

    If accepted, I would highly encourage the authors to follow through with their promise of providing clarifications to the interpretability section of the paper, the figures and results. It would be great if they could include rigorous statistical comparisons and add/swap out the necessary references to solidify their contributions to the application.



