Abstract

Human brains are typically modeled as networks of Regions of Interest (ROIs) to comprehend the brain functional Magnetic Resonance Imaging (fMRI) connectome for Autism diagnosis. Recently, various deep neural network-based models have been developed to learn the representation of ROIs, achieving impressive performance improvements. However, they (i) rely heavily on increasingly complex network architectures with obscure learning mechanisms, or (ii) use only the cross-entropy loss to supervise the training process, leading to sub-optimal performance. To this end, we propose a simple and effective Geometric-oriented Brain Transformer (GBT) with an Attention Weight Matrix Approximation (AWMA)-based transformer module and a geometric-oriented representation learning module for brain fMRI connectome analysis. Specifically, the AWMA-based transformer module selectively removes the components of the attention weight matrix with smaller singular values, aiming to learn the most relevant and representative graph representation. The geometric-oriented representation learning module imposes low-rank intra-class compactness and high-rank inter-class diversity constraints on the learned representations to make them discriminative. Experimental results on the ABIDE dataset validate that our method GBT consistently outperforms state-of-the-art approaches. The code is available at https://github.com/CUHK-AIM-Group/GBT.
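
The AWMA idea described in the abstract, dropping the attention-weight components with the smallest singular values, can be illustrated with a truncated SVD. The following is a minimal NumPy sketch and not the authors' implementation: the function names, the toy sizes, and the choice to apply the approximation to the post-softmax weights are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def rank_k_approximation(matrix, k):
    # Truncated SVD: keep only the k largest singular values and their vectors.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]

def awma_attention(queries, keys, values, k):
    # Scaled dot-product attention weights between ROIs.
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    # Assumption: the weights are replaced by their rank-k approximation
    # before they are used to mix the value vectors.
    weights_k = rank_k_approximation(weights, k)
    return weights_k @ values, weights_k

rng = np.random.default_rng(0)
n_rois, dim, k = 8, 4, 3  # hypothetical sizes, chosen only for the demo
q, key, v = (rng.standard_normal((n_rois, dim)) for _ in range(3))
out, w_k = awma_attention(q, key, v, k)
print(out.shape, np.linalg.matrix_rank(w_k))
```

By the Eckart-Young theorem, the truncated SVD is the best rank-k approximation in the Frobenius norm, which is why it is a natural choice for this kind of sketch.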

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2778_paper.pdf

SharedIt Link: https://rdcu.be/dY6fI

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_14

Supplementary Material: N/A

Link to the Code Repository

https://github.com/CUHK-AIM-Group/GBT

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{Pen_GBT_MICCAI2024,
        author = {Peng, Zhihao and He, Zhibin and Jiang, Yu and Wang, Pengyu and Yuan, Yixuan},
        title = {{GBT: Geometric-oriented Brain Transformer for Autism Diagnosis}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {142--152}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    1. The method introduces the use of singular value decomposition (SVD) to eliminate less significant components from the attention matrix, thereby learning the most relevant representation. Although SVD is a standard technique in principal component analysis and widely used in machine learning, applying it to the weight matrix is innovative. The ablation study indicates that this component is critical for the majority of the accuracy improvements, validating its effectiveness.
    2. The paper proposes the application of low-rank and high-rank constraints on intra-class and inter-class representation learning, respectively. This approach is designed to enhance the discriminative power of the learned representations.
    3. The exceptional evaluation results effectively validate the efficacy of their proposed techniques.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The work provides very persuasive evaluation results, achieving an accuracy improvement of 6% over the previous state-of-the-art framework, demonstrating a significant advance in performance.

    2. Using singular value decomposition to selectively remove less important components from the weight matrix of a transformer represents a novel application, enhancing model efficiency and focus.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Imposing rank constraints in representation learning is not a novel concept, as evidenced by references [1,2].

    2. With various loss functions available that enhance discriminative inter-class and intra-class representation, like those in the baseline model FBNETGEN, the strength of applying rank-constraint loss is not clarified. This is particularly important given that the ablation study indicates that, in comparison to the AWMA module, rank constraints contribute minimally to accuracy improvements and instead lead to a significant increase in deviation.

    [1] Gao, L., Li, Y., Huang, J. and Zhou, S., 2016, August. Semi-supervised group sparse representation: model, algorithm and applications. In Proceedings of the Twenty-second European Conference on Artificial Intelligence (pp. 507-514).

    [2] Li, S. and Fu, Y., 2015. Learning robust and discriminative subspace with low-rank constraints. IEEE transactions on neural networks and learning systems, 27(11), pp.2160-2173.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    None

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. As mentioned under ‘Main Weakness’, a direct comparison between the proposed loss function and existing functions in generating discriminative intra- and inter-class representations would strengthen the study.
    2. It would be beneficial to evaluate this model across a broader range of datasets. Since the proposed methods are not disease-specific, more extensive testing results could enhance the credibility of the model’s strengths.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The evaluation results are outstanding, and the overall presentation of the paper is strong with clear explanations of their methods. However, the novelty of the ‘Geometric-oriented Representation Learning Module’ could be better emphasized, and its advantages are not adequately demonstrated.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    My concerns have been addressed by the authors’ replies, but the analysis and evaluation of used rank-constraint loss need to be further improved.



Review #2

  • Please describe the contribution of the paper

    The paper introduces a new transformer-based model, the Geometric-oriented Brain Transformer (GBT), with a modified Attention Weight Matrix Approximation (AWMA) module. The AWMA module applies a low-rank approximation to the attention matrix, and loss functions are applied to maintain low-rank intra-class consistency and high-rank inter-class diversity. The authors compared their methodology with state-of-the-art (SOTA) models and ran ablation studies on the ABIDE dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed attention weight matrix approximation (AWMA) facilitates network learning through linear decomposition and low-rank approximation. This novel approach could reduce the number of parameters that need to be trained and guide the network to learn more statistically significant components.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The visualization in Figure 2 demonstrates the capability to separate two synthetic data classes. However, the current figure and caption only show that the algorithm has the capability to overfit the data. If samples cross the class boundary, they should be misclassified due to the limitation of the data. This suggests that the model could potentially overfit the training dataset, but the authors did not provide any regularization method in the paper to prevent it.

    The experiments were carried out solely on the ABIDE dataset. The performance ranges of the proposed method overlap with the second-best results, which may indicate that its performance is not statistically significantly different from that of the compared SOTA methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall, the paper is well structured, and the ablation studies are comprehensive and cover all of the innovations in the proposed method. The authors did well in illustrating their proposed methodology and enhancing the reproducibility of their work. However, Figure 2 needs to be adjusted to better illustrate the authors' point. The method needs to be tested on more datasets for the study to be valid.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is not comprehensively assessed on datasets other than ABIDE. This makes the authors' proposed method less convincing. As previously mentioned, the authors provided a visualization (Fig. 2) that showed both the strength and the weakness of the algorithm. However, the authors did not provide a solution to mitigate the weakness.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    My concerns have been addressed by the authors’ replies, but the visualization and explanation for the regularizations need to be further improved.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a geometric approximation-based transformer module and the geometric-oriented representation learning module for brain fMRI connectome analysis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes an AWMA-based transformer, which replaces the attention weight matrix with its rank-k approximation. The low-rank constraint can compensate for the weakness of relying only on the CE loss.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In the introduction to the methodology in subsection 2.2, the significance of the research should not be mentioned again. The experimental analysis is neither deep nor comprehensive enough.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please explain why the proposed method can identify and compare brain function differences between different individuals to achieve distinguishable guidance toward embedding representation learning, promoting the learned representation to be discriminative. Regarding the analysis of the experiments, listing it in a point-by-point format is not recommended; it is suggested to present it in paragraph form instead. SVD gives the optimal matrix approximation by removing the components with smaller singular values. However, how does SVD improve the SEN metric? Please provide a detailed analysis of this.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper has clear logic, good writing quality, and a certain degree of innovation.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The data used in this paper should also be released.



Review #4

  • Please describe the contribution of the paper

    1. An Attention Weight Matrix Approximation (AWMA)-based transformer module has been introduced, selectively removing smaller singular values from the attention matrix to streamline and optimize graph representation. This design helps to mitigate the overfitting problem and enhances the model’s ability to capture critical information in brain connectome data.
    2. A geometric-oriented representation learning module is proposed, which imposes low-rank intra-class compactness and high-rank inter-class diversity constraints on the learned embeddings. This not only enhances the discriminative power of the embeddings but also effectively leverages the natural geometric properties of brain data.
    3. The proposed methods are designed as plug-and-play modules, which can be easily integrated into existing network architectures, enhancing the applicability and flexibility of the model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. By introducing a transformer module based on Attention Weight Matrix Approximation (AWMA), the method proposed in the article can selectively remove small singular value components from the attention matrix. This technique reduces the model’s sensitivity to noise during the learning process, thereby mitigating overfitting issues and making the model more robust.
    2. The geometric-oriented representation learning module introduced in the paper enhances the discriminability of representations by imposing constraints of intra-class compactness and inter-class diversity during the embedding learning process.
    3. The design of the GBT model adopts a modular and plug-in strategy, allowing it to be easily integrated into existing brain network architectures. This flexibility means that the model is not only suitable for diagnosing autism but can also be extended to other types of neurological disease diagnostics or other complex graph representation learning tasks.
    4. The paper provides a thorough explanation of the research background and necessity, clear descriptions of the methods and principles as well as the mathematical formulas, detailed experimental design, comprehensive control groups, and complete evaluation metrics, ensuring the rigor and reproducibility of the research.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The article mentions the advantages of the GBT model in terms of model complexity and computational resource consumption, but it lacks sufficient experiments and discussion in this direction.
    2. The paper only uses one dataset and has not been evaluated on other datasets, so its generalizability is yet to be validated.
    3. Although the paper mentions the main parameters set for the model, it does not analyze the reasons for choosing these parameters or discuss the tuning process.
    4. The model lacks interpretability, and the results section is missing some visual performance evaluation images, which reduces intuitiveness.
    5. As a modular structure, the method does not discuss or demonstrate the potential and advantages of integration with other models.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper uses a public dataset, provides a detailed explanation of the method’s principles and implementation, and offers a thorough description of the parameter settings, which facilitates the reproducibility of the research. However, the paper does not mention any plans for open sourcing. It would be beneficial if the authors could provide information about the open sourcing of the related code to facilitate replication and expansion by other researchers.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Dear Author,

    Your paper provides a novel geometric-oriented brain Transformer model for the diagnosis of autism, effectively improving upon existing methodologies. Having thoroughly reviewed your work, I believe the topic and methodology possess significant academic value and practical potential. However, to enhance the quality of your research and ensure rigor, I have the following suggestions that could help refine your paper:

    Model Complexity and Computational Resource Consumption: Please provide more experimental data regarding the model’s complexity and computational resource usage. A comparative analysis showcasing the GBT model’s advantages over existing models in terms of processing time and memory usage would be beneficial. Additionally, discussing the model’s performance across different hardware configurations would illustrate its feasibility in practical applications.

    Dataset Usage and Model Generalization: To validate the model’s generalization capabilities, testing should be expanded to include multiple datasets from various sources and types. This could involve not only different autism datasets but also datasets for other neurological disorders. Detailed reports of experimental results, including performance comparisons across these datasets, are essential.

    Parameter Setting and Tuning Process: The paper should clearly explain the logic behind parameter selection and the tuning process. Conducting a parameter sensitivity analysis to demonstrate how variations in parameters affect model performance and the rationale for final parameter settings would enhance the transparency and reproducibility of the research.

    Model Interpretability and Result Visualization: To improve the interpretability of the model and the intuitiveness of the results, more visualization tools such as ROC curves, confusion matrices, and attention maps should be added. These tools would not only help explain how the model works but also visually demonstrate its performance in various aspects.

    Application and Comparison of Modular Structure: Given the modular design of the GBT model, the potential and advantages of integration with other models should be explored. Practical integration experiments demonstrating performance enhancements when the GBT module is applied to other network architectures would bolster the model’s applicability and flexibility.

    With these comprehensive modifications, I believe your paper will gain wider recognition and applicability in both the academic and practical fields. I look forward to seeing further improvements and enhancements to your work.

    Sincerely

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Innovation and Technical Contribution: The model proposed in this paper is innovative in the field of brain functional Magnetic Resonance Imaging (fMRI) analysis for autism. It introduces new techniques and methodologies that could significantly impact the current diagnostic processes.

    2. Experimental Design and Results Validation: The authors have conducted extensive experiments on a public dataset, demonstrating the effectiveness of the model. Furthermore, ablation studies were carried out to validate the contributions of individual modules within the model.

    3. Potential Application Value: The GBT model features a modular and plug-in design, which allows it to be integrated into existing network architectures, enhancing the practicality of the research. While the paper has these strengths, there are areas for improvement, such as validation of the model’s generalizability, detailed analysis of model complexity, and computational resource consumption.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the AC and Reviewers for the constructive comments. We are encouraged by positive comments: “significant academic value and practical potential” (R#1), “significant advance in performance/outstanding evaluation/comprehensive ablation studies” (R#1/4/6), and “very good clarity and organization/strong presentation/clear logic/well-structured/innovative/novel” (R#1/4/5/6). We clarify all concerns as follows.

Q (R#1/4/5/6): Reproducibility. A: We will make the code available for scientific reproducibility.

Q (R#1/4/6): Dataset usage. A: We chose the ABIDE dataset for three main reasons: (i) it is publicly accessible, ensuring reproducibility; (ii) it is widely used in prior Autism diagnosis studies, ensuring data consistency; (iii) its data come from multiple diverse sites, ensuring data heterogeneity. Thus, ABIDE is suitable for evaluating the effectiveness of the proposed method on Autism diagnosis. Nevertheless, we agree with the reviewers’ suggestion, and we have explored additional datasets in follow-up work, e.g., our method achieves a 4.14% ACC improvement over the best comparison method on the HCP dataset.

Q (R#1): Model complexity and its application. A: Our method is more feasible for practical applications, with 0.17G FLOPs and 5.42M parameters, compared to Com-BrainTF with 0.72G FLOPs and 6.15M parameters.

Q (R#1): Extensions. A: Our future work will further analyze the model’s generalizability, interpretability, parameter setting, and visualization. Thanks again for the valuable suggestions.

Q (R#4): Comparison with prior works. A: Our work differs from the two cited works in two key aspects: (i) Topic: We focus on intra-class compactness and inter-class diversity representation learning to explore the intrinsic characteristics of brain data, unlike the prior work on affinity matrix self-expressiveness learning; (ii) Methodology: We use rank constraints to enhance the network’s discriminative embedding power and satisfy the natural geometric properties of brain data for Autism diagnosis. In contrast, the prior work conducted feature extraction based on an independent subspace assumption, which may face scalability challenges.

Q (R#4): Novelty compared with FBNETGEN. A: FBNETGEN uses group losses to extract GNN features but ignores global constraints. Our main innovation is utilizing global rank-aware constraints, making a 10.5% ACC improvement. In addition, we are the first to impose intra-class and inter-class rank constraints into the brain-aware transformer, effectively exploring the natural geometric properties of brain data.

Q (R#4): Ablation study of rank-constraints. A: Table 2 in the submission shows the low-rank and high-rank constraints ablation study, where our strategy of simultaneously considering the low-rank and high-rank constraints can make a 2.3% ACC improvement.
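
One common way to express low-rank intra-class and high-rank inter-class constraints is through the nuclear norm (the sum of singular values), a convex surrogate for matrix rank. The sketch below assumes that surrogate; it is not the loss defined in the paper, and `rank_constraint_loss` and the toy embeddings are hypothetical names and data used only to show the behaviour.

```python
import numpy as np

def nuclear_norm(matrix):
    # Sum of singular values, a common convex surrogate for matrix rank.
    return np.linalg.svd(matrix, compute_uv=False).sum()

def rank_constraint_loss(embeddings, labels):
    # Intra-class term: small when each class's embeddings span few directions.
    intra = sum(nuclear_norm(embeddings[labels == c]) for c in np.unique(labels))
    # Inter-class term: large when the full embedding matrix spans many directions.
    inter = nuclear_norm(embeddings)
    # Minimizing this favours compact classes that occupy different directions.
    return intra - inter

labels = np.array([0, 0, 0, 1, 1, 1])
# Toy case 1: each class lies on its own axis (compact within, diverse across).
separated = np.array([[1, 0], [2, 0], [3, 0], [0, 1], [0, 2], [0, 3]], dtype=float)
# Toy case 2: both classes share the same direction (no inter-class diversity).
collapsed = np.array([[1, 0], [2, 0], [3, 0], [1, 0], [2, 0], [3, 0]], dtype=float)

loss_separated = rank_constraint_loss(separated, labels)
loss_collapsed = rank_constraint_loss(collapsed, labels)
print(loss_separated, loss_collapsed)
```

In this toy setting the separated embeddings receive the lower loss, because the nuclear norm of the stacked matrix equals the sum of the per-class norms only when the classes occupy orthogonal directions.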

Q (R#5): Mechanism explanation of our method. A: Our geometry-oriented supervised loss can capture individuals within the same disorder group by differentiating the inherent geometric properties of brain data, achieving discriminative representation learning.

Q (R#5): Paper organization. A: We will shorten the introduction in Sec. 2.2 and show the experimental analyses in a paragraph manner.

Q (R#5): SVD improves sensitivity. A: SVD improves sensitivity by focusing on important features and regularizing the model.

Q (R#6): Overfitting. A: To avoid overfitting, we used the rank-k approximation with SVD (in Sec. 2.2), dropout regularization, which randomly drops neurons, and a pooling strategy that aggregates local information. We will add the details and revise Fig. 2 in our revision.

Q (R#6): t-test statistic. A: Our method achieved statistically significant results compared to Com-BrainTF (P value=0.009944 on specificity) and BrainNetTF (P value=0.034862 on ACC). Moreover, our method (76.9±3.8%) had a higher average and a lower std on the specificity metric than the best comparison (65.7±6.4%).
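
For readers who want to sanity-check a comparison reported only as mean ± std, Welch's t statistic can be computed from summary statistics alone. The sketch below uses the two specificity figures quoted in the rebuttal; the number of runs (`N_RUNS`) is not stated there and is a purely hypothetical value, so the resulting numbers are illustrative and do not reproduce the authors' reported P values.

```python
import math

def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    # Welch's t statistic and Welch-Satterthwaite degrees of freedom
    # for two samples with unequal variances.
    var_a, var_b = std_a ** 2 / n_a, std_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

def intervals_overlap(mean_a, std_a, mean_b, std_b):
    # True when the mean ± std ranges intersect.
    return (mean_a - std_a) <= (mean_b + std_b) and (mean_b - std_b) <= (mean_a + std_a)

N_RUNS = 5  # hypothetical: the number of runs is not given in the rebuttal

# Specificity values quoted in the rebuttal: 76.9±3.8 vs. 65.7±6.4.
t_stat, dof = welch_t(76.9, 3.8, N_RUNS, 65.7, 6.4, N_RUNS)
overlap = intervals_overlap(76.9, 3.8, 65.7, 6.4)
print(round(t_stat, 2), round(dof, 2), overlap)
```

The same helper can be rerun with any other pair of mean ± std values from a results table, provided the per-method run count is known.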




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Reviewers agree there are many strengths to this paper, including an interesting novel approach, strong evaluation with ablation studies, and clear writing/presentation. These outweighed the weaknesses for most reviewers, which include concerns about analysis on additional datasets and comparisons to related approaches that consider intra/inter-class representations. I add an additional note that while the results in the table look impressive, it is not clear if the splits/evaluation approach for the competing results is the same as the method presented in this paper, and the author’s rebuttal note on specificity gap is incorrect as the 2nd highest specificity is ~71 with large std that overlaps with the proposed model’s results. Still, I agree with the majority of the reviewers and recommend accept.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


