

Abstract

Multi-modal brain networks represent the complex connectivity between different brain regions from both functional and structural perspectives, which is of great significance for brain disease diagnosis. However, existing methods are limited to information fusion in the feature dimension, failing to fully exploit the complementary information between functional and structural connectivity networks. To address these issues, this paper proposes a cross-modal brain graph transformer (CBGT) method for brain disease diagnosis, which also provides an in-depth analysis of coupled functional-structural connectivity networks. Specifically, CBGT consists of two main modules: the cross-modal Transformer module enhances the attention mechanism by utilizing structural connectivity features extracted through machine learning methods, capturing long-range dependencies in the cross-modal brain network. The cross-modal topK pooling module combines information from both functional and structural connectivity networks to select significant regions of interest (ROIs) during the reconstruction of the pooled graph, aiming to retain as much effective information as possible. Experiments conducted on the ABIDE and ADNI datasets demonstrate that the proposed method outperforms state-of-the-art approaches. Interpretation analysis reveals that the proposed method can identify multi-modal biomarkers associated with brain diseases.
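The abstract says that structural connectivity (SC) features enhance the attention mechanism, but it does not give the formula. Below is a minimal sketch of one plausible form, assuming single-head dot-product attention over ROI feature vectors with an additive bias on every ROI pair that a binary SC mask marks as connected. The identity query/key/value projections, the `bias` strength, and the names `sc_biased_attention` and `sc_mask` are hypothetical. This is not CBGT's actual layer.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sc_biased_attention(x, sc_mask, bias=1.0):
    """Single-head self-attention over ROI feature vectors x (N x d).

    Hypothetical assumptions: identity Q/K/V projections, and structural
    information enters only as an additive bias on the score of each ROI
    pair (i, j) that the binary SC mask marks as connected.
    Returns (updated features, attention matrix).
    """
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    out, attn = [], []
    for i in range(n):
        # Scaled dot-product score plus the structural bias for each ROI j
        scores = [
            scale * sum(a * b for a, b in zip(x[i], x[j])) + bias * sc_mask[i][j]
            for j in range(n)
        ]
        weights = softmax(scores)
        attn.append(weights)
        # Weighted sum of value vectors (identity projection, so values = x)
        out.append([sum(weights[j] * x[j][k] for j in range(n)) for k in range(d)])
    return out, attn
```

Adding the bias before the softmax increases attention between structurally connected ROIs while still allowing attention between any pair of ROIs. That unrestricted access is what gives a Transformer its long-range reach over the whole brain graph.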

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1238_paper.pdf

SharedIt Link: https://rdcu.be/eHc4z

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05162-2_24

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{FenJin_CrossModal_MICCAI2025,
        author = { Feng, Jingxi AND Xu, Heming AND Cai, Junhao AND Chang, Yujie AND Zhang, Dong AND Du, Shaoyi AND Wang, Juan},
        title = { { Cross-Modal Brain Graph Transformer via Function-Structure Connectivity Network for Brain Disease Diagnosis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
        pages = {247 -- 256}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper constructs a cross-modal brain graph Transformer (CBGT) method by applying cross-modal Transformer and topK pooling modules. It is used for the diagnosis of brain diseases and provides an analysis of the functional-structural connectivity network.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This paper pays good attention to the limitations of brain network analysis at the multi-modal fusion level and proposes a method for integrating complementary information in functional and structural connection networks.
2. The description in the method section of the article is relatively clear, which can help readers understand the overall architecture and implementation details of the method.
3. The performance of the model is evaluated on both the ABIDE and ADNI datasets, showing consistent performance and achieving moderate improvements.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

1. The introduction has poor logical coherence: it fails to systematically survey the related work and to clearly explain the motivation and advantages of the method.
2. The key parameter settings are not described in detail, which affects the reproducibility of the method.
3. The ablation experiment is incomplete, making it difficult to fully verify the independent contribution of each module.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. The introduction lacks logical coherence and fails to clearly articulate the limitations of existing methods. While there have been numerous studies on both cross-modal learning and Transformer-based approaches in brain network analysis, the manuscript does not provide a comprehensive review of these works or clearly explain how the proposed method addresses their shortcomings.
2. The experimental section lacks detailed descriptions of key parameter settings used in the proposed method, such as the threshold for structural feature extraction and the selection of the k value in the top-k pooling operation. The absence of such details compromises the reproducibility of the results.
3. The ablation study is incomplete: it omits the unimodal Transformer-only setting and the unimodal Transformer + unimodal topK pooling setting.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.
1. The motivation presented by the authors in the rebuttal clearly illustrates the limitations of current methods. It indicates that integrating Transformer models with multi-modal data analysis for brain network research is of some significance.
2. In the rebuttal, the author elaborates in detail on the parameter settings that lack descriptions, such as the k value in top-k pooling. Overall, these settings are reasonable and appropriate.
3. The explanation of the ablation experiment in the rebuttal is reasonable.

Review #2

Please describe the contribution of the paper

The paper introduces CBGT, a cross-modal brain graph transformer method that combines structural (DTI) and functional (fMRI) brain information for pathology diagnosis. The authors extend a cross-modal transformer with topK pooling to prioritize the most important ROIs, and with a soft voting mechanism that averages the decisions made from the graph representations at each layer. The approach is evaluated on 4 tasks using the ADNI and ABIDE datasets for autism and mild cognitive impairment, and it outperforms 9 baselines across the 4 classification tasks.
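Below is a minimal sketch of the soft-voting step described above. It assumes that each layer's graph readout already yields a class-probability vector and that the vote is an unweighted mean. The function name `soft_vote` is hypothetical, and the exact weighting used in CBGT is not stated here.

```python
def soft_vote(layer_probs):
    """Average per-layer class-probability vectors (unweighted, by assumption).

    layer_probs: one probability vector per layer readout.
    Returns (mean probability vector, index of the predicted class).
    """
    n_layers = len(layer_probs)
    n_classes = len(layer_probs[0])
    mean = [sum(p[c] for p in layer_probs) / n_layers for c in range(n_classes)]
    # Predicted class is the one with the highest averaged probability
    pred = max(range(n_classes), key=lambda c: mean[c])
    return mean, pred
```

Averaging probabilities, rather than taking a majority over hard labels, lets a confident layer outweigh a hesitant one. For example, layers giving `[0.4, 0.6]` and `[0.8, 0.2]` average to roughly `[0.6, 0.4]` and vote for class 0.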
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is clear, and the method is well motivated and explained. The evaluation uses relevant benchmarks and baselines, and the ablation study is sufficient.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

- Novelty: The contribution is incremental, since each of the combined components has already been shown to be relevant in previous work.

- Soundness: While the approach outperforms the baselines in most settings, the methods appear equivalent once the confidence intervals are taken into account.

- Impact: Combining functional and structural information is an active research topic, and Transformers are now capable of handling volumetric data. It is therefore unclear why the approach does not combine all voxels from the DTI images. The intermediate filtering and processing inevitably lose information. Can the authors explain why such a “raw” approach is not considered?
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The contribution seems too incremental, as it borrows its mechanisms from existing work. The performance also seems to overlap with that of previous approaches.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

I have read the authors’ answer, and they did not address my main criticism about the lack of novelty compared to the approaches I mentioned explicitly.

My other remarks were not clearly addressed either. For instance, the authors suggest that using Transformers to handle volumetric brain images is not tractable. However, Transformers have been used for generative tasks (much more complex than the classification task covered in this work) since 2022, for example in the ResViT paper by Dalmaz et al. [1].

Regarding my soundness criticism, the answer only states that the improvement holds for “a majority of metrics”. This does not address the fact that the new approach is merely “comparable” to much older techniques while being more complex.

After checking the other reviews and the authors’ answers, I am moving my score to Reject.

[1] https://arxiv.org/abs/2106.16031

Review #3

Please describe the contribution of the paper

The paper proposes a cross-modal brain graph transformer (CBGT) method for brain disease diagnosis. It captures long-range dependencies in cross-modal brain networks using key features from the structural connectivity network and an enhanced mask from the functional connectivity network. In addition, a cross-modal top-K pooling approach combines information from the functional and structural connectivity networks to select disease-relevant ROIs. Multiple experiments on benchmarks support the effectiveness of the proposed method.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

- The proposed CBGT explores long-range dependencies in brain networks and supplements functional connectivity with structural connectivity, which is somewhat novel.
- The proposed CBGT method outperforms the selected baseline methods, demonstrating its effectiveness.
- The paper seems well written and is mostly clear to me.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

- The proposed CBGT seems more complex than the baselines, which may lead to higher computational cost. According to the pipeline in Figure 1, CBGT contains more complicated modules to establish long-range dependencies and to enable cooperation between structural and functional information. However, CBGT’s computational cost is not reported in the paper. This raises the question of whether the performance gain comes from the technical novelty of the method or simply from the additional modules.
- The influence of the number of layers in CBGT should be evaluated. The authors state directly that the optimal number of layers is 2, but the supporting analysis is missing, which makes the statement less convincing.
- Several mistakes remain, for example: (1) Equation 10 ends with a comma while the following sentence begins with a capital letter; (2) the sentence “The cross-modal topK pooling module …” is missing its predicate.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

- The proposed CBGT seems more complex than the baselines, which may lead to higher computational cost. According to the pipeline in Figure 1, CBGT contains more complicated modules to establish long-range dependencies and to enable cooperation between structural and functional information. However, CBGT’s computational cost is not reported in the paper. This raises the question of whether the performance gain comes from the technical novelty of the method or simply from the additional modules.
- The influence of the number of layers in CBGT should be evaluated. The authors state directly that the optimal number of layers is 2, but the supporting analysis is missing, which makes the statement less convincing.
- Several mistakes remain, for example: (1) Equation 10 ends with a comma while the following sentence begins with a capital letter; (2) the sentence “The cross-modal topK pooling module …” is missing its predicate.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #4

Please describe the contribution of the paper
1. The development of a novel Transformer-based architecture specifically designed to integrate and leverage complementary information from both functional and structural brain connectivity for improved brain disease diagnosis.
2. This paper introduces a specific transformer module where features derived from structural connectivity are used to directly modulate the self-attention mechanism operating on the functional connectivity graph.
3. A novel pooling strategy is proposed that selects and retains the most significant ROIs by considering node importance derived from both functional features and structural connectivity information simultaneously.
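The pooling idea in point 3 can be sketched as follows, under explicit assumptions. The functional score projects each ROI's features onto a vector `p`, as in the Graph U-Net gPool / PyTorch Geometric `TopKPooling` layers. The structural score is each ROI's normalized SC strength. The two scores are mixed with a hypothetical weight `alpha`. The function name, `alpha`, and the strength-based structural score are illustrative choices, not CBGT's published formula.

```python
import math


def cross_modal_topk(x, fc, sc, p, k, alpha=0.5):
    """Keep the k ROIs with the highest mixed functional/structural score.

    x: N x d ROI features; fc, sc: N x N connectivity matrices;
    p: projection vector (learnable in practice, fixed here);
    alpha: hypothetical weight between functional and structural scores.
    Returns (kept indices, gated features, pooled FC, pooled SC).
    """
    n = len(x)
    norm_p = math.sqrt(sum(v * v for v in p))
    # Functional score: projection of each ROI's features onto p
    func_score = [sum(a * b for a, b in zip(x[i], p)) / norm_p for i in range(n)]
    # Structural score: SC node strength scaled to [0, 1] (assumption)
    strength = [sum(row) for row in sc]
    max_s = max(strength) or 1.0
    struct_score = [s / max_s for s in strength]
    score = [alpha * f + (1 - alpha) * s for f, s in zip(func_score, struct_score)]
    # Top-k indices, then restored to original ROI order
    keep = sorted(sorted(range(n), key=lambda i: score[i], reverse=True)[:k])
    # Gate kept features by tanh(score), as in TopKPooling-style layers
    x_new = [[v * math.tanh(score[i]) for v in x[i]] for i in keep]
    # Induced subgraphs of both modalities on the kept ROIs
    fc_new = [[fc[i][j] for j in keep] for i in keep]
    sc_new = [[sc[i][j] for j in keep] for i in keep]
    return keep, x_new, fc_new, sc_new
```

With `alpha=1.0` only the functional features decide which ROIs survive, and with `alpha=0.0` only SC strength decides. Intermediate values let a structurally central ROI survive even when its functional score is modest.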
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper presents a well-motivated approach to multi-modal brain graph analysis. Integrating structural information directly into the attention and pooling mechanisms of a functional graph transformer is a significant step beyond simple feature fusion.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. I don’t understand how XGBoost can help select the important features here. Shouldn’t there be a task that guides the selection of the important features?
2. The paper doesn’t specify how k (the number of nodes kept after pooling) is determined. Is it fixed or does it vary per layer, and how is it decided? This is an important hyperparameter and should be mentioned in the paper.
3. Given such a model, I would like to see its performance on a larger-scale dataset.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents a clever and well-motivated approach to multi-modal brain graph analysis. The strong results and clear ablation study make it convincing for the effectiveness of the proposed CBGT method.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We appreciate all the valuable comments. [R1,R4] Review and Motivation: The introduction of the paper provides an overview of representative methods from existing work. In the revised version, we will include a more comprehensive review of related studies. Previous Transformer-based methods have overlooked the complementary information between structural and functional connectivity. Meanwhile, existing cross-modal brain network learning approaches, which are primarily graph-based, tend to ignore long-range dependencies. In contrast, CBGT effectively captures long-range dependencies in cross-modal brain networks and fully exploits the complementary information between functional and structural connectivity, thereby improving diagnostic accuracy.

[R1] Soundness: Based on five-fold cross-validation, we calculated the performance gap between CBGT and each baseline and constructed 95% confidence intervals for it. For the vast majority of metrics, these confidence intervals lay entirely above zero. In addition, t-tests confirmed that CBGT significantly outperformed the baselines on the vast majority of metrics across the classification tasks (p < 0.05).
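The procedure in this paragraph can be sketched under stated assumptions. Per-fold scores of CBGT and one baseline are paired. The 95% interval for the mean difference uses Student's t with 4 degrees of freedom (five folds, critical value about 2.776). The paired t statistic is compared against the same value. The function name and the fold scores in the check are hypothetical and are not results from the paper.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 4 degrees of freedom
T_CRIT_DF4 = 2.776


def paired_fold_ci(model_scores, baseline_scores, t_crit=T_CRIT_DF4):
    """Paired per-fold comparison of two models.

    Returns (mean difference, (CI lower, CI upper), paired t statistic).
    """
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean, (mean - t_crit * se, mean + t_crit * se), mean / se
```

An interval lying entirely above zero corresponds to the paired t statistic exceeding the critical value, so the two checks described in the paragraph agree for each metric.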

[R1] Raw Approach: Combining all voxels from DTI images results in high-dimensional inputs and significant computational cost. This study focuses on the interaction between functional and structural connectivity in cross-modal brain networks; therefore, we primarily use structural connectivity information from DTI.
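A rough calculation illustrates the dimensionality argument. It assumes an atlas with 116 ROIs (the AAL atlas size) and a 1 mm MNI152 grid of 182 × 218 × 182 voxels. The paper's actual atlas and resolution are not stated here, so both numbers are illustrative.

```python
def unique_sc_edges(n_rois):
    """Distinct off-diagonal entries of a symmetric N x N SC matrix."""
    return n_rois * (n_rois - 1) // 2


def voxel_count(shape):
    """Number of voxels in a volume of the given grid shape."""
    total = 1
    for size in shape:
        total *= size
    return total


def attention_pairs(n_tokens):
    """Entries of a full self-attention matrix over n_tokens tokens."""
    return n_tokens * n_tokens
```

Under these assumptions, the SC matrix has 6,670 unique edges and a 116 × 116 attention map has 13,456 entries. A voxel-level token sequence of about 7.2 million tokens would need an attention map of more than 5 × 10^13 entries, which is why voxel-wise Transformers usually rely on patching or downsampling first.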

[R3] Computational Consumption: The computational complexity of CBGT is O(N^2), where N represents the number of brain regions. The training time of CBGT across various classification tasks is consistently shorter than that of other multi-modal baselines. This is due to the cross-modal TopK pooling, which reduces the number of nodes, effectively decreasing the graph size and computational cost in subsequent layers. Moreover, ablation studies demonstrate that the performance gains of CBGT primarily arise from the synergy between the novel cross-modal Transformer and cross-modal TopK pooling, which efficiently capture long-range dependencies and critical ROIs in cross-modal brain networks.
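The effect of pooling on the O(N^2) attention cost can be checked with simple arithmetic. The sketch below counts full attention-matrix entries summed over layers. It assumes that pooling follows every layer and keeps `ceil(keep_ratio * n)` nodes, and it uses 116 ROIs only as an illustrative atlas size. The rebuttal reports k as roughly 60% of the nodes per layer, but the rounding rule is not stated.

```python
import math


def attention_entries(n_nodes, n_layers, keep_ratio):
    """Total N^2 attention entries over layers when pooling shrinks the graph.

    Returns (total entries, list of node counts seen by each layer).
    """
    total, n, sizes = 0, n_nodes, []
    for _ in range(n_layers):
        sizes.append(n)
        total += n * n  # quadratic attention cost at this layer
        n = math.ceil(keep_ratio * n)  # nodes kept for the next layer (assumed rounding)
    return total, sizes
```

With two layers, 116 ROIs and a 0.6 keep ratio, the layers see 116 and 70 nodes, for 18,356 entries in total. Without pooling the total is 26,912 entries, so the second layer's cost drops to about 36% of the first.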

[R3,R4,R5] Parameter Settings and Impact: Using grid search, the structural feature selection threshold p is set to 3, and the optimal number of CBGT layers L is determined to be 2. Fewer layers limit the modeling of cross-modal complementarity and long-range dependencies, while more layers increase the risk of overfitting. The k value in the topK pooling varies by layer and is set to approximately 60% of the number of nodes in that layer. A smaller k may result in the loss of important information from key ROIs, while a larger k introduces redundancy and overfitting.
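A minimal sketch of the grid search described above, over the threshold p and the number of layers L. The `evaluate` callback stands in for training CBGT and returning a validation score. It and the candidate grids are hypothetical, and the toy scorer in the check is not the paper's validation curve.

```python
import itertools


def grid_search(evaluate, p_values, layer_values):
    """Exhaustively score every (p, L) pair and return the best one.

    evaluate(p, n_layers) -> validation score (higher is better); hypothetical.
    Returns (best score, best p, best number of layers).
    """
    best = None
    for p, n_layers in itertools.product(p_values, layer_values):
        score = evaluate(p, n_layers)
        if best is None or score > best[0]:
            best = (score, p, n_layers)
    return best
```

Grid search is practical here because only two small integer ranges are searched. In cross-validated studies it is usually run on validation folds separate from the test folds.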

[R4] Ablation Study: The existing ablation settings in the paper demonstrate the individual contributions of the cross-modal Transformer and the cross-modal topK pooling module to the performance improvement. Additional ablation studies for “unimodal Transformer” and “unimodal Transformer + unimodal topK pooling” are not essential, as they do not involve cross-modal fusion strategies and represent more basic settings. Their performance is inevitably inferior to that of the ablation cases already presented. In fact, we conducted these additional experiments during the study, and the results confirmed this point. However, due to space limitations, they are not included in the paper.

[R5] Selection of Important Features: Extracting important features from the structural connectivity (SC) network is an independent task conducted prior to CBGT training. The normalized SC matrix is flattened into a feature vector, which is then fed into XGBoost for classification to obtain an importance score for each feature. Important features are then selected using a threshold-based filter.
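The pre-processing around the XGBoost step can be sketched as follows, with stated assumptions. SC is normalized by its maximum. Only the upper triangle is flattened, because the matrix is symmetric. Edges whose importance exceeds a threshold are kept and mapped back to a symmetric binary mask. In practice the importance scores could come from a fitted classifier, for example the `feature_importances_` attribute of xgboost's scikit-learn wrapper. Here they are passed in directly, so the sketch does not depend on XGBoost. The helper names, the normalization, and the reading of the threshold are hypothetical, since the rebuttal does not define what p = 3 measures.

```python
def normalize_by_max(sc):
    """Scale an SC matrix to [0, 1] by its largest entry (assumed normalization)."""
    peak = max(max(row) for row in sc) or 1.0
    return [[v / peak for v in row] for row in sc]


def flatten_upper(sc):
    """Flatten the strict upper triangle; remember which ROI pair each feature is."""
    n = len(sc)
    feats, pairs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            feats.append(sc[i][j])
            pairs.append((i, j))
    return feats, pairs


def select_edges(importance, pairs, threshold):
    """Keep ROI pairs whose classifier importance exceeds the threshold."""
    return [pair for score, pair in zip(importance, pairs) if score > threshold]


def edge_mask(n, edges):
    """Symmetric binary N x N mask marking the selected SC edges."""
    mask = [[0] * n for _ in range(n)]
    for i, j in edges:
        mask[i][j] = mask[j][i] = 1
    return mask
```

Because the selection is supervised by the diagnostic labels through the classifier, it is fit before CBGT training. To avoid leakage, the classifier would normally be fit on the training folds only.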

[R5] Larger-scale dataset: We are currently processing more data and preparing to conduct experiments on a larger-scale dataset.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

Reviewers agree that this paper presents an interesting structural + functional model for multimodal neuroimage analysis. After rebuttal, which helped clarify concerns regarding missing details/analyses, appropriate motivation for the methodology, and experimental settings, all reviewers agree that this paper should be accepted.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The reviewers raised no objections post-rebuttal. The paper addresses a timely and meaningful topic, i.e., multimodal brain networks. I recommend acceptance.


Cross-Modal Brain Graph Transformer via Function-Structure Connectivity Network for Brain Disease Diagnosis

Author(s):