Abstract

Asymptomatic neurocognitive impairment (ANI) is a predominant form of cognitive impairment among individuals infected with human immunodeficiency virus (HIV). The current diagnostic criteria for ANI primarily rely on subjective clinical assessments, possibly leading to different interpretations among clinicians. Some recent studies leverage structural or functional MRI containing objective biomarkers for ANI analysis, offering clinicians companion diagnostic tools. However, they mainly utilize a single imaging modality, neglecting complementary information provided by structural and functional MRI. To this end, we propose an attention-enhanced structural and functional MRI fusion (ASFF) framework for HIV-associated ANI analysis. Specifically, the ASFF first extracts data-driven and human-engineered features from structural MRI, and also captures functional MRI features via a graph isomorphism network and Transformer. A mutual cross-attention fusion module is then designed to model the underlying relationship between structural and functional MRI. Additionally, a semantic inter-modality constraint is introduced to encourage consistency of multimodal features, facilitating effective feature fusion. Experimental results on 137 subjects from an HIV-associated ANI dataset with T1-weighted MRI and resting-state functional MRI show the effectiveness of our ASFF in ANI identification. Furthermore, our method can identify both modality-shared and modality-specific brain regions, which may advance our understanding of the structural and functional pathology underlying ANI.
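To make the idea of mutual cross-attention concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes standard scaled dot-product attention applied in both directions, so that structural tokens query functional tokens and vice versa. The token counts, feature dimension, helper names, and random initialisation are all hypothetical; in a trained model the projection matrices would be learned by backpropagation.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Tokens of one modality (queries) attend to tokens of the other (keys/values)."""
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Scaled dot-product scores, one row per query token.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rows, cols, rng):
    """Small random values standing in for features or untrained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model = 4
s_feat = random_matrix(3, d_model, rng)  # 3 hypothetical structural MRI tokens
f_feat = random_matrix(5, d_model, rng)  # 5 hypothetical functional MRI tokens

# One (W_Q, W_K, W_V) triple per direction; names are illustrative only.
w_s = [random_matrix(d_model, d_model, rng) for _ in range(3)]
w_f = [random_matrix(d_model, d_model, rng) for _ in range(3)]

s_attended, s_weights = cross_attention(s_feat, f_feat, *w_s)  # sMRI queries fMRI
f_attended, f_weights = cross_attention(f_feat, s_feat, *w_f)  # fMRI queries sMRI
```

Each output keeps the token count of the querying modality, so the two attended feature sets can then be pooled or concatenated for classification; how the paper combines them is not specified in this record.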

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0262_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0262_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Fan_AttentionEnhanced_MICCAI2024,
        author = { Fang, Yuqi and Wang, Wei and Wang, Qianqian and Li, Hong-Jun and Liu, Mingxia},
        title = { { Attention-Enhanced Fusion of Structural and Functional MRI for Analyzing HIV-Associated Asymptomatic Neurocognitive Impairment } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposed an attention-enhanced structural and functional MRI fusion (ASFF) framework for diagnosing asymptomatic neurocognitive impairment (ANI) in HIV-infected individuals. ASFF integrated structural and functional MRI data, employed attention mechanisms to capture inter-modality relationships, and introduced semantic constraints for feature consistency.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The paper addresses the limitations of existing methods by proposing a comprehensive framework that integrates both structural and functional MRI modalities for ANI analysis.
    • The paper introduces novel fusion techniques, including attention mechanisms and semantic constraints, to effectively integrate information from multiple modalities. These techniques enhance the interpretability and diagnostic accuracy of the proposed framework.
    • The effectiveness of the proposed framework is demonstrated through rigorous experimental validation on a dataset comprising subjects with HIV-associated ANI.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Provide more insight into the rationale behind selecting specific architectural choices (e.g., the number of convolutional blocks, kernel size) in the convolutional neural network (CNN) for data-driven feature extraction.

    Clarification is needed regarding the selection criteria for parameters such as the number of regions-of-interest (ROIs), the choice of the AAL atlas, and the justification behind employing the Transformer for modeling temporal features.

    Derivation of the new vectors (query, key, value) lacks clarity, especially regarding the choice of the learnable weight matrices (WQS, WKS, WVS).

    Provide insights into how these matrices are initialized and updated during training.

    Justification is needed regarding the selection of hyperparameters (especially for batch size and learning rate schedule) and optimization strategy (e.g. Adam).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Provide more insight into the rationale behind selecting specific architectural choices (e.g., the number of convolutional blocks, kernel size) in the convolutional neural network (CNN) for data-driven feature extraction.

    Clarification is needed regarding the selection criteria for parameters such as the number of regions-of-interest (ROIs), the choice of the AAL atlas, and the justification behind employing the Transformer for modeling temporal features.

    Derivation of the new vectors (query, key, value) lacks clarity, especially regarding the choice of the learnable weight matrices (WQS, WKS, WVS).

    Provide insights into how these matrices are initialized and updated during training.

    Justification is needed regarding the selection of hyperparameters (especially for batch size and learning rate schedule) and optimization strategy (e.g. Adam).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Provide more insight into the rationale behind selecting specific architectural choices (e.g., the number of convolutional blocks, kernel size) in the convolutional neural network (CNN) for data-driven feature extraction.

    Clarification is needed regarding the selection criteria for parameters such as the number of regions-of-interest (ROIs), the choice of the AAL atlas, and the justification behind employing the Transformer for modeling temporal features.

    Derivation of the new vectors (query, key, value) lacks clarity, especially regarding the choice of the learnable weight matrices (WQS, WKS, WVS).

    Provide insights into how these matrices are initialized and updated during training.

    Justification is needed regarding the selection of hyperparameters (especially for batch size and learning rate schedule) and optimization strategy (e.g. Adam).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a multi-modal fusion of sMRI and fMRI via a cross-attention mechanism. Its contributions are: 1- Multi-modal fusion of MRI-based datasets. 2- A mutual cross-attention fusion module designed to identify the underlying relationship between sMRI and fMRI. This type of fusion is interesting and powerful in capturing distinctive features via the attention mechanism. 3- A semantic inter-modality constraint, introduced to encourage consistency of multimodal features and facilitate effective feature fusion.

    This type of multimodal fusion with cross attention is interesting and recent in the literature, e.g., combining a Transformer and a CNN and then applying cross attention.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I would say the cross-attention layer used to fuse the feature modalities is the major strength of this paper. The downstream analysis that visualizes the top signals on the brain is also useful for gaining insights after classification.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In Table 1, the proposed learner is compared with 10 competing methods. I expected the proposed method to be compared mostly with attention-based SOTA algorithms, yet most of the competing methods are not equipped with an attention mechanism, except ViT and GAT.

    My main concern is with the three types of losses used to define the loss function of the proposed method. It is not obvious which loss has the greatest impact on the classification metrics. It would be useful if the authors could measure the loss values and show which of the three losses contributes significantly to the obtained results. Perhaps additional hyper-parameters weighting each loss could emphasize or de-emphasize their contributions to the model performance when seeking higher accuracy.
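The weighting suggested above can be sketched as a weighted sum of loss terms together with a per-term contribution report. This is only an illustration: the three term names, their values, and the weights below are hypothetical placeholders, not the losses or results reported in the paper.

```python
def total_loss(losses, weights):
    """Weighted sum of named loss terms; both arguments are dicts keyed by term name."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())

def contributions(losses, weights):
    """Fraction of the total objective contributed by each weighted term.

    Assumes the weighted total is non-zero.
    """
    total = total_loss(losses, weights)
    return {name: weights[name] * value / total for name, value in losses.items()}

# Hypothetical per-batch values for three loss terms (not taken from the paper).
losses = {"classification": 0.60, "consistency": 0.30, "auxiliary": 0.10}
# Hypothetical weighting hyper-parameters; raising one emphasizes that term.
weights = {"classification": 1.0, "consistency": 0.5, "auxiliary": 2.0}

print(total_loss(losses, weights))
print(contributions(losses, weights))
```

Logging these fractions over training, or setting one weight to zero at a time, would be one way to see which term drives the classification metrics.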

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper introduces a new multi-modal cross-attention-based mechanism for analyzing HIV-associated ANI. The proposed method outperformed many competing learners. The downstream analysis and ablation study further support and justify this type of modelling. This kind of cross-attention mechanism is recent, innovative and interesting in the literature. The paper is also well-written, well-organized and relatively easy to understand.

    However, there are some concerns that should be addressed:

    1- The foundation of this multi-modal analysis is the cross-attention module. Hence it should be compared with SOTA attention-based models. Although the authors compared their model with GAT and ViT, it would be great if they could add more attention-based methods.

    2- The formulation of the three loss functions and their combination should be better elaborated. It is also not obvious which of the losses has a major impact on the model performance.

    3- An experiment could reveal which ROIs turn out to be spurious, or which additional ROIs become important, after the addition of sMRI. For example, the authors could run two types of experiments/training: 1- Visualize the top fMRI ROIs on the brain using only the fMRI dataset. 2- Visualize the top fMRI ROIs on the brain using both fMRI and sMRI. Comparing the visualizations from 1 and 2 could reveal which ROIs are added or removed by the supporting sMRI signals. This is a very specific comment, and the authors can add the results of this experiment to the supplemental file.

    I recommend acceptance of the paper after the above comments are addressed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The recent and novel multi-modal cross-attention approach and the downstream analysis are the main factors for my overall score.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a fusion model that integrates features from two different modalities: T1-weighted MRI and functional MRI. Features from these modalities are fused using an attention module. The results show the efficacy of the proposed work.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novelty in terms of fusing the structural and functional modalities for the detection of HIV-associated impairment.
    2. The authors propose a mutual cross-attention fusion module.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors have not compared against recent works (only one recent work is included). Most of the methods in the experiments are not new.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors propose a fusion model that integrates features from two different modalities: T1-weighted MRI and functional MRI. Features from these modalities are fused using an attention module. The results show the efficacy of the proposed work. There are a few comments:

    1. The authors have not compared against recent works (only one recent work is included). Most of the methods in the experiments are not new.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The novelty of the proposed method.
    2. The ablation study shows the effectiveness of the fusion module.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We greatly appreciate the Area Chairs and the three Reviewers (R1&R3&R4) for their detailed review of our work, and we are honored to receive the feedback of being “early accepted”. The constructive comments provided by the reviewers greatly improve the quality and clarity of our manuscript. In response to the comments, we have made several clarifications to address the concerns raised, as outlined below.

Q1: Elucidate the rationale behind our selection of specific architectural choices in CNN for data-driven feature extraction, as well as elaborate on the selection of network hyperparameters (R1). A1: The architectural choices and hyperparameter selection in our CNN are based on empirical experience. Besides, we observed that modest changes in the architecture and hyperparameters have little effect on the model’s performance. For example, we currently employ four convolutional blocks, and the classification results vary only minimally when three blocks are used instead.

Q2: Provide the rationale behind employing the Transformer for modeling temporal features (R1). A2: In fMRI feature extraction, we utilize the sliding window technique to segment fMRI time series into multiple segments and leverage the Transformer to capture temporal fMRI features. The rationale behind this lies in the Transformer’s ability to capture long-range dependencies in sequential data, such as fMRI data. Given that neural activity captured by fMRI evolves over time, the Transformer excels at capturing the dependencies across multiple segments by processing the entire fMRI sequence simultaneously.
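As an illustration of the sliding window step described in A2, the sketch below splits a time series into overlapping segments that could then be fed to a sequence model as a series of tokens. The function name, window length, and stride are assumptions chosen for the example; the rebuttal does not state the values used in the paper.

```python
def sliding_windows(series, window, stride):
    """Split a time series (a list of time points) into overlapping segments.

    Segments that would run past the end of the series are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [series[start:start + window]
            for start in range(0, len(series) - window + 1, stride)]

# Toy signal of 10 time points standing in for one ROI's BOLD time series.
bold = list(range(10))
segments = sliding_windows(bold, window=4, stride=2)
for segment in segments:
    print(segment)
```

With a stride smaller than the window, neighbouring segments overlap, which trades more segments per subject against stronger correlation between adjacent segments.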

Q3: Regarding the reproducibility of the proposed method (R1). A3: We’ll release our source code to support reproducible research, and the GitHub link will be provided in the final version.

Q4: Regarding the suggestion to add new experiments, such as SOTA methods and attention-based methods (R3&R4). A4: We thank the reviewers for their valuable suggestion. We intend to incorporate additional experiments, including comparisons with SOTA methods and attention-based approaches, in extended versions of our work in the future.

Once again, we express our gratitude to the Area Chairs and the Reviewers for their invaluable feedback.




Meta-Review

Meta-review not available, early accepted paper.


