Abstract

Advances in neuroimaging have dramatically expanded our ability to probe the neurobiological bases of behavior in vivo. Leveraging a growing repository of publicly available neuroimaging data, there is surging interest in using machine learning approaches to explore new questions in neuroscience. Despite the impressive achievements of current deep learning models, an under-acknowledged risk remains: variability in cognitive states may undermine the experimental replicability of ML models, leading to potentially misleading findings in neuroscience. To address this challenge, we first dissect the critical (but often overlooked) challenge of ensuring the replicability of predictions despite task-irrelevant functional fluctuations. We then formulate the solution as a domain adaptation problem and design a dual-branch Transformer that minimizes the Wasserstein distance between domains. We evaluate cognitive task recognition accuracy and consistency on test and retest functional neuroimages (serial imaging measures of the same cognitive task over a short period of time) from the Human Connectome Project. Our model demonstrates significant improvements in both the replicability and the accuracy of task recognition, showing the great potential of reliable deep models for solving real-world neuroscience problems.
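For readers who want a concrete picture of the recipe, below is a minimal PyTorch sketch of the dual-branch design described above. This is not the authors' released code; the parcellation size, model dimensions, pooling scheme, and task count are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): two Transformer branches encode
# test (source) and retest (target) fMRI scans separately, exposing the
# per-layer features that a Wasserstein term can later align.
import torch
import torch.nn as nn

class DualBranchTransformer(nn.Module):
    def __init__(self, n_rois=268, d_model=128, nhead=4, n_layers=3, n_tasks=7):
        super().__init__()  # all sizes here are illustrative assumptions
        self.embed_src = nn.Linear(n_rois, d_model)   # test-scan branch
        self.embed_tgt = nn.Linear(n_rois, d_model)   # retest-scan branch
        self.src_layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(n_layers))
        self.tgt_layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(n_layers))
        self.cls = nn.Linear(d_model, n_tasks)        # shared task classifier

    def forward(self, x_src, x_tgt):
        # x_src, x_tgt: (batch, time, n_rois) BOLD time series
        h_s, h_t = self.embed_src(x_src), self.embed_tgt(x_tgt)
        layer_feats = []
        for lay_s, lay_t in zip(self.src_layers, self.tgt_layers):
            h_s, h_t = lay_s(h_s), lay_t(h_t)
            # temporally pooled per-layer features, kept for alignment
            layer_feats.append((h_s.mean(dim=1), h_t.mean(dim=1)))
        return self.cls(h_s.mean(dim=1)), self.cls(h_t.mean(dim=1)), layer_feats
```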

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2322_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Din_AWasserstein_MICCAI2024,
        author = { Ding, Jiaqi and Dan, Tingting and Wei, Ziquan and Laurienti, Paul and Wu, Guorong},
        title = { { A Wasserstein Recipe for Replicable Machine Learning on Functional Neuroimages } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a novel approach to enhance the replicability of task fMRI analysis using a dual-branch transformer and the Wasserstein distance. By employing domain adaptation techniques, the method effectively generates labels for unlabeled retest fMRI data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces a compelling approach to address the replicability issue in task fMRI by leveraging domain adaptation techniques. Domain adaptation is particularly well-suited for handling data heterogeneity, which is a key challenge in achieving accurate retest fMRI results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In my view, data heterogeneity in fMRI often arises from differences in acquisition parameters or individual variability. However, the paper attempts to address this issue by directly comparing test fMRI and retest fMRI, which may limit its generalizability to other datasets. A more effective approach could be to apply domain adaptation to different components within the test fMRI and then transfer this knowledge to the retest fMRI. Alternatively, conducting group analysis could also be a viable option.
    2. The experimental results presented in the paper seem insufficient to fully demonstrate the effectiveness of the domain adaptation technique. To provide a clearer understanding of how the method works, it would be beneficial to include more comprehensive results.
    3. Regarding the details of the paper, Figure 3 could benefit from a better design. It would be helpful to include the inputs from both domains in the figure to improve readability and enhance the overall clarity of the presented information. The current figure lacks polish.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Regarding the domain adaptation technique, it appears that the method may not effectively address the heterogeneity between test and retest fMRI. To bolster the paper’s claims, more convincing results, including validation on additional datasets, should be provided.
    2. The description of Figure 3 requires improvement for better clarity and comprehensibility. For instance, while the paper defines the inputs as x, the specific symbols representing x are missing from the figure. Such oversights detract from the manuscript’s quality and should be addressed to enhance the overall presentation of the results.
    3. The authors should add more description of how the transformer and the Wasserstein distance contribute to increasing the replicability, because both of these methods are widely used in deep learning research.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper aims to enhance the replicability between test and retest fMRI; however, the domain adaptation technique applied directly between test and retest fMRI may not be effective. A more reasonable approach could be to apply domain adaptation to address differences in fMRI parameters or individual variability. To validate the efficacy of the proposed method, the paper needs to include additional results to support its motivation and demonstrate its effectiveness.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Thanks for the answers. In my view, however, the simple use of the Wasserstein distance between test and retest data might not work well, since the Wasserstein distance is already widely used in deep learning research. To better convince me, the manuscript should contain more results. I have increased my score by one; thanks again for the answers.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a neural network architecture to address the test-retest reproducibility problem in applying ML models to neuroimaging data. The network achieves this by using a Wasserstein loss to minimize the distance between the feature distributions of the source (test) and target (retest) domains, where the features are the outputs of each layer of two transformer branches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper addressed a significant problem, test-retest reproducibility, in applying ML to neuroimaging data and showed the effectiveness of the proposed method.
    • The authors compared the proposed method to a number of baselines, showing its superior performance.
    • The authors also plotted the attention maps of the trained transformers, showing which brain regions were important in the classification tasks. This analysis further supports the effectiveness of their method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors failed to discuss in depth why their method had better performance than other methods.
    • The authors failed to mention other works on domain adaptation in neuroimaging (e.g. https://www.sciencedirect.com/science/article/abs/pii/S1361841522003358)
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The paper’s major focus is introducing a domain adaptation (DA) technique to improve the test/retest reproducibility of fMRI task classification. Therefore, it would be beneficial to discuss in more depth why the proposed method is better than DANN, MCD, and DIRT-T. How do vision tasks and fMRI differ?
    • There are other methods that address the DA problem in fMRI. Since the three DA baselines were proposed for vision tasks, how do methods designed for fMRI compare?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is overall well written, with clear language and structure. The method itself is not very innovative, as it is not unusual to apply a Wasserstein loss to the features of source and target domains, and the transformer architecture is not novel either. But the paper addresses an important problem and the experiments are comprehensive, showing the superior performance of the proposed method.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a novel dual-branch transformer model using the Wasserstein distance to enhance the replicability and accuracy of DL models on functional data. It uses the HCP dataset to demonstrate significant improvements in task prediction via a domain adaptation approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Proposes a dual-branch transformer model with Wasserstein distance for robust domain adaptation.
    2. Provides integration of feature distribution alignment and end-to-end learning for improved task recognition.
    3. The brain mapping for task recognition is notable when carried out with both test and retest data.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Limited generalizability due to reliance on a homogeneous dataset. The evaluation should include more task-based fMRI datasets to assess the generalizability of the model, at least for testing.
    2. The description of the experimental setting lacks a lot of information and is unclear.
    3. Lacks information on training parameters such as the number of epochs, and on layer details such as the use of dropout or the choice of activation function (e.g., ReLU).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to the weakness section. Here are some additional comments:

    1. It would be interesting to carry out an evaluation of GNN-based models to compare their performance.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The use of the Wasserstein distance is a great contribution. Even though there is a weakness regarding the generalizability of the model, if the authors could try the method on an independent dataset beyond HCP-task, that would strengthen the validation of the approach.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I appreciate that the authors explicitly understood and tried to answer each of the concerns, and I am satisfied with their responses.

    I still stand by my view that this paper should be accepted.




Author Feedback

We are glad that the reviewers have shown enthusiasm for our work, and we’re grateful for their constructive feedback.

R1: We’ll ensure that all references mentioned by this reviewer are included in the final version. Meanwhile, we will add more background on fMRI studies to engage a broader audience. (Please also refer to Q2 below for the in-depth discussion of our method.)

R3: We will add more details on the experimental setting and model parameters in the final version. We will also include GNN models such as GCN, GIN, GSN, and GNN-AK (our method still outperforms them in testing). In addition to homogeneous datasets like HCP, evaluation on clinical datasets is certainly part of our future work.

R4: Q1: The method overlooks the external data heterogeneity issue and intrinsic subject-to-subject variance.

A1: We completely agree with the reviewer’s observation that variations in neuroimaging data often arise from a combination of multiple sources. However, we would like to clarify that the effective solution for disentangling data heterogeneity is a data harmonization approach. Although both approaches share some common methodological components, we are addressing a different problem, learning replicability, which is unique to fMRI studies. Our method is designed to establish a mapping between phenotypic traits and functional neuroimages (by characterizing inter-subject variations) and to produce consistent results across different fMRI experimental settings. In this context, our proposed replicable deep model is an important piece (not yet explored in the MICCAI field) of fMRI research. Together with data harmonization approaches, we can deliver an integrated computational solution for performing data-driven studies using fMRI data.

Q2: Insufficient discussion on methodological insights.

A2: We will include the following discussions in the final paper. (1) Effectiveness of the Wasserstein distance on fMRI data. Due to large inter-scan variance and a substantial amount of external noise in the BOLD signals, conventional measurements, based on the differences between two instances of a time series, often have limited power to capture the intrinsic task-relevant variations. To address this issue, we formulate the problem as a distribution-to-distribution matching scenario. Our experiments also (empirically) demonstrate the advantage of the Wasserstein distance on this problem.
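To make the distribution-matching idea concrete, here is one common way to approximate a Wasserstein distance between two feature batches, the sliced 2-Wasserstein estimator. The paper itself may use a different estimator, so treat this purely as an illustrative sketch:

```python
import torch

def sliced_wasserstein(f_src, f_tgt, n_proj=64):
    """Sliced 2-Wasserstein approximation between two equal-sized
    feature batches of shape (batch, dim). Illustrative only; not
    necessarily the paper's exact estimator."""
    d = f_src.size(1)
    theta = torch.randn(d, n_proj, device=f_src.device)  # random directions
    theta = theta / theta.norm(dim=0, keepdim=True)      # unit-normalize
    # in 1-D, optimal transport simply pairs sorted samples
    proj_s = (f_src @ theta).sort(dim=0).values
    proj_t = (f_tgt @ theta).sort(dim=0).values
    return (proj_s - proj_t).pow(2).mean()
```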

(2) Comparison with other DA models (such as DANN and MCD). Our method stands out by using two distinct feature extractors for the two domains, minimizing the distance at the feature level. Specifically, we observed that samples of different classes exhibit similar shift scales, indicating a relatively neat shifting pattern. This shift arises from (i) the different phase encoding directions for scan 1 and scan 2 during the HCP-task data acquisition, and (ii) the fact that, in continuous fMRI acquisition, changes in the order of tasks lead to different fluctuations in the processed BOLD signal. These factors suggest group-level differences rather than individual variability. Thus, using the same feature extractor for the two domains (as these DA models do) might confuse the model. Meanwhile, DANN uses a domain discriminator to bring the two domains closer, but with strong feature extraction (as with our Transformer), feature distribution matching becomes weak, leading to class mismatch. MCD relies heavily on discrepancy discrimination by two classifiers; the neat shift makes it difficult for the model to identify challenging samples in the target domain, limiting MCD’s effectiveness.
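Combining the two sketches above, one hypothetical training step pairs a supervised loss on the labeled test scans with the per-layer alignment term on the unlabeled retest scans. The loss weight lam and the optimizer settings below are assumptions, not the paper's reported hyperparameters:

```python
model = DualBranchTransformer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed settings
ce = torch.nn.CrossEntropyLoss()
lam = 0.1  # alignment weight, an illustrative assumption

def train_step(x_test, y_test, x_retest):
    logits_src, _, layer_feats = model(x_test, x_retest)
    loss = ce(logits_src, y_test)  # labels exist only in the test domain
    # align the two branches' feature distributions layer by layer
    for f_s, f_t in layer_feats:
        loss = loss + lam * sliced_wasserstein(f_s, f_t)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```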

(3) Extensive evaluations. Our focus lies in benchmarking replicability specifically on HCP WM because of its widespread utilization. If the page limit allows, however, we are committed to showing more experimental results on (i) comparison with GNN models, (ii) task-related brain mapping for WM, (iii) t-SNE plots of the feature distributions before and after domain adaptation, and (iv) an enhanced Fig. 3 to improve clarity.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    A majority of reviewers lean towards accepting this work, while the third recommends a weak reject (Rev. 4). Upon reading the reviews, paper, and the rebuttal, it seems to me that the approach is fairly reasonable for aligning distributions across test-retest paradigms to maintain the performance of predictive models. While there are some parallels to methods addressing data heterogeneity across sites, that is a broader problem which is probably beyond the scope of this work. On the strength of the experimental results and comparisons presented, I believe all major concerns have been addressed during the rebuttal.

    If this paper is to be accepted, I would like to remind the authors that the promise of adding additional results during the camera-ready (and in fact the rebuttal process) explicitly violates the MICCAI submission policies. Please also pay attention to missing inline references within the text.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


