Abstract

In MRI-based mental disorder diagnosis, most previous studies focus on the functional connectivity network (FCN) derived from functional MRI (fMRI). However, the small size of annotated fMRI datasets restricts their wide application. Meanwhile, structural MRIs (sMRIs), such as 3D T1-weighted (T1w) MRI, which are commonly used and readily accessible in clinical settings, are often overlooked. To integrate the complementary information from both function and structure for improved diagnostic accuracy, we propose CINP (Contrastive Image-Network Pre-training), a framework that employs contrastive learning between sMRI and FCN. During pre-training, we incorporate masked image modeling and network-image matching to enhance visual representation learning and modality alignment. Since CINP facilitates knowledge transfer from FCN to sMRI, we introduce network prompting, which utilizes only sMRIs from suspected patients and a small number of FCNs from different patient classes for diagnosing mental disorders; this is practical in real-world clinical scenarios. Competitive performance on three mental disorder diagnosis tasks demonstrates the effectiveness of CINP in integrating multimodal MRI information, as well as the potential of incorporating sMRI into clinical diagnosis using network prompting.
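The pre-training objective described above pairs each subject's sMRI embedding with its FCN embedding. A minimal sketch of such a CLIP-style symmetric InfoNCE loss over a cosine-similarity matrix is shown below; the function name, the temperature value, and the use of plain NumPy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def info_nce(img_emb, net_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss between paired sMRI image
    embeddings and FCN embeddings. Row i of each (batch, dim) array is
    assumed to come from the same subject (the positive pair)."""
    # L2-normalize so the dot product equals cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    net = net_emb / np.linalg.norm(net_emb, axis=1, keepdims=True)
    logits = img @ net.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(logits))           # positives sit on the diagonal

    def xent(l):
        # numerically stable cross-entropy against the diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # average the image-to-network and network-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Correctly paired embeddings should yield a lower loss than mismatched ones, since the diagonal of the similarity matrix then dominates each row's softmax.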

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4296_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HuXin_Learning_MICCAI2025,
        author = { Hu, Xingcan and Wang, Wei and Xiao, Li},
        title = { { Learning 3D Medical Image Models From Brain Functional Connectivity Network Supervision For Mental Disorder Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
}


Reviews

Review #1

  • Please describe the contribution of the paper

    A network prompting protocol that leverages only 3D T1w MRI images from suspected patients and a small number of FCNs from different patient classes for the diagnosis of mental disorders.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper has a strong introduction, clearly articulating the problem statement being addressed in the work.

    2. The proposed concept of network prompting is particularly suitable for scenarios with limited downstream data, where the classes are predefined. This approach leverages part-based significance, potentially enhancing performance in classification tasks under constrained data availability and wide heterogeneity.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The authors introduce the concept of network prompting for final classification. Although this approach shows improved performance compared to linear probing on the ABIDE dataset, it still falls short compared to FCN-based methods. This raises the question of whether using structural MRI alone is an optimal strategy for generalizing across datasets.

    2. Another limitation of the proposed method is that it requires some FCN samples to be available during the evaluation phase. To overcome this dependency, the authors might consider using hyper-networks or similar techniques to generate subject-specific FCN embeddings directly from structural MRI data.

    3. Despite the clearly defined problem statement and somewhat promising results (inconsistent across datasets), the methodological components utilized in this work are already well-established. Therefore, the novelty of the contribution appears limited.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Considering that paired structural MRI and FCN data are available during training, it might be worthwhile to explore the possibility of learning weighted multi-modal embeddings that jointly represent structural MRI and FCN features. The weighting mechanism may provide a way to balance the impact of MRI vs FCN during evaluation. To overcome the constraint of only MRI data availability, the authors might consider using hyper-networks or similar techniques to generate subject-specific FCN embeddings directly from structural MRI data. This approach could potentially improve performance in both the network prompting and linear probing evaluations.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please check weaknesses.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed all of the concerns.



Review #2

  • Please describe the contribution of the paper

    The authors introduce CINP (Contrastive Image-Network Pre-training), a contrastive learning framework for mental disorder diagnosis that learns visual representations of 3D structural MRI (sMRI) using functional connectivity networks (FCNs). The proposed approach uses three self-supervised objectives: (1) image-network contrastive learning, (2) masked image modeling, and (3) image-network matching, to align and enhance feature representations across modalities. A key contribution is the proposed network prompting protocol, which enables diagnosis using only sMRI and a small number of FCNs from each diagnostic class. The method is trained on four public datasets (HBN, HCP, QTIM, and CNP) and evaluated on three public datasets (ABIDE, ADHD, SRPBS) for mental disorder classification, and compared with sMRI-based, FCN-based, and multi-modal baselines. The proposed approach demonstrates overall competitive performance. An ablation study is done on the different pretraining objectives, and the network prompting approach demonstrates promising results in low-resource FCN settings.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strengths:

    • Overall, the paper is well-written and organization is clear.
    • The paper tackles a clinically relevant challenge: enabling accurate mental disorder diagnosis using structural MRI, while compensating for the limited availability of functional MRI (fMRI) data.
    • The proposed network prompting protocol is interesting, allowing the model to make diagnostic inferences using only a small subset of functional connectivity data, which makes sense in real-world clinical constraints.
    • The experimental setup is overall robust, with four datasets used for pretraining and three for evaluation across different diagnostic tasks. While the model doesn’t outperform every baseline, the overall performance is competitive, and the main contribution lies in the diagnostic protocol.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Weaknesses:

    • While the core methodology is solid, some aspects of the methods section could be clearer, particularly the mathematical notation and the formulation of the network prompting strategy. These issues are relatively easy to fix, but they currently hinder full comprehension.
    • The evaluation is generally strong, but some key experimental details are missing, such as more information on data splits, training procedures, and implementation details, which limits the reproducibility and interpretability of the results (see major comments for more details).
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Major comments:

    • The masking ratio used in the masked image modeling task is an important hyperparameter. How was it selected or tuned?
    • The image-network contrastive loss appears to follow a CLIP-like formulation, but the mathematical notation is dense and hard to follow.
    • For the image-network matching loss, it is unclear whether all image-network pairs in the batch are used or if only a subset is selected (e.g., hard negatives). This should be clarified.
    • The loss weights alpha and beta in the final loss function (Eq. 5): how were these set or tuned? Was a validation set used?
    • The network prompting mechanism is an interesting idea, but its formulation is heavy. In particular, the distinction between k classes and r subsets per class is unclear. Why not set r equal to the number of classes, or clarify how subsets are defined and sized?
    • As I understand it, the pooling of embeddings within each subset is done via averaging. Why was this choice made? Were other strategies considered? Similarly, when computing similarity between image and reference networks, why average the embeddings rather than the similarity scores themselves?
    • In the implementation section, the SVM classifier is introduced before clearly explaining how it’s used within the evaluation protocols (e.g., linear probe). A clearer description earlier in the methods or experimental setup would help.
    • The paper states that baseline models were fine-tuned for only 10 epochs, which seems low. Was the proposed method also fine-tuned for the same number of epochs? If not, this could introduce an unfair advantage.
    • In the result analysis, it would be interesting to discuss whether the benefit of CINP is more evident in datasets where uni-modal FCN models already perform well. This could help interpret whether CINP is transferring rich functional information, or simply boosting weak sMRI baselines.

    Minor comments:

    • Figure and Table captions could be expanded to be more self-contained.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-written, with interesting technical ideas, clinical relevance, and a strong experimental setup across multiple datasets and diagnostic tasks. However, I have some concerns regarding clarity, particularly in the methodological formulations, and would welcome additional details and analysis to strengthen the work. I therefore recommend score 4. Weak Accept.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel framework called Contrastive Image-Network Pre-training (CINP), which leverages contrastive learning between structural MRI (sMRI) and functional connectivity networks (FCNs) to enhance visual representation learning and cross-modal alignment.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes a novel framework called Contrastive Image-Network Pre-training (CINP), which leverages contrastive learning between structural MRI (sMRI) and functional connectivity networks (FCNs) to enhance visual representation learning and cross-modal alignment. Specifically, paired 3D T1w MRI images and FCNs are fed into a visual encoder and a network encoder, respectively, to extract embeddings. A cosine similarity matrix between the image and network embeddings is used to compute a contrastive loss. The method integrates masked image modeling and image-network matching to boost representation quality. Furthermore, the authors introduce network prompting, which enables mental disorder diagnosis using only the MRI of a suspected patient and a few FCNs from different patient groups.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Data selection lacks transparency and hinders reproducibility: The authors conduct experiments on ABIDE, ADHD, and SRPBS datasets, but the number of subjects used does not match the full datasets. It is recommended to clarify the selection criteria, reference any prior works used for sample selection, and ideally provide subject IDs to improve reproducibility.
    2. Lack of generalization validation for the prompting method: The proposed network prompting strategy is only applied within the CINP framework. The authors are encouraged to apply this method to other contrastive pretraining baselines to test its generalizability and effectiveness across models.
    3. Conclusions need literature support: The paper concludes that sMRI is more effective for ADHD diagnosis, and that ASD identification may require more functional information. This important claim should be backed by relevant literature to ensure that it is not solely based on experimental outcomes.
    4. Unexpected results require clarification: In Table 2, the model using only FCNs achieves significantly higher MCC on the SRPBS dataset compared to other models. The authors should provide explanations for this outlier result—whether it is due to data imbalance, model-specific advantages, or experimental artifacts.
    5. This paper compares different pre-trained model approaches by directly using the pre-trained weights of other models. However, due to differences in the datasets used during the pre-training phase, such direct comparisons may introduce a degree of unfairness.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Inspired by the idea of CLIP, the authors propose a novel brain disorder-oriented pretraining framework that integrates two different modalities: T1w structural images and functional connectivity (FC) data, enabling cross-modal joint modeling. However, the reproducibility of the method is limited, and the experimental results do not provide sufficient support for the proposed framework, which undermines the overall credibility of the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed some of my questions, but certain concerns remain unresolved and were not fully explained.




Author Feedback

We appreciate the reviewers for their constructive feedback. We clarify main concerns as follows (R: Reviewer; W: Weakness; C: Comment).

[R2W3] This paper is among the first to employ contrastive learning between sMRI and FCN, which transfers functional knowledge from FCNs to enhance sMRI embeddings. We further propose a network prompting protocol that enables CINP to generalize to unseen datasets and disorders within a few-shot learning paradigm. In addition, as acknowledged by R1, our work addressed a clinical challenge: enabling accurate mental disorder diagnosis using sMRI while fMRI are limited or unavailable. We apologize for the omission of some experimental details due to page limits. We will incorporate them in the revised version. Besides, we make the following explanations to clarify some main concerns. [R1W2&R3W1] We recruited as many samples as possible; however, some were excluded due to low quality (e.g., NaN values in FCNs). The data split is presented on Page 6. Subject IDs can be obtained on requests. [R1C1] We selected the masking ratio following Ref. 23. [R1C4] Since the importance and scale of three losses are comparable, alpha and beta were set to 1 based on the validation set performance. [R1C8&R3W5] We fine-tuned the sMRI-based methods for 10 epochs to account for modality differences between the pre-training and target datasets. Both linear probing and fine-tuned results are reported in Tab 2. CINP was not fine-tuned because there is no modality gap, and we aimed to test its generalizability on unseen datasets and diseases. [R2W2] Using CINP with linear probing does not require FCNs. Since it is feasible to obtain some FCNs in clinical scenarios, the CINP can be generalized to other mental disorders or data domains using a few-shot learning paradigm. We appreciate R2’s suggestion regarding the generation of FCNs from sMRI, which is a promising direction for future exploration. [R3W2] The network prompting is specifically designed for CINP and is difficult to apply directly to other contrastive methods. 
However, we believe our approach can inspire further research and benefit the community. [R2W1&C1] Since fMRI is not routinely collected, most patients do not have FCNs for diagnosis. This limitation motivates our approach: transferring functional information from FCNs to sMRI embeddings, enabling diagnosis using sMRI alone. Our CINP only falls short on ABIDE compared to FCN-based methods, which is likely due to the varying diagnostic efficacy of MRI modalities across different disorders, as discussed on Page 7. [R3W3] Our conclusion is indeed drawn from our experimental observations. To avoid potential misunderstandings, we will correct the statement and further investigate this in future work. [R3W4] The main reason for the outlier result is data imbalance. The difference in MCC is relatively minor and may not be statistically significant. CINP’s superior accuracy suggests more consistent predictions across the dataset. [R1C2,7,9,10] Thank you for pointing out these issues. We will revise them in a more concise and understandable way. [R1C3] As stated on Page 4, image-network pairs which do not originate from the same subject but exhibit high similarity are sampled for the INM loss. [R1C5] The k represents the number of subject classes (e.g., ASD, MDD and HC→k=3). If each disorder includes 30 FCNs, when r=5, we average every 6 FCNs to construct 5 reference networks (RNs); when r=3, every 10 FCNs are averaged to obtain 3 RNs. [R1C6] As differences between classes (e.g., HC and ASD) are primarily observed at the group level (Refs. 11, 30), we construct group-level RNs by averaging the embeddings to mitigate individual variability. The final classification score is computed by averaging the similarity scores between the RNs and the sMRI embedding. This ensemble strategy help improve performance.
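The [R1C5]/[R1C6] answers describe the network prompting procedure concretely: per class, split the available FCN embeddings into r subsets, average each subset into a group-level reference network (RN), then score the class as the mean cosine similarity between the sMRI embedding and its RNs. The sketch below follows that description; the function name, shapes, and use of NumPy are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def network_prompt_classify(img_emb, class_fcn_embs, r=5):
    """Classify one sMRI embedding via network prompting.
    img_emb: (dim,) embedding of the suspected patient's sMRI.
    class_fcn_embs: list of k arrays, each (n_k, dim), holding the
    FCN embeddings available for one diagnostic class.
    Returns (predicted class index, per-class similarity scores)."""
    img = img_emb / np.linalg.norm(img_emb)
    scores = []
    for fcns in class_fcn_embs:
        subsets = np.array_split(fcns, r)        # r groups of FCN embeddings
        rns = [s.mean(axis=0) for s in subsets]  # group-level reference networks
        # average the cosine similarity scores over the r RNs (ensemble)
        sims = [rn @ img / np.linalg.norm(rn) for rn in rns]
        scores.append(float(np.mean(sims)))
    return int(np.argmax(scores)), scores
```

With the rebuttal's example (30 FCNs per class, r = 5), each RN averages 6 FCN embeddings, so individual variability is smoothed out before the similarity comparison.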




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    In my opinion, the paper was already close to acceptance before the rebuttal. This is an interesting idea with novel technical aspects and a good experimental setup. The results also showed how, even in the absence of resting-state fMRI, CINP could provide results that were higher than those of other methods that relied only on T1-weighted imaging.

    With the rebuttal, the authors answered all the questions and concerns raised with the reviewers. Therefore, there is no doubt that the paper should be accepted.

    My only minor concern, which does not detract from the methodology, is the fact that autism and ADHD are disorders that are hard to diagnose and can probably not be reduced to a process of image analysis, even if the results show there are cases that can be clearly distinguished. My point is that I would have preferred a focus on other neurological problems that might be better suited. But, once again, this does not detract from the quality of the paper and it does not affect my recommendation for acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers agree that the article has merit and is of interest to the MICCAI community. Although the comparison with state-of-the-art methods is not complete, for example, teacher-student approaches based on knowledge distillation are not included, the article presents interesting preliminary results.


