Abstract

Existing medical image representations are typically processed into grid or sequence structures via Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs). However, these methods struggle to flexibly capture irregular lesion regions and to reveal relationships between lesions, especially in 3D medical imaging. To address this, we transform medical images into graph structures and propose MedGNN, a general recognition network based on Graph Neural Network (GNN) visual representations. We first divide the image into patches and treat each patch as a node, constructing graph visual embeddings via the K-Nearest Neighbor algorithm. Then, we propose multi-scale dynamic max-relative graph convolution for feature aggregation and updating. To mitigate over-smoothing in graph models, we design a feature-enhanced feed-forward network to refine feature representations. Experiments show that MedGNN achieves highly competitive performance across various 2D and 3D medical image recognition datasets. Moreover, it visualizes lesion relationships through graphs, enabling interpretable analysis based on graph structures.
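The following single-scale Python sketch roughly illustrates the pipeline described above: patches become nodes, nodes are linked into a K-Nearest Neighbor graph, and features are combined by max-relative aggregation. It makes several assumptions. The input is a 2D image cut into non-overlapping square patches. Raw patch intensities serve as node features, and KNN uses Euclidean distance. The aggregation is the max-relative scheme known from DeepGCNs and Vision GNN. The helper names (`patchify`, `knn_graph`, `max_relative_aggregate`) are hypothetical. The sketch does not reproduce MedGNN's multi-scale dynamic convolution, learned projections, or 3D handling; the linked repository is the authoritative source for those.

```python
import math

def patchify(image, p):
    """Split a 2D image (list of rows) into non-overlapping p x p patches.
    Each patch is flattened row-major into one node feature vector."""
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by p"
    nodes = []
    for r in range(0, h, p):
        for c in range(0, w, p):
            nodes.append([image[r + i][c + j] for i in range(p) for j in range(p)])
    return nodes

def knn_graph(features, k):
    """For every node, return the indices of its k nearest other nodes
    under Euclidean distance (ties broken by node index)."""
    neighbors = []
    for i, fi in enumerate(features):
        dists = sorted((math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def max_relative_aggregate(features, neighbors):
    """Max-relative aggregation: for node i, take the element-wise max of
    (x_j - x_i) over its neighbors j and concatenate it with x_i.
    A learned linear layer would normally follow; it is omitted here."""
    out = []
    for i, xi in enumerate(features):
        rel = [max(features[j][d] - xi[d] for j in neighbors[i]) for d in range(len(xi))]
        out.append(xi + rel)
    return out

image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [1, 1, 8, 8],
         [1, 1, 8, 8]]
nodes = patchify(image, 2)          # 4 nodes with 4 features each
graph = knn_graph(nodes, k=1)
print(graph)                                     # [[2], [3], [0], [1]]
print(max_relative_aggregate(nodes, graph)[0])   # [0, 0, 0, 0, 1, 1, 1, 1]
```

Concatenating the max-relative term with the node's own features keeps both the node's identity and the strongest neighbor differences. A real model would follow this step with a learned projection.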

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0526_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/IMCTGD/MedGNN

Link to the Dataset(s)

N/A

BibTex

@InProceedings{YeJia_MedGNN_MICCAI2025,
        author = { Ye, Jiayu and Zeng, An and Pan, Dan and Chen, Junhao and Cheng, Guanwei},
        title = { { MedGNN: General Medical Image Recognition Network via GNN Visual Representations } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {334 -- 343}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    Summary: this paper proposes a graph-based approach for binary classification tasks on 2D and 3D medical images. Specifically, the authors introduce a procedure to transform an input volume into a graph by manually setting the neighborhood of each node with a KNN strategy, and they propose to encode the generated graph with a custom GNN-like neural network. The model is evaluated on three brain imaging datasets (ABIDE, OASIS, ADNI) of subjects with brain disorders and two chest X-ray datasets of subjects with pneumonia and healthy controls. The code should be released upon acceptance.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Encoding medical images as graphs offers a new perspective on pattern recognition, and it could offer insight in neuroscience for analyzing the structural brain networks involved in brain disorders
    • Both 2D and 3D medical datasets are considered in this study
    • The proposed model provides results competitive with those of standard networks (CNNs and Transformers) on the classification tasks
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • One main concern about this approach is the definition of the nearest neighbors in the first stage. This is critical because it sets the prior knowledge about the image (which node is connected to which) that the GNN then uses to encode it. The authors mention “K-Nearest Neighbor” without giving details on the procedure. I see two interpretations: 1) they used a standard Euclidean distance between 2D (chest X-ray) or 3D (brain MRI) input patches to define the graph. This would be highly sub-optimal, as differences in intensity or contrast could completely bias the graph; 2) they used a ViT-like way to “patchify” the input (akin to ConvNeXt), where a 4x4 non-overlapping convolution is applied to produce a list of patches and the Euclidean distance between them is used to define the k-NN. This procedure is very close to the original ViT, since the graph is learned during training. Could the authors clarify how they performed it?
    • The second novelty of this work (the custom graph convolutions and the “feature-enhanced network”) does not appear clearly motivated, and no ablation on these components is performed. Specifically, I am still skeptical of the proposed modules, and I would like to see a comparison with a standard GNN or GAT for encoding the input graph. Considering the very small size of the datasets (N < 1000 for all 3D brain MRI datasets and N < 10k for the 2D chest X-ray datasets), I wonder whether the authors simply overfit these datasets by adding more parameters to the network (e.g., “overfitting by observer” [1]).
    • Baseline experiments: while several baselines are used for comparison (ResNet, Vision Transformer, PointNet, etc.), there is no information on how these models were tuned or adapted to the input data. For instance, I do not know which ResNet variant was chosen or what the hyper-parameter tuning strategy was. This is at least as critical as the results of the proposed model (again, see [1]).

    [1] Machine learning for medical imaging: methodological failures and recommendations for the future, Varoquaux et al., npj Digital Medicine, 2022
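Reviewer 1's second interpretation relies on a simple fact. A convolution whose kernel size equals its stride (for example, 4x4 with stride 4) performs the same operation as flattening each non-overlapping patch and applying a shared linear map. The sketch below checks that equivalence on a small integer image. The function names are hypothetical, and the sketch does not claim to be MedGNN's patch embedding. Under this interpretation, KNN would run on these embeddings, so the graph would change as the convolution weights are learned.

```python
import random

def conv_nonoverlap(image, kernels, p):
    """Direct convolution with kernel size = stride = p and no padding.
    `kernels` is a list of p x p weight grids, one per output channel.
    Returns one embedding (one value per channel) per patch position."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h - p + 1, p):
        for c in range(0, w - p + 1, p):
            out.append([sum(kern[i][j] * image[r + i][c + j]
                            for i in range(p) for j in range(p))
                        for kern in kernels])
    return out

def flatten_then_project(image, kernels, p):
    """Equivalent view: flatten each patch, then apply a linear map whose rows
    are the flattened kernels (the usual ViT-style patch embedding)."""
    weight = [[v for row in kern for v in row] for kern in kernels]
    h, w = len(image), len(image[0])
    embeddings = []
    for r in range(0, h - p + 1, p):
        for c in range(0, w - p + 1, p):
            patch = [image[r + i][c + j] for i in range(p) for j in range(p)]
            embeddings.append([sum(wv * x for wv, x in zip(wrow, patch)) for wrow in weight])
    return embeddings

random.seed(0)
image = [[random.randint(0, 255) for _ in range(8)] for _ in range(8)]
kernels = [[[random.randint(-2, 2) for _ in range(4)] for _ in range(4)] for _ in range(2)]
a = conv_nonoverlap(image, kernels, 4)
b = flatten_then_project(image, kernels, 4)
print(a == b, len(a))   # True 4
```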

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall comment: while this study has merit regarding the proposed graph-based analysis of medical datasets, I have major concerns regarding the cross-validation of the models and the ablation of the main components proposed in the paper, especially because of the small sample size and the risk of over-fitting.
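Reviewer 1's concern about cross-validation on small cohorts is commonly addressed with stratified k-fold evaluation, reporting the mean and spread of the metric across folds. The following is a minimal pure-Python sketch. The function names are hypothetical, and this is not the protocol used in the paper. scikit-learn's `StratifiedKFold` provides a well-tested implementation of the same idea.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k, seed=0):
    """Assign each sample index to one of k folds so that every fold keeps
    roughly the same class proportions as the full dataset."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)   # deal each class round-robin over folds
    return folds

def train_test_splits(folds):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    for i, test in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield sorted(train), sorted(test)

labels = [0] * 60 + [1] * 40
folds = stratified_kfold(labels, k=5)
for train_idx, test_idx in train_test_splits(folds):
    pass  # fit on train_idx, evaluate on test_idx, and record each fold's metric
print([sum(labels[i] for i in f) for f in folds])   # [8, 8, 8, 8, 8]
```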

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose MedGNN, a graph neural network for structural MRI. The main blocks of the proposed network are Multi-Scale Dynamic Max-Relative Graph Convolution and Feature-enhanced Feed-forward Network. The authors extensively evaluate it against a rich set of benchmarks including CNNs, transformers and other architectures and demonstrate the superiority of the proposed method.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Evaluation of the proposed method against a rich set of baselines.

    A comprehensive literature review was carried out.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    As no standard errors or confidence intervals are reported, there is no evidence that the proposed method outperforms the baselines by a statistically significant margin.
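When two models are evaluated on the same test set, one way to supply the missing uncertainty is a paired percentile bootstrap on the accuracy difference. The sketch below is a minimal version of that approach, and `paired_bootstrap_ci` is a hypothetical helper. A 95% interval that excludes zero would be evidence of a real difference; one that straddles zero would not.

```python
import random

def paired_bootstrap_ci(y_true, pred_a, pred_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B) on one test set.
    Test cases are resampled with replacement, and each case keeps its pair of
    predictions together so that the comparison stays paired."""
    n = len(y_true)
    correct_a = [int(p == y) for p, y in zip(pred_a, y_true)]
    correct_b = [int(p == y) for p, y in zip(pred_b, y_true)]
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    observed = (sum(correct_a) - sum(correct_b)) / n
    return observed, lo, hi

y = [0, 1] * 20
pred_a = list(y)                  # model A: always correct
pred_b = [1 - v for v in y]       # model B: always wrong
print(paired_bootstrap_ci(y, pred_a, pred_b))   # (1.0, 1.0, 1.0)
```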

    The absence of graph neural network baselines is also a significant shortcoming, since the application of graph methods to structural MRI analysis is not new (see, for example, “GraphMriNet: a few-shot brain tumor MRI image classification model based on Prewitt operator and graph isomorphic network”, “Detection of Alzheimer’s Disease Using Graph-Regularized Convolutional Neural Network Based on Structural Similarity Learning of Brain Magnetic Resonance Images”, “Graph Transformer Geometric Learning of Brain Networks Using Multimodal MR Images for Brain Age Estimation”, and “Interpretable Graph Convolutional Network Of Multi-Modality Brain Imaging For Alzheimer’s Disease Diagnosis”).

    In the ablation over the weight coefficient alpha, the best results were obtained at alpha = 0.9, yet no experiment was conducted for alpha = 1.0, where Extended Deformable Convolution (EDC) degenerates to a linear mapping. It therefore remains unclear whether EDC offers any advantage over a plain linear mapping.

    There is no ablation of the Feature-enhanced Feed-forward Network. Furthermore, no theoretical motivation is given to justify the claim that this block “mitigates oversmoothing in graph models and further enhances data augmentation”.

    Little information is given on how the model derives its final prediction.
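For context, a common way for GNN classifiers to turn node features into a single prediction is a global pooling readout followed by a linear head. The review notes that the paper does not describe MedGNN's readout. The mean-pooling sketch below therefore only illustrates the kind of detail that was requested, not the authors' method, and the names `mean_readout` and `linear_softmax` are hypothetical.

```python
import math

def mean_readout(node_features):
    """Average the node feature vectors into one graph-level vector."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]

def linear_softmax(vec, weights, biases):
    """Linear classification head followed by a numerically stable softmax."""
    logits = [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

graph_vec = mean_readout([[1.0, 0.0], [3.0, 2.0]])   # [2.0, 1.0]
probs = linear_softmax(graph_vec, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(graph_vec, probs)   # class 0 receives the higher probability
```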

    The distance function used to identify the k nearest neighbours in the K-Nearest Neighbour algorithm is not disclosed.
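The choice of distance function matters because different metrics can select different neighbours for the same node. This small sketch contrasts Euclidean distance with cosine distance on hypothetical toy vectors. Under Euclidean distance, the query's nearest neighbour is a vector of similar magnitude. Under cosine distance, it is a vector pointing in the same direction.

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def nearest(query, candidates, dist):
    """Index of the candidate closest to `query` under distance function `dist`."""
    return min(range(len(candidates)), key=lambda j: dist(query, candidates[j]))

query = [1.0, 0.0]
candidates = [[10.0, 0.0],   # same direction, much larger magnitude
              [1.0, 1.0]]    # similar magnitude, different direction
print(nearest(query, candidates, euclidean))        # 1
print(nearest(query, candidates, cosine_distance))  # 0
```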

    The results show that MedGNN-S outperforms MedGNN-L on the ASD versus HC classification task (ABIDE I). This contradicts the assertion that “MedGNN-B and MedGNN-L consistently achieve optimal performance across different tasks”.

    The ‘Experiment and Results’ section lacks information on the compared models and their numbers of parameters, even though the authors assert that it is ‘important to note that the comparison models use the same parameter and structural configurations’.

    The abbreviation “MCI” is not defined in the ‘Experiment and Results’ section.

    The class ratio of the Chest X-Ray Images (Pneumonia) dataset is not reported.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The primary factors behind the “could be rejected, dependent on rebuttal” score are the absence of graph neural network baselines to compare with and the lack of ablations demonstrating the advantages of the proposed neural network blocks.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contribution of this paper is the development of MedGNN, a new network for medical image recognition that uses GNNs. Instead of treating images as grids (like CNNs) or sequences (like Transformers), MedGNN turns images into graphs. It divides the image into patches, treats each patch as a node, and connects them based on proximity (using K-Nearest Neighbor) to form a graph. This graph structure is then processed by the GNN to recognize patterns, aiming to better capture irregular shapes (like lesions) and their relationships. The authors also propose specific components like multi-scale dynamic max-relative graph convolution (MGC) for feature processing and a feature-enhanced feed-forward network (FFFN) to tackle common GNN problems like over-smoothing.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The core idea of representing 3D medical images, specifically anatomical scans like sMRI, as graphs processed by a GNN is relatively novel for recognition tasks. While GNNs have been used in fMRI analysis, applying them directly to the spatial structure of sMRI for disease classification by turning image patches into graph nodes is an interesting approach. It moves away from standard grid/sequence assumptions, potentially offering a more flexible way to model non-local relationships between image regions (lesions).
    2. The graph-based approach offers a way to visualize and potentially understand the relationships the model learns between different image patches (nodes). The paper shows visualizations suggesting learned connections correspond to known brain regions or lesion associations, which could be valuable for clinical trust and understanding.
    3. The paper presents results across multiple 2D and 3D datasets (ADNI, OASIS, ABIDE I, SARS-CoV-2, Chest X-Ray). MedGNN consistently performs well, often outperforming established CNN, Transformer, and even recent Mamba-based models on these classification tasks, according to their tables. They report noticeable improvements, especially on the 3D datasets.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The authors explicitly state that MedGNN performs poorly on segmentation tasks. They mention that graph models struggle with the fine-grained, pixel-level predictions needed for segmentation because they are usually optimized for node or graph-level tasks. This significantly limits the “generality” claimed in the title.
    2. While using KNN reduces complexity compared to all-pairs distance calculation, constructing the graph (especially determining the optimal ‘k’ and patch size) adds preprocessing steps and introduces hyperparameters that could affect performance and may need tuning for different data types or resolutions. The reliance on KNN based purely on feature similarity might not always capture the most semantically meaningful connections compared to methods incorporating spatial priors more explicitly.
    3. Despite the FFFN, the paper acknowledges that some information loss still occurs, particularly for small-scale features within lesions or brain regions. This could be a drawback in medical scenarios where subtle details are diagnostically important. The process of patching and graph construction might inherently smooth over or miss very fine textures compared to dense methods like CNNs.
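Reviewer 3's second point, that purely feature-based KNN may ignore spatial priors, can be made concrete with a combined distance. The sketch below adds a weighted spatial term, measured between patch-grid coordinates, to the feature distance. The weight `lam` and the function name are hypothetical assumptions, not part of MedGNN. With `lam = 0` the function reduces to feature-only KNN, and larger values pull each node's neighbours toward spatially adjacent patches.

```python
import math

def knn_with_spatial_prior(features, coords, k, lam):
    """k nearest neighbours under a combined distance:
    feature distance + lam * spatial distance between patch-grid coordinates.
    lam = 0 gives purely feature-based KNN; a large lam approaches a fixed grid."""
    neighbors = []
    for i in range(len(features)):
        scored = sorted(
            (math.dist(features[i], features[j]) + lam * math.dist(coords[i], coords[j]), j)
            for j in range(len(features)) if j != i)
        neighbors.append([j for _, j in scored[:k]])
    return neighbors

features = [[0.0], [5.0], [0.5]]        # node 2 looks like node 0 but lies farther away
coords = [(0, 0), (0, 1), (0, 2)]
print(knn_with_spatial_prior(features, coords, k=1, lam=0.0)[0])   # [2]
print(knn_with_spatial_prior(features, coords, k=1, lam=10.0)[0])  # [1]
```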
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, this paper presents an interesting and well-executed study exploring a GNN-based approach for medical image classification. The novelty of using graph structures derived from image patches for 3D anatomical image recognition is a significant plus. The strong performance across multiple diverse datasets compared to strong baselines demonstrates the potential of this method. The specific architectural contributions (MGC, FFFN) and the potential for interpretability are also valuable.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Dear Editor and Reviewers: We appreciate your constructive comments and suggestions on our manuscript. We summarize the core issues raised by the reviewers and respond to them below.

[Q1] Rationale for using nearest neighbors. [A1] We construct the graph via a nearest neighbor approach, primarily to ensure the efficiency of MedGNN. The low computational complexity of KNN helps improve the efficiency of MedGNN, particularly for 3D medical imaging. We also explored alternative graph construction methods, such as KMeans-based clustering to connect feature-similar patches. However, these methods typically incur higher computational costs and yield unstable results on 3D data.

[Q2] Execution process of GVE. [A2] The GVE process is as follows: we divide the medical image into patches, which serve as nodes in the graph, and construct edges by connecting each patch to its nearest neighbors. Therefore, the reviewer’s understanding is correct.

[Q3] Motivation behind MGC and FFFN. [A3] The core idea of MedGNN is to establish a GNN-based general visual representation paradigm for medical imaging. MGC serves as the key feature extraction module. Since the relationships between lesion regions are not solely based on spatial adjacency, MGC leverages multi-scale dynamic feature updating to better capture these complex interactions. FFFN is inspired by the feed-forward network in the Transformer architecture. It is designed to enhance node discriminability and introduce non-linear representations, mitigating the over-smoothing issue in GNNs.

[Q4] Differences from the references suggested by the reviewer. [A4] (1) Since MedGNN does not rely on ROIs, it allows graph construction without incorporating prior domain knowledge. (2) MedGNN is a general-purpose backbone for medical imaging that can be directly applied to 2D or 3D data without requiring complex structural modifications or adaptation. (3) MedGNN operates directly on voxels, eliminating the need for the handcrafted features and complicated data preprocessing commonly seen in conventional models. (4) Its multi-scale dynamic feature updating enables more effective capture and representation of critical lesion-related information.
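The authors describe FFFN as inspired by the Transformer feed-forward network. For reference, the sketch below shows a standard Transformer-style position-wise feed-forward block with a residual connection, applied to a single node. The exact FFFN design, including its feature-enhancement component, is not given here. `ffn_residual` and its weights are therefore hypothetical and should not be read as the authors' implementation.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v, b):
    """Compute m @ v + b for list-based matrices and vectors."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(m, b)]

def ffn_residual(x, w1, b1, w2, b2):
    """Transformer-style feed-forward block for one node: expand with w1,
    apply ReLU, project back with w2, then add the input (residual).
    Applied independently to every node, it mixes channels but not nodes."""
    hidden = relu(matvec(w1, x, b1))
    return [xi + yi for xi, yi in zip(x, matvec(w2, hidden, b2))]

x = [1.0, -1.0]
w1 = [[1, 0], [0, 1], [1, 1]]           # expand 2 -> 3 channels
w2 = [[1, 0, 0], [0, 1, 0]]             # project 3 -> 2 channels
print(ffn_residual(x, w1, [0, 0, 0], w2, [0, 0]))   # [2.0, -1.0]
```

Because of the residual path, the block reduces to the identity when its weights are zero. Each node keeps its own features and gains a non-linear, channel-wise refinement.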




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


