Abstract
Longitudinal magnetic resonance imaging (MRI) is essential for diagnosing and monitoring multiple sclerosis (MS), a chronic central nervous system disorder. Tracking brain lesion evolution over time is key to predicting MS progression, yet this process is time-consuming and subject to intra- and interobserver variability. While deep learning models such as convolutional neural networks (CNNs) and vision transformers (ViTs) have been applied to lesion detection, they often struggle to fully capture spatial, structural and temporal relationships. Vision graph neural networks (ViGs) present a novel approach with the potential to improve performance in these tasks by effectively capturing relational and structural information. We introduce DEFUSE-MS, a Deformation Field-Guided Spatiotemporal ViG-Based Framework for detecting new T2-weighted MS lesions. The framework features a Heterogeneous Spatiotemporal Graph Module (HSTGM), which functions as both an encoder and decoder. Evaluated on the MSSEG-II dataset, DEFUSE-MS achieves state-of-the-art performance with a lesion detection F1 score of 0.65, sensitivity (SensL) of 0.74, positive predictive value (PPVL) of 0.65, and a mean segmentation Dice score of 0.55, outperforming existing methods. These results highlight DEFUSE-MS’s efficacy in MS new lesion detection. The code is available at https://github.com/BioMedIA-MBZUAI/DEFUSE-MS.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1815_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/BioMedIA-MBZUAI/DEFUSE-MS
Link to the Dataset(s)
N/A
BibTex
@InProceedings{SalMos_DEFUSEMS_MICCAI2025,
author = { Salem, Mostafa and Hassan, Salma and Papineni, Vijay Ram Kumar and Elsayed, Ayman and Yaqub, Mohammad},
title = { { DEFUSE-MS: Deformation Field-Guided Spatiotemporal Graph-Based Framework for Multiple Sclerosis New Lesion Detection } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15971},
month = {September},
pages = {258 -- 268}
}
Reviews
Review #1
- Please describe the contribution of the paper
The authors proposed DEFUSE-MS, for MS new lesion detection, which formulates lesion identification in 3D MRI as a heterogeneous spatiotemporal GNN within an encoder-decoder framework.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The writing and figures are good.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Page 6: the Empenn Vol. (mm³) value of 4.26 ± 9.0 should be revised.
- For (w/ DF), why are the No. FPs and Vol. (mm³) of the DEFUSE-MS variants worse than those of UNet?
- In the conclusion, the authors state: “In attention-based models, stacking the DF with baseline/follow-up images leads to decreased performance due to information overload, noise, and misalignment, which confuses the attention mechanism and hinders its ability to focus on relevant features. These models, optimized for learning contextual relationships within image features, struggle with the spatial transformations captured by the DF.” Please provide experimental results to support this claim.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
method
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper describes a new framework that detects and segments new multiple sclerosis lesions in T2-w images. To do so, the authors propose an enriched U-Net architecture that leverages a spatiotemporal graph.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well written, and the introduction of a spatiotemporal graph with the deformation field represents a good level of novelty. The comparison with the state of the art and the ablation studies confirm the quality of the method. The formal description of the method (Methodology section, page 4) is also very good.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Table 1 describes the results. MS lesions (especially new ones) are often very small, and the choice to exclude lesions smaller than 3 mm could represent an “easy win” for the authors. They should explain why they decided to exclude those lesions.
Many MS detection/segmentation methods have been shown to be sensitive to dataset variability. The authors should test the method on a dataset external to the training set (such as ISBI 2015).
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Page 5, Dataset: the authors wrote “This study uses the updated consensus GT”. This should be described in more detail.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a good level of novelty and is described appropriately, but it needs to be completed with the suggested information.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The authors develop a heterogeneous spatiotemporal graph module (HSTGM) and train a vision GNN in which HSTGMs are used as building blocks in both the encoder and decoder to segment new MS lesions.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The inclusion of spatial proximity, feature similarity and learned deformation field embeddings in a graph structure to capture specific aspects of the problem space is novel. The ablation studies conducted to assess the value of the deformation field (DF) and spatial edge attributes provide valuable insights:
- DEFUSE-MS trained without temporal and spatial edge attributes performed best for null cases with no progression (does this variant reduce to an intensity-based method?)
- Among the DEFUSE-MS trained with one set of edge attributes (DFMax or spatial), the performance of the two models is similar for progressors; for non-progressors the model trained with temporal edge attributes performed better and resulted in lower volume of predicted false positive lesions.
- Between using a max estimate and learned embeddings of the DF, the learned embeddings performed markedly better.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The models used for comparison in Table 1 may not be the best suited, as they include larger transformer and CNN variants (in number of parameters) and were trained here on a smaller dataset (40 patient volumes). Concatenating the DF along the channel dimension does not lend itself to effective utilization for the task of detecting / segmenting new MS lesions, as the features in the DF differ drastically from those of structural MRI.
Additionally, these models were not specifically designed to leverage longitudinal information. Tailored approaches such as the one in ref [29] would provide a better comparison. A higher performance on progressors (Dice=0.6382; lesion F1=0.6196) is reported in ref [29] compared to the reference models in Table 1. However, this is on a cross-validation dataset of the 40 training samples as opposed to the 60 test samples used here.
The current approach utilized 18x18x18 patches for training and no details on inference are provided, prompting concerns about its performance during inference.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
The exact form of the deformation field is not specified: a vector of spatial coordinates or the Jacobian matrix specifying volume change per voxel. The proposed module and framework are intuitive and elegant, considering both spatial and temporal features. I suggest providing a trade-off assessment of the current graph approach against transformer and CNN models, considering model size (number of parameters), training sample size requirements (smaller vs. larger datasets) and inference performance.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed HSTGM module is simple and elegant. However, the comparison with other models in the literature may be biased as they are larger models trained on a smaller dataset and were not designed to handle temporal information.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed most of the points raised by the reviewers.
Author Feedback
We thank the reviewers for recognizing the novelty (R1, R2), the clarity (R1, R2, R3), and the SOTA comparisons and ablation studies (R1, R2) of our work. Below, we address the comments.
[R1] Small lesion exclusion: We do not exclude small lesions from the GT in training or testing. During the post-processing stage, we apply a size threshold to filter out spurious small lesion candidates (<3 mm³) from the model’s output only, not from the GT. This practice aligns with the MSSEG-II evaluation protocol and clinical standards. We will improve this description in the paper.
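The post-processing step described above could look roughly like the following. This is only an illustrative sketch, not the authors' code: the function name `remove_small_lesions`, the 6-connectivity used to define a lesion candidate, and the default voxel volume of 1 mm³ are all assumptions introduced here. Only the 3 mm³ threshold, applied to predictions and never to the GT, comes from the rebuttal.

```python
from collections import deque

import numpy as np


def remove_small_lesions(pred, voxel_volume_mm3=1.0, min_volume_mm3=3.0):
    """Drop predicted lesion candidates whose volume is below min_volume_mm3.

    pred is a binary 3D prediction mask. Candidates are 6-connected
    components (an assumption; the rebuttal does not state the connectivity).
    """
    pred = np.asarray(pred, dtype=bool)
    kept = np.zeros_like(pred)
    visited = np.zeros_like(pred)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    for seed in zip(*np.nonzero(pred)):
        if visited[seed]:
            continue
        # Breadth-first flood fill to collect one connected component.
        visited[seed] = True
        component, queue = [seed], deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(n, pred.shape))
                if inside and pred[n] and not visited[n]:
                    visited[n] = True
                    component.append(n)
                    queue.append(n)
        # Keep the candidate only if it reaches the volume threshold.
        if len(component) * voxel_volume_mm3 >= min_volume_mm3:
            kept[tuple(np.array(component).T)] = True
    return kept


# Usage: a 4-voxel candidate survives, an isolated voxel is removed.
mask = np.zeros((8, 8, 8), dtype=bool)
mask[1, 1:3, 1:3] = True   # 4 voxels = 4 mm³ at 1 mm³ per voxel
mask[5, 5, 5] = True       # 1 voxel = 1 mm³
filtered = remove_small_lesions(mask)
```

Because the filter touches only the model output, GT lesions of any size still count toward sensitivity during evaluation.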
[R1] External validation: ISBI 2015 includes longitudinal data but does not annotate new lesions, making it unsuitable for evaluating our new lesion detection method. Deriving such annotations requires a non-trivial, potentially error-prone post-processing pipeline. We plan to evaluate DEFUSE-MS on the ICPR 2024 MSLesSeg dataset, which focuses on new lesion detection and will soon be publicly released. We will consider this for future work.
[R2] Comparisons with tailored longitudinal methods: For a relatively fair comparison, when [29] is trained only on MSSEG-II, its performance is only Dice=0.5849, F1=0.5135 on cross-validation, while ours is Dice=0.55, F1=0.65 on the test set. The improved performance (Dice=0.6382, F1=0.6196) was obtained using additional cross-sectional data (MS-23v1) beyond MSSEG-II, while DEFUSE-MS is trained only on MSSEG-II. The MSSEG-II test set includes previously unseen scanners, increasing generalization difficulty. Nevertheless, DEFUSE-MS performs robustly. For a fair comparison, we reached out to the authors of [29] to reproduce their results on the test set, but due to time constraints this was not possible. Future work will expand comparisons to more longitudinally tailored models as additional datasets become available.
[R2] Fairness and baseline selection: To fairly benchmark DEFUSE-MS (17.41M parameters), we compare it with a spectrum of CNN and transformer-based models, including large (TransUNet 45.63M, UNETR 35.78M) and lightweight (UNet 4.77M, UNeXt 2.94M, SlimUNETR 1.62M, SegFormer3D 0.21M) architectures. All models were trained under identical conditions. While many baselines lack temporal modeling, they remain widely used standards in medical segmentation, even under limited data conditions, and their inclusion underscores DEFUSE-MS’s unique value in capturing longitudinal dynamics. Regarding DF concatenation in baselines, this design choice was made to maintain architectural consistency and allow a fair comparison across methods, without introducing model-specific adaptations. However, we agree that future work could explore more tailored DF integration strategies in standard baselines.
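For readers unfamiliar with the "(w/ DF)" baseline setup, channel-wise DF concatenation can be sketched as below. The channel-first layout, the function name `stack_inputs`, and the resulting 5-channel input (baseline, follow-up, and three displacement components) are assumptions made for illustration; the rebuttal confirms only that the DF is concatenated with the images and that it is a dense voxel-wise displacement field.

```python
import numpy as np


def stack_inputs(baseline, followup, deformation_field):
    """Concatenate two scans and a displacement field along the channel axis.

    baseline, followup: arrays of shape (D, H, W).
    deformation_field: array of shape (3, D, H, W), one channel per
    displacement component (assumed layout).
    Returns an array of shape (5, D, H, W).
    """
    if followup.shape != baseline.shape:
        raise ValueError("baseline and follow-up must share a shape")
    if deformation_field.shape != (3, *baseline.shape):
        raise ValueError("deformation field must have shape (3, D, H, W)")
    return np.concatenate(
        [baseline[None], followup[None], deformation_field], axis=0
    )


# Usage with random placeholder data.
rng = np.random.default_rng(0)
baseline = rng.random((16, 16, 16), dtype=np.float32)
followup = rng.random((16, 16, 16), dtype=np.float32)
df = rng.standard_normal((3, 16, 16, 16)).astype(np.float32)
stacked = stack_inputs(baseline, followup, df)
```

In such a stack the displacement channels carry values on a different scale and with a different meaning than the intensity channels, which is the mismatch the reviewer's comment points to.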
[R2] Inference & DF details: During inference, the same paradigm is applied using a sliding window of 16×16×16 patches. The DF is a dense 3D vector field (voxel-wise displacements), without derivatives such as Jacobians.
[R3] Empenn Vol. mm³ 4.26 ± 9.0: Referring to [7], which shows the MICCAI 2021 MSSEG-2 challenge quantitative results, we confirm that the reported values are correct. We will revise the formatting to improve clarity.
[R2,R3] FPs in non-progressor cases: We agree that DEFUSE-MS (w/ DF) can show slightly higher FP counts/volumes compared to UNet. This is due to its heightened sensitivity to subtle longitudinal changes captured via the DF, which may result in over-segmentation, particularly in cases with artifacts or inter-scan variations. Nonetheless, the ablations in Table 2 show that the DF improves overall detection metrics (Dice, F1), despite the FP increase. We will clarify this performance trade-off.
[R3] Experimental support for DF’s impact on attention-based models: The results in Table 1 show that attention-based models such as TransUNet and UNETR perform worse when the DF is included than their DF-free variants, which supports the claim that stacking the DF with images can confuse attention mechanisms, likely due to the DF’s complex and noisy structure. We discussed this in ‘Model Architecture Comparison’ in the Discussion.
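A generic sliding-window inference loop over 16×16×16 patches might be sketched as follows. The stride of 8, the averaging of overlapping predictions, and the `predict_patch` callable are hypothetical choices for illustration; the rebuttal states only the patch size, so this should not be read as the authors' implementation.

```python
import numpy as np


def sliding_window_predict(volume, predict_patch, patch=16, stride=8):
    """Average patch-wise predictions over a channel-first volume.

    volume: array of shape (C, D, H, W), each spatial size >= patch.
    predict_patch: hypothetical model stand-in mapping a (C, p, p, p)
    patch to a (p, p, p) probability map.
    """
    assert 0 < stride <= patch, "stride must not leave gaps between patches"
    _, depth, height, width = volume.shape
    assert min(depth, height, width) >= patch, "volume smaller than patch"
    prob = np.zeros((depth, height, width), dtype=np.float32)
    count = np.zeros((depth, height, width), dtype=np.float32)

    def starts(size):
        # Regular grid of start indices, plus a final patch flush with the edge.
        idx = list(range(0, size - patch + 1, stride))
        if idx[-1] + patch < size:
            idx.append(size - patch)
        return idx

    for z in starts(depth):
        for y in starts(height):
            for x in starts(width):
                sl = (slice(z, z + patch), slice(y, y + patch), slice(x, x + patch))
                prob[sl] += predict_patch(volume[(slice(None),) + sl])
                count[sl] += 1
    # Every voxel is covered at least once, so count is never zero.
    return prob / count


# Usage: a dummy "model" that returns the first channel unchanged.
rng = np.random.default_rng(0)
vol = rng.random((2, 20, 24, 16), dtype=np.float32)
out = sliding_window_predict(vol, lambda p: p[0])
```

Overlapping windows with averaging are a common way to reduce seams at patch borders; a non-overlapping variant corresponds to setting `stride=patch`.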
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A