Abstract
Attention-based methods have demonstrated exceptional performance in modelling long-range dependencies on spherical cortical surfaces, surpassing traditional Geometric Deep Learning (GDL) models. However, their extensive inference time and high memory demands pose challenges for application to large datasets with limited computing resources. Inspired by the state space model in computer vision, we introduce the attention-free Vision Mamba (Vim) to spherical surfaces, presenting a domain-agnostic architecture for analyzing data on spherical manifolds. Our method achieves surface patching by representing spherical data as a sequence of triangular patches derived from a subdivided icosphere. The proposed Surface Vision Mamba (SiM) is evaluated on multiple neurodevelopmental phenotype regression tasks using cortical surface metrics from neonatal brains. Experimental results demonstrate that SiM outperforms both attention- and GDL-based methods, delivering 4.8 times faster inference and achieving 91.7% lower memory consumption compared to the Surface Vision Transformer (SiT) under the Ico-4 grid partitioning. Sensitivity analysis further underscores the potential of SiM to identify subtle cognitive developmental patterns. The code is available at https://github.com/Rongzhao-He/surface-vision-mamba.
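As a rough illustration of the patching scheme described in the abstract, here is a minimal Python sketch (an editorial example, not the authors' released code) under the assumption that each token corresponds to one face of the coarser partitioning icosphere:

```python
def icosphere_counts(level: int):
    """Vertex/edge/face counts of an icosphere after `level` subdivisions
    of an icosahedron; each subdivision splits every triangular face into four."""
    faces = 20 * 4 ** level
    edges = 30 * 4 ** level
    verts = 10 * 4 ** level + 2
    return verts, edges, faces


def patch_shape(data_level: int, grid_level: int):
    """Sequence length and patch size when data sampled on an Ico-`data_level`
    sphere is partitioned by the faces of an Ico-`grid_level` sphere
    (assumed correspondence; one token per coarse triangular face)."""
    assert grid_level <= data_level
    num_patches = 20 * 4 ** grid_level        # tokens in the input sequence
    m = 2 ** (data_level - grid_level)        # edge subdivisions inside one patch
    verts_per_patch = (m + 1) * (m + 2) // 2  # triangular number of vertices
    return num_patches, verts_per_patch


# e.g. Ico-6 data partitioned by an Ico-2 grid -> 320 patches of 153 vertices
print(patch_shape(6, 2))  # (320, 153)
```

Under this assumption, an Ico-4 grid (as in the reported efficiency comparison) yields 5120 tokens, which is where the quadratic cost of attention becomes the bottleneck that a linear-time scan avoids.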
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4598_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/Rongzhao-He/surface-vision-mamba
Link to the Dataset(s)
http://www.developingconnectome.org
BibTex
@InProceedings{HeRon_Surface_MICCAI2025,
author = { He, Rongzhao and Zheng, Weihao and Zhao, Leilei and Wang, Ying and Zhu, Dalin and Wu, Dan and Hu, Bin},
title = { { Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a Surface Vision Mamba network for neurodevelopmental phenotype regression tasks. Extensive experiments have been conducted to demonstrate its faster inference speed and lower computational cost.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well-written and well-formatted.
The motivation to adapt the bi-directional vision mamba blocks to the neurodevelopmental phenotype regression tasks is quite interesting.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
From my perspective, I don’t think Mamba is an attention-free architecture. It is a special variant of linear attention; it simply does not use the quadratic attention of Transformers.
The bidirectional Vision Mamba blocks in this paper are adopted from Vision Mamba [1], but the paper does not cite the original work. Moreover, how the bidirectional mechanism works is not explained in the paper, either.
A class token is added to the visual token sequence in the middle, between the left and right hemispheres. I wonder whether this is the right way to handle the class token. What about averaging the visual token sequence instead of adding a class token, following VMamba’s implementation?
How is the auto-regressive training conducted? There is no detailed description in the paper.
The comparisons focus on older GDL and attention-based models (e.g., Spherical UNet, HRINet) but omit newer state-of-the-art approaches for spherical data (e.g., recent graph transformers or hybrid architectures).
The predicted neurodevelopmental outcomes (e.g., PMA, Bayley-III scores) are not rigorously linked to clinical utility. For instance, the ±0.5-week MAE for PMA prediction lacks context on whether this precision is meaningful for real-world diagnostic or prognostic use.
According to Table 4, the model’s performance relies heavily on the pre-trained weights. Did you try a self-supervised pre-training method, such as masked auto-encoder pre-training, on your own private data to test the performance?
[1] Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. 2024. Vision mamba: efficient visual representation learning with bidirectional state space model. In Proceedings of the 41st International Conference on Machine Learning (ICML’24), Vol. 235. JMLR.org, Article 2584, 62429–62442.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The motivation of the paper is quite clear. However, in terms of the network design (the usage of the visual tokens) and the experiments, the paper is not well delivered. Therefore, I do not recommend accepting it.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I really appreciate the response from the authors. They have addressed most of my concerns. Therefore, I raise my rating to accept.
Review #2
- Please describe the contribution of the paper
Attention-based methods excel in modeling spherical cortical surfaces but face efficiency issues. The authors introduce Surface Vision Mamba (SiM), an attention-free model based on state space theory that represents data as triangular patches from subdivided icospheres. SiM outperforms attention- and GDL-based methods in neonatal neurodevelopmental prediction, achieving 4.8x faster inference and 91.7% less memory usage than SiT, while identifying subtle developmental patterns.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This paper is the first to propose Surface Vision Mamba (SiM), an attention-free bidirectional state space model inspired by Vision Mamba, applied to the analysis of spherical cortical surfaces. This overcomes the limitations of traditional attention-based models (such as SiT), which suffer from high memory usage and slow inference due to their quadratic complexity in sequence length, as well as the shortcomings of geometric deep learning (GDL) methods, which struggle to extract global patterns from large-scale complex data.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) The paper mentions that self-supervised pretraining (autoregressive pretraining) yielded limited improvements in model performance: only some small-scale models (such as SiM-T/3 and SiM-S/3) performed better than training from scratch, while large-scale models (e.g., SiM-Base) showed negligible benefits.
2) Some of the GDL methods compared in the paper (e.g., MoNet, ChebNet) are designed for graph-structured data rather than specifically optimized for spherical manifolds, and the paper does not specify whether these methods were adapted for spherical data. For instance, S2CNN (spherical CNN) utilizes spherical harmonic basis functions to handle rotational invariance, yet its reported performance (MAE = 0.69 ± 0.45) is significantly worse than that of SiM, suggesting the comparison may lack fairness.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
In the methods section, it is advisable to introduce as many model variants and implementation details as possible, and explain the rationale behind choosing such approaches for improvements and implementations.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1) The first introduction of Vision Mamba’s state space model into spherical manifold analysis solves the efficiency bottleneck of attention mechanisms on non-Euclidean data, providing a new paradigm for geometric deep learning. This cross-domain migration not only enhances model performance, but also expands the application space of “attention-free efficient modeling” in biomedical imaging.
2) Under the Ico-4 grid, the method achieves a 4.8x inference speedup and 91.7% memory savings, which is significant for large-scale clinical data processing.
3) Some GDL methods (e.g., MoNet, ChebNet) were originally designed for graph-structured data and not optimized for spherical manifolds (e.g., without custom Laplace operators), and the performance of S2CNN may not reflect its optimal configuration, introducing potential bias into the comparison.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Accept, the same as before.
Review #3
- Please describe the contribution of the paper
This paper proposes the Surface Vision Mamba (SiM) to explore adaptation of the Vision Mamba for neurodevelopmental phenotype regression without relying on standard self-attention. Three training strategies are employed to assess the effectiveness of SiM in characterizing the cortical spherical manifold. Extensive experiments demonstrate the superior performance, strong generalization ability and high computational efficiency.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper introduces Mamba to cortical surface analysis for the first time, establishing a novel benchmark in the field. Experimental results show that the proposed model achieves superior or comparable predictive performance to existing graph-based and attention-based methods, while maintaining high computational efficiency, underscoring its practical value.
- This paper presents a comprehensive set of experiments to evaluate the proposed model across diverse scenarios. The authors assess its performance on tasks such as PMA prediction, neurodevelopmental phenotype regression (language and motor scores), various training strategies, and alternative surface patching methods. Ablation studies further investigate the effects of sequence length and different pretraining strategies. These extensive evaluations offer a solid foundation for future research.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The overall structure could be improved. For example, substantial emphasis is placed on dataset partitioning and hyperparameter settings, while methodological details and the results of the neurodevelopmental phenotype regression tasks receive relatively limited attention. This imbalance weakens the clarity and impact of the paper’s contributions.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper introduces the Mamba to cortical surface analysis for the first time and provides extensive experimental validation. The proposed method achieves superior or comparable predictive accuracy to existing approaches while offering high computational efficiency.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Great paper. The authors have provided a clear and satisfactory response to my concerns. I recommend acceptance.
Author Feedback
We sincerely thank all Reviewers for their valuable comments. The source code will be released upon acceptance. Below are our responses to the main concerns.

R1Q1: Our dataset is relatively small (526 infants), which limits the benefits of self-supervised pretraining, especially compared to large-scale settings such as ImageNet. Moreover, while increasing model size can improve representation capacity, the performance gain tends to plateau beyond a certain scale due to diminishing returns.

R1Q2: Spherical manifolds can also support the construction of relationships between cortical regions and can thus reasonably be treated as graph-structured data, making GDL methods applicable. However, their limited performance may stem from reliance on local neighborhoods, difficulty modeling long-range dependencies, and assumptions such as signal smoothness or rotational symmetry, which often do not hold in real-world neuroimaging data.

R2Q1: Thanks for the correction; we agree with your viewpoint, as noted by MLLA [1]. Our description is intended to convey that Mamba avoids the quadratic complexity of the Transformer.

R2Q2: Thanks for pointing this out. We will update the reference in the next version. Standard Mamba processes the input sequence in a causal manner, limiting access to future context. While this is desirable in language modeling, it is suboptimal for visual inputs, where spatial context is important. The bidirectional Mamba block addresses this by using two parallel SSMs to process the sequence in both forward and backward directions, enabling better modeling of global spatial dependencies on spherical manifolds and reducing directional bias.

R2Q3: Yes, that is right. Our design of inserting a class token between the left and right hemispheres follows the implementation of Vision Mamba. As shown in their ablation study (Sec. 4.4), placing the class token in the middle outperforms both average and max pooling. This setup better preserves spatial structure and facilitates global integration. Moreover, this design is well justified for our task, where hemispheric distinction and integration are both crucial.

R2Q4: Due to the page limit, we did not introduce the implementation of auto-regressive pretraining. Our decoder follows the same design as in ARM [2], which uses Transformer blocks. Please note that this decoder is dropped when fine-tuning on downstream tasks.

R2Q5: Thanks for the suggestion. Our comparisons aim to establish a representative benchmark for spherical surface data, focusing on methods that have been widely adopted as baselines in this domain. Moving forward, we plan to further develop SiM by incorporating architectural innovations and conducting comprehensive comparisons with emerging hybrid architectures.

R2Q6: The ±0.5-week MAE for PMA prediction indicates how much an individual’s brain development deviates from the normal trajectory, which helps identify atypical development (e.g., neurodevelopmental delays) such as that commonly seen in premature infants. This supports early diagnosis and personalized intervention. Predicted Bayley-III scores directly reflect developmental outcomes and are clinically relevant for assessment.

R2Q7: We did not adopt masked-auto-encoder pretraining because ARM (Table 9) [2] suggests that autoregressive pretraining is more effective than masked auto-encoding for Vision Mamba.

R3 & R1: Thanks for your advice. We focused on dataset splitting and hyperparameter settings to ensure the reproducibility and transparency of our experiments, but neglected the methodological and result details. We will strive to balance the content better and place greater emphasis on these aspects in the next revision.

[1] Han, Dongchen, et al. “Demystify Mamba in Vision: A Linear Attention Perspective.” Advances in Neural Information Processing Systems 37 (2025): 127181-127203.
[2] Ren, Sucheng, Li, Xianhang, et al. “Autoregressive Pretraining with Mamba in Vision.” The Thirteenth International Conference on Learning Representations. 2025.
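To make the rebuttal’s R2Q2 and R2Q3 descriptions concrete, here is a minimal PyTorch-style sketch (an editorial illustration, not the authors’ implementation; `ssm_cls` is a hypothetical stand-in for Mamba’s selective-scan module):

```python
import torch
import torch.nn as nn


class BiMambaBlock(nn.Module):
    """Bidirectional state-space block as described in R2Q2: two causal SSMs
    scan the token sequence forward and backward, so every token receives
    context from both directions; outputs are merged with a residual add."""

    def __init__(self, dim: int, ssm_cls):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ssm_fwd = ssm_cls(dim)   # left-to-right scan
        self.ssm_bwd = ssm_cls(dim)   # right-to-left scan (on flipped input)

    def forward(self, x):             # x: (batch, seq_len, dim)
        h = self.norm(x)
        out_fwd = self.ssm_fwd(h)
        out_bwd = self.ssm_bwd(h.flip(1)).flip(1)
        return x + out_fwd + out_bwd


def insert_cls_token(tokens, cls_token):
    """Mid-sequence class token placement discussed in R2Q3: the token is
    inserted between the two hemispheres' patch sequences."""
    b, n, _ = tokens.shape
    mid = n // 2                             # assumed hemisphere boundary
    cls = cls_token.expand(b, -1, -1)        # (b, 1, dim)
    return torch.cat([tokens[:, :mid], cls, tokens[:, mid:]], dim=1)
```

For a quick shape check, `ssm_cls=lambda d: nn.Linear(d, d)` can stand in for the real selective-scan module.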
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
Accept.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
- The data partitioning remains unclear. While the paper mentions that “Subset 1 and 3 were split into training, validation, and testing datasets in an 8:1:1 ratio,” the role and usage of Subset 2 are not explained.
- Some technical terms and symbols are not well-defined in the paper. For example, abbreviations such as ‘MAE’ and ‘MSE’ should be explicitly described upon first use.
- The authors should provide a more in-depth analysis of why GPU memory consumption for SiT increases dramatically when using an Ico-4 grid—significantly higher than SiM—whereas the memory usage differences between the two are relatively small under Ico-1, Ico-2, and Ico-3 configurations.
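For context on the final point, a back-of-envelope calculation (an editorial sketch assuming one token per face of the partitioning icosphere, as in the patching scheme described in the abstract; not from the paper) suggests why attention memory grows so sharply at Ico-4 while a linear-time SSM’s does not:

```python
# Attention materializes an N x N map per head per layer, so its memory
# grows 16x per grid level; an SSM's recurrent state grows only linearly.
for level in range(1, 5):
    n = 20 * 4 ** level                   # assumed tokens at an Ico-`level` grid
    print(f"Ico-{level}: N = {n:5d}, N^2 = {n * n:>12,}")
# Ico-1: N =    80, N^2 =        6,400
# Ico-4: N =  5120, N^2 =   26,214,400  -> ~4096x Ico-1, vs only 64x for linear N
```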