

Abstract

We propose a novel uncertainty measure for cortical surface reconstruction (UNSURF) of clinical brain MRI scans of any orientation, resolution, and contrast. It relies on the discrepancy between predicted voxel-wise signed distance functions (SDFs) and the actual SDFs of the fitted surfaces. Our experiments on real clinical scans show that traditional uncertainty measures, such as Monte Carlo variance, are not suitable for modeling the uncertainty of surface placement. Our results demonstrate that UNSURF estimates correlate well with the ground truth errors and: (i) enable effective automated quality control of surface reconstructions at the subject, parcel, and mesh-node levels; and (ii) improve performance on a downstream Alzheimer’s disease classification task.
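The discrepancy idea in the abstract can be illustrated with a toy 2D sketch. Everything below is an assumption for illustration, not the authors' implementation: the fitted surface is a closed polygon rather than a 3D mesh, SDFs are negative inside, the "actual" SDF is recomputed exactly from that polygon, and each vertex simply reads the discrepancy at its nearest grid point. All function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (all 2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside_polygon(p, poly):
    """Even-odd ray-casting test for a closed polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def actual_sdf(p, poly):
    """Signed distance to the fitted contour: negative inside, positive outside."""
    n = len(poly)
    d = min(point_segment_distance(p, poly[i], poly[(i + 1) % n]) for i in range(n))
    return -d if inside_polygon(p, poly) else d


def discrepancy_map(pred_sdf, poly):
    """Voxel-wise |predicted SDF - SDF recomputed from the fitted surface|."""
    return [[abs(v - actual_sdf((x, y), poly)) for x, v in enumerate(row)]
            for y, row in enumerate(pred_sdf)]


def vertex_uncertainty(disc, vertices):
    """Read the discrepancy at the grid point nearest to each surface vertex."""
    h, w = len(disc), len(disc[0])
    out = []
    for vx, vy in vertices:
        x = min(max(int(round(vx)), 0), w - 1)
        y = min(max(int(round(vy)), 0), h - 1)
        out.append(disc[y][x])
    return out
```

If the prediction agrees with the recomputed SDF everywhere, the map is zero; a uniform 0.5 offset in the prediction shows up as an uncertainty of about 0.5 at every vertex. In the paper's setting the recomputed SDF presumably comes from surfaces that have gone through geometry processing and topology correction, so the discrepancy reflects how far that processing moved the surface away from the network's prediction.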

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0304_paper.pdf

SharedIt Link: https://rdcu.be/eHxfr

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05325-1_61

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{GopKar_UNSURF_MICCAI2025,
        author = { Gopinath, Karthik AND Mehta, Raghav AND Glocker, Ben AND Iglesias, Juan Eugenio},
        title = { { UNSURF: Uncertainty Quantification for Cortical Surface Reconstruction of Clinical Brain MRIs } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {640 -- 650}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper introduces an uncertainty measure for cortical surface reconstruction from clinical brain MRIs. The method leverages discrepancies between predicted and actual signed distance functions to estimate uncertainty. Experimental results show that the proposed method provides reliable error estimates and improves performance on downstream tasks such as Alzheimer’s disease classification.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The proposed method effectively handles diverse MRI orientations, resolutions, and contrasts for a wide range of clinical studies.
- The proposed method enables straightforward evaluation of the quality of reconstructed cortical surfaces.
- The uncertainty measure provides guidance for users in deciding whether to include uncertain data in their studies. This potentially influences the effect size in statistical analyses.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While the proposed method is interesting in terms of clinical surface data assessment, there are several issues that hinder a clear understanding of the methodology and the evaluation of its effectiveness. In particular, since the proposed framework relies heavily on existing methods, the evaluation becomes a crucial aspect of this work. However, some design choices appear to lack adequate justification.
- The proposed method appears to have limitations in capturing topological correctness during volumetric evaluation. The cortical surfaces are reconstructed from the predicted SDFs and then used to compute the proposed metric. However, this process raises a concern: how can the proposed method validate the predicted SDF when the surface representation is discretized into volumetric space? This discretization can lose structural information, particularly in narrow sulcal valleys. What are the key assumptions underlying the assessment of topological correctness? It would be helpful to include examples showing how the uncertainty measured by the proposed method might impact topological issues.
- One of the primary issues with this manuscript is its organization, which makes it difficult to follow the flow of the methodology. For example, the description of the input and output data is unclear until the experimental design section, and it took multiple readings to fully understand the context. Specifically, Figure 1 does not provide a clear overview of the process; I only realized that the input data are volumetric segmentations after revisiting the manuscript several times.
- The effect size may be misleading because Cohen’s d uses the sample variance in its denominator. Hence, it is unclear whether the increased effect size is due to a reduction in uncertainty or to an inaccurate estimate of the sample variance, which may be influenced by the reduced sample size.
- Without a ground truth, it is difficult to confidently claim that the proposed method improves sensitivity across different brain regions as shown in Figure 4.
- Please include metrics such as AUC or other relevant performance indicators in addition to the effect size to provide a clearer understanding of how the classification performance is improved.
- Mapping uncertainty onto the cortical surface could be limited, particularly for pial surfaces with narrow sulcal valleys, which may be affected by partial volume effects.
- Why were other geometric features, such as curvature or surface area, not used for quality evaluation?
- “input brain MRI (X^v)” presumably refers to synthetic data, but this is not stated explicitly.
- What is \phi? This symbol is never defined.
- The term “actual” is unclear; please clarify its meaning in this context. The word “thickness” seems to refer to both slice spacing and cortical thickness at different points in the paper. Please make the distinction explicit to avoid confusion.
- It would also be helpful to explain how the SCC and PCC metrics relate to evaluating good or bad performance in this context.
- Some acronyms, such as RA and RAC, are not defined.
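Assuming SCC and PCC denote the Spearman and Pearson correlation coefficients (the acronyms are not expanded in this review), a minimal standard-library sketch of both follows. Higher positive values mean the predicted uncertainty tracks the true thickness error; Spearman only rewards a correct ranking of regions, while Pearson also rewards a linear relationship. The helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (SCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A monotone but nonlinear relation, such as uncertainties [0.1, 0.2, 0.4, 0.8] against errors [0.05, 0.1, 0.3, 1.6], gives a Spearman coefficient of 1 but a Pearson coefficient below 1. Both functions assume non-constant inputs, since zero variance makes the correlation undefined.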
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The manuscript’s organization makes it difficult to follow, and key design choices lack adequate justification, particularly regarding topological correctness and the use of the predicted SDF. Important performance metrics like AUC are missing. Several terms and symbols are undefined, and the manuscript would benefit from clarifying ambiguous concepts and providing a more comprehensive evaluation of the method’s performance.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

I thank the authors for their rebuttal. While some initial concerns have been addressed, it remains unclear how the proposed method specifically mitigates partial volume effects (PVE), particularly at the pial surface, where conventional voxel-based approaches often encounter significant challenges. This issue is critical, as surface reconstruction tools have long prioritized addressing PVE to ensure accurate quality assessments. Although the rebuttal acknowledges that the proposed approach alleviates PVE to some extent, it does not clearly explain how this is achieved or discuss the potential impact on the reported surface quality.

Review #2

Please describe the contribution of the paper

This work introduces a novel, efficient approach for quantifying vertex-level uncertainty in cortical surface reconstruction, building on the recon-all-clinical processing pipeline. The approach uses a signed distance function (SDF) prediction network for the surfaces, followed by traditional geometric processing to produce the surfaces; the uncertainty is estimated from the distance between the predicted SDF and the SDF recomputed after geometry processing. This approach is compared against an ensemble dropout approach for the SDF network and is shown to compare favourably.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper proposes a novel and fairly lightweight approach to estimating the uncertainty of cortical thickness estimates. Although the theoretical justification for the approach is somewhat limited (it is based purely on inconsistency), the empirical results on real data demonstrate its efficacy: the uncertainty correlates with true errors, and dropping high-uncertainty data improves effect size. Overall, the paper is well put together and the experiments seem well thought through.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Overall I felt this paper was good, but I wonder whether the baseline model could be improved: a regular ensemble, rather than test-time dropout, might be more effective and require less tuning of the dropout rate. Ensembles have been shown to be effective at capturing uncertainty (although they are still computationally expensive). Moreover, it would have been worthwhile to run an experiment (even on a couple of subjects) to evaluate the variance after geometry processing. Given that this is practically infeasible at scale, its absence does not diminish the contribution of the paper, but such an experiment could provide a stronger point of comparison.
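Both the test-time dropout baseline and the ensemble alternative suggested in this review produce uncertainty the same way: as the spread of several predicted SDFs at each voxel. A minimal sketch, assuming the T predictions (dropout passes or ensemble members) are already available as flat, equal-length lists of voxel values; running the network itself is out of scope, and the function name is hypothetical.

```python
import statistics


def voxelwise_mean_and_variance(predictions):
    """Per-voxel mean and sample variance across T predicted SDFs.

    predictions: list of T equal-length lists, one per dropout pass or
    ensemble member (T >= 2 is required for a sample variance).
    """
    if len(predictions) < 2:
        raise ValueError("need at least two predictions to estimate variance")
    means, variances = [], []
    # zip(*predictions) yields, for each voxel, its T predicted values.
    for voxel_values in zip(*predictions):
        means.append(statistics.fmean(voxel_values))
        variances.append(statistics.variance(voxel_values))
    return means, variances
```

This statistic lives on the predicted SDF, before geometry processing; propagating it to the final surfaces would require running surface extraction once per sample, which the rebuttal describes as impractical.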
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

Can the authors comment on whether the level of uncertainty was different in the AD patients vs CN? Also, why was data removed rather than downweighted using the uncertainty measure as the noise variance in the linear model?
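The downweighting alternative raised here can be sketched as weighted least squares for thickness = b0 + b1 * group, using each subject's uncertainty u as a noise standard deviation so that its weight is 1/(u^2 + eps). Treating UNSURF values as calibrated noise levels is an assumption, since they are a distance-based score; the function name and the eps guard are hypothetical.

```python
def weighted_group_effect(thickness, group, uncertainty, eps=1e-6):
    """WLS slope b1 for thickness = b0 + b1 * group, weights 1 / (u**2 + eps).

    group holds 0 (e.g. CN) or 1 (e.g. AD); eps avoids division by zero
    when an uncertainty is exactly 0.
    """
    w = [1.0 / (u * u + eps) for u in uncertainty]
    sw = sum(w)
    # Weighted means of the regressor and the response.
    x_bar = sum(wi * xi for wi, xi in zip(w, group)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, thickness)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, group, thickness))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, group))
    return sxy / sxx
```

With a binary group indicator, b1 reduces to the difference of the weighted group means, so a subject with very high uncertainty contributes almost nothing instead of being dropped outright, and no filtering threshold has to be chosen.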

RAC (fig 4 caption) is not defined.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

It’s a novel, effective and computationally efficient method for calculating uncertainty in cortical thickness - this may be impactful in clinical studies.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

Given the sensitivity of surface analysis to data quality, this paper presents a method for estimating the uncertainty of cortical surfaces reconstructed from heterogeneous clinical data. Leveraging a two-stage implicit reconstruction framework, the proposed approach quantifies uncertainty by computing the distance between the geometry-processed surface and the predicted signed distance function (SDF). Experiments reveal a strong positive correlation between the predicted uncertainty and cortical thickness error, highlighting its potential as a quality control metric. This facilitates the identification of surfaces with significant reconstruction errors and supports more robust and reliable surface analysis in large-scale clinical neuroimaging studies.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This paper is the first to introduce uncertainty estimation into cortical surface reconstruction in clinical settings, providing a valuable foundation for more comprehensive and reliable surface analysis.
2. The proposed approach incorporates uncertainty quantification into the reconstruction pipeline in a simple yet effective manner.
3. The manuscript is clearly written and well structured, allowing for smooth and intuitive understanding.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

While this paper is the first to introduce uncertainty estimation into cortical surface reconstruction, its scope is limited to implicit-based methods and does not extend to the widely adopted deformation-based approaches. Although Section 2.2 notes that “existing surface deformation methods are currently limited to high-resolution T1 scans,” this statement lacks sufficient rigor. For example, “Cortical Surface Reconstruction from 2D MRI with Segmentation-Constrained Super-Resolution and Representation Learning” published in MICCAI 2024 proposed a deformation-based reconstruction method using low-resolution clinical MRI. A more comprehensive discussion of this aspect would help develop a more generalizable and robust solution.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

An increasing number of methods focus on reconstructing high-quality cortical surfaces from clinical data to enable comprehensive analyses. However, data quality remains a significant limitation for effective analysis. This paper presents an effective quality control method that addresses current challenges and holds practical significance for future surface-based research.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Great paper. The authors have provided a clear and satisfactory response to my concerns. I recommend acceptance.

Author Feedback

We thank the reviewers for recognizing the value of our contributions, particularly the integration of uncertainty estimation in cortical surface reconstruction and its relevance for clinical MRI analysis. Below, we clarify methodology and design choices raised in the reviews.

Topology and Surface Representation (R1): Our method uses implicit representations to enable contrast- and resolution-agnostic reconstruction. Final surfaces are produced via FreeSurfer-based topology correction, ensuring topological validity. Although implicit models do not explicitly encode highly convoluted geometry, the corrected surfaces maintain consistency, and the resulting vertex-wise uncertainties correlate well with reconstruction error. UNSURF captures discrepancies from surface extraction, particularly in sulcal and other complex regions, and supports improved AD vs CN group discrimination.

Method Organization and Notation (R1, R2): During training, synthetic MRIs generated from segmentation maps are used to predict isotropic SDFs; at test time, real clinical MRIs are input. “Input brain MRI (X^v)” refers to synthetic data during training and real data during testing. φ denotes geometry processing parameters, and “actual SDF” is the recomputed distance map from the extracted surface. We will disambiguate “thickness,” which refers to both slice spacing and cortical thickness depending on context, and define acronyms such as RA (recon-all) and RAC (recon-all-clinical) on first use. Labels will be added to Figure 1 to improve clarity.

Effect Size and Evaluation Metrics (R1, R2): Cohen’s d is used to quantify group-level differences in cortical thickness, not classification performance. With N=200 subjects, our estimate of sample variance is stable. Filtering noisy measurements increases the group mean difference (numerator) and decreases variance (denominator). We will include both components in Figure 4 for transparency. SCC and PCC in Figure 2 measure alignment between predicted uncertainty and true thickness error, reflecting how our model localizes unreliable regions.
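The point about reporting both components of Cohen's d can be made concrete with the standard pooled-standard-deviation form, d = (mean_a - mean_b) / s_pooled. The sketch below returns the numerator and denominator separately and includes a simple threshold filter; the threshold rule and the function names are illustrative assumptions rather than the paper's exact procedure.

```python
import math
import statistics


def cohens_d_components(a, b):
    """Return (mean difference, pooled SD, Cohen's d) for samples a and b."""
    na, nb = len(a), len(b)
    diff = statistics.fmean(a) - statistics.fmean(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    pooled_sd = math.sqrt(pooled_var)
    return diff, pooled_sd, diff / pooled_sd


def keep_confident(values, uncertainty, threshold):
    """Drop measurements whose uncertainty exceeds the threshold."""
    return [v for v, u in zip(values, uncertainty) if u <= threshold]
```

On made-up toy values (CN = [2.7, 2.6, 2.8, 2.7] and AD = [2.4, 2.3, 2.5, 3.2], where only the last AD value carries high uncertainty), dropping that value increases the mean difference and decreases the pooled SD, so both parts of the ratio move in the direction the rebuttal describes. Reporting the two components separately lets readers check which effect dominates.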

Validation Without Ground Truth (R1): We use two validation strategies. In synthetic clinical data (Figs. 2–3), we compare against silver-standard reconstructions from high-res scans. In real clinical data (Fig. 4), where ground truth is unavailable, we use AD vs CN group differences—an accepted proxy in neuroimaging.

Uncertainty Mapping and Feature Choice (R1): Uncertainty mapping in narrow sulcal regions is limited by partial volume effects. Our voxel-based SDF formulation helps mitigate this but cannot fully resolve it. We prioritized cortical thickness as a biomarker due to its widespread use in neurodegeneration studies. While additional features like curvature or surface area are valuable, we reserved them for future work due to space limits.

Baseline and Method Design (R2, R3): We chose ensemble dropout for its balance of uncertainty quality and computational cost; full surface-level variance estimation is impractical due to processing time. While deformation-based methods, such as Wu et al. (MICCAI 2024), offer advances via segmentation-constrained SR and feature alignment, they focus on 2D-to-3D synthesis for neonatal data and lack demonstrated generalizability. Our method complements these approaches by supporting heterogeneous clinical inputs with uncertainty-aware prediction. We will cite this work and consider integrating UNSURF into such pipelines in future work.

Filtering, Downweighting, and Metrics (R1, R2): We used filtering to isolate the impact of high-uncertainty data, but agree that incorporating uncertainty as weights in linear models could further improve sensitivity and is a natural extension. Since AUC is closely related to effect size, we will also include it in Figure 4. Due to space constraints, we prioritized core results (uncertainty mapping and AD/CN effects), with additional evaluations reserved for a future journal version.
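The link between AUC and effect size mentioned in the rebuttal has a closed form under a textbook assumption that the rebuttal does not state: if both groups are normal with equal variance, AUC = Φ(d/√2), where Φ is the standard normal CDF. An assumption-free empirical AUC is the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The function names are hypothetical.

```python
import math


def auc_from_cohens_d(d):
    """Binormal, equal-variance AUC: Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2))."""
    return 0.5 * (1.0 + math.erf(d / 2.0))


def empirical_auc(pos_scores, neg_scores):
    """P(score_pos > score_neg) + 0.5 * P(tie) over all positive-negative pairs."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Because cortical thickness is lower in AD, negate thickness (or swap the groups) before calling empirical_auc with AD as the positive class. Under the binormal assumption, d = 3 corresponds to an AUC of about 0.98.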

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Reject
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The paper introduces a novel method for estimating vertex-level uncertainty in cortical surface reconstruction from clinical MRI data. Reviewers highlighted its clinical relevance, clear structure, and demonstrated improvements in identifying reconstruction errors and enhancing downstream analyses (e.g., AD classification). Although Reviewer #1 raised concerns regarding partial volume effects and methodological clarity, these were sufficiently addressed in the authors’ rebuttal. Given its methodological novelty, practical impact, and favorable evaluation from Reviewers #2 and #3, this paper is recommended for acceptance.


UNSURF: Uncertainty Quantification for Cortical Surface Reconstruction of Clinical Brain MRIs

Author(s):