Abstract

To address the scarcity of annotated datasets for training brain magnetic resonance imaging (MRI) segmentation models, we propose to use voxel-level brain age prediction as a domain-specific pretext task for self-supervised learning before adapting models to a downstream segmentation task. We combined publicly available T1-weighted, normative brain MRI datasets to create a large (N = 1,710), representative dataset with a balanced distribution across age groups and sexes, minimizing potential biases in our model. We then compared three state-of-the-art architectures (Swin UNETR, UNETR, and UNET) on the voxel-level brain age prediction pretext task. Swin UNETR achieved the best performance, with a mean absolute error (MAE) of 5.9 ± 4.4 years, outperforming UNETR (MAE: 7.2 ± 4.4 years) and UNET (MAE: 6.2 ± 4.2 years). Based on this performance, we selected Swin UNETR for a downstream brain MRI segmentation task to evaluate the effectiveness of voxel-level brain age prediction as a self-supervised pretext task. We fine-tuned this model and compared its performance against two baselines: (1) training from scratch and (2) fine-tuning a model pre-trained on image inpainting, a non-domain-specific pretext task. The Swin UNETR model pre-trained on voxel-level brain age prediction achieved the highest Dice coefficient on an out-of-distribution test set and performed comparably to the inpainting-pre-trained model on an in-distribution test set. These results demonstrate the potential of voxel-level brain age prediction as a domain-specific pretext task for self-supervised learning in neuroimaging, improving segmentation performance, especially in challenging, low-data scenarios.
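As a rough illustration of the pretext target and the reported metric, the sketch below makes assumptions that the abstract does not state: the voxel-level target assigns each subject's chronological age to every voxel inside a brain mask (0 elsewhere), a subject's MAE is averaged over masked voxels, and the reported "mean ± SD" is taken across subjects. Volumes are flattened to plain Python lists to keep the example dependency-free, and all function names are hypothetical.

```python
from statistics import mean, stdev

def voxel_age_target(mask, age):
    """Assign the subject's chronological age to every brain voxel and 0 to background voxels (assumption)."""
    return [float(age) if m else 0.0 for m in mask]

def subject_mae(pred, mask, age):
    """Mean absolute error between predicted voxel ages and the true age, over brain voxels only."""
    errors = [abs(p - age) for p, m in zip(pred, mask) if m]
    if not errors:
        raise ValueError("mask contains no brain voxels")
    return mean(errors)

def summarize(maes):
    """Mean and sample standard deviation of per-subject MAEs ("MAE: x ± y years")."""
    return mean(maes), stdev(maes)

# Toy data: two subjects with 4-voxel "volumes"; the last voxel is background.
mask = [1, 1, 1, 0]
subjects = [
    (30, [28.0, 33.0, 30.0, 0.0]),  # absolute errors 2, 3, 0 -> MAE 5/3
    (60, [66.0, 54.0, 60.0, 0.0]),  # absolute errors 6, 6, 0 -> MAE 4
]
maes = [subject_mae(pred, mask, age) for age, pred in subjects]
print(voxel_age_target(mask, 30))  # [30.0, 30.0, 30.0, 0.0]
print(summarize(maes))
```

Averaging per subject first and then across subjects keeps large brains (more voxels) from dominating the summary statistic.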

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3217_paper.pdf

SharedIt Link: https://rdcu.be/eHxep

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05325-1_28

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/TasneemN/BrainAgePrediction_asPretextTask

Link to the Dataset(s)

CORR dataset: https://fcon_1000.projects.nitrc.org/indi/CoRR/html/

IXI dataset: https://brain-development.org/ixi-dataset/

ABIDE I and II datasets: https://fcon_1000.projects.nitrc.org/indi/abide/

OASIS-1: https://sites.wustl.edu/oasisbrains/home/oasis-1/

ADNI: https://adni.loni.usc.edu/

Cam-CAN: https://camcan-archive.mrc-cbu.cam.ac.uk/dataaccess/

Calgary Campinas dataset: https://sites.google.com/view/calgary-campinas-dataset/home

OASIS-2: https://sites.wustl.edu/oasisbrains/home/oasis-2/

BibTex

@InProceedings{NasTas_Investigating_MICCAI2025,
        author = { Nasser, Tasneem AND Souza, Roberto AND El-Sheimy, Naser},
        title = { { Investigating Voxel-level Brain Age Prediction as a Pretext Task for Brain MRI Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {289 -- 299}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper explores the use of voxel-level brain age prediction as a self-supervised pretext task to improve downstream brain MRI segmentation. The authors leverage a diverse, multi-site dataset and use SynthSeg-generated segmentations as pseudo-labels for training.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

- The authors have done an excellent job assembling a demographically diverse dataset, which strengthens the generalizability.

- They acknowledge their limitations, which is appreciated.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

- The authors mention that poor SynthSeg outputs were removed after visual inspection. However, it’s unclear what proportion of the data was excluded, who performed the quality control (e.g., how many raters, level of expertise), and what the specific inclusion/exclusion criteria were. Providing this detail is essential for assessing the reliability of the training data.

- A demographic breakdown of both the training and test datasets is missing. The authors should report mean age, age range or IQR, sex distribution, and, if available, ethnicity/race. This information is especially important given the study’s emphasis on dataset diversity.

- Some of the referenced datasets (e.g., ABIDE, ADNI) include participants with clinical conditions. It’s unclear whether these subjects were included in training or excluded.

- The manuscript positions domain-specific pretraining as a central contribution. It would be valuable for the authors to briefly discuss any existing work in this area, or highlight how their approach differs from prior methods.

- In its current form, Figure 3 makes it hard to distinguish between ground truth and predictions. A difference map (e.g., a voxel-wise distance or error map) would better highlight where the model succeeds or fails and improve interpretability.
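One way to build the difference map suggested here is a categorical per-voxel error map for each tissue label, which can be overlaid on the MRI slice with a discrete colormap. The sketch is illustrative only: it works on flattened label lists, and the label values (0 background, 1 GM, 2 WM) and the function name are assumptions, not taken from the paper. A voxel-wise distance map (e.g., distance to the nearest ground-truth boundary) would be an alternative.

```python
# Codes for a categorical error map: true negative, true positive,
# false positive (over-segmentation), false negative (under-segmentation).
TN, TP, FP, FN = 0, 1, 2, 3

def error_map(gt, pred, label):
    """Per-voxel agreement between ground truth and prediction for a single label."""
    out = []
    for g, p in zip(gt, pred):
        g_in, p_in = g == label, p == label
        if g_in and p_in:
            out.append(TP)
        elif p_in:
            out.append(FP)
        elif g_in:
            out.append(FN)
        else:
            out.append(TN)
    return out

# Toy 1-D "slice": 0 = background, 1 = GM, 2 = WM (hypothetical label values).
gt   = [0, 1, 1, 2, 2, 1]
pred = [0, 1, 2, 2, 1, 1]
emap = error_map(gt, pred, label=1)
print(emap)  # [0, 1, 3, 0, 2, 1]
```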
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The exclusion criteria for poor-quality labels are not clearly defined, and there is no demographic summary of the dataset, despite the stated emphasis on diversity. Additionally, the handling of clinically affected subjects and the absence of strong baseline comparisons or prior work discussion limit the strength of the contribution.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The authors use voxel-level brain age prediction as a domain-specific pretext task for self-supervised learning before adapting models to a segmentation downstream task.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors show that voxel-level brain age prediction can help improve segmentation accuracy. The idea is novel, since it mitigates the need for large amounts of annotated data.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Paper presentation, especially the figures, could be improved. A figure could be added to show how the accuracy of the brain age prediction in a given voxel affects the segmentation label of that particular voxel. This would help establish the merit of voxel-based age prediction as a pretext task. For instance, do the voxels that are incorrectly segmented correlate with poor brain age prediction?
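This question could be probed voxel by voxel: inside the brain mask, correlate the absolute brain age error with a binary indicator of segmentation error. With one 0/1 variable, Pearson's r is the point-biserial correlation. This is a hypothetical analysis sketch, not something reported in the paper; it assumes the pretext model's voxel-wise age map and the fine-tuned model's segmentation are available for the same subject, and the function names are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; when one variable is 0/1 this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)

def age_error_vs_seg_error(pred_age, true_age, gt, pred_seg, mask):
    """Correlate per-voxel |predicted age - true age| with a misclassification flag, inside the mask."""
    abs_err, wrong = [], []
    for a, g, s, m in zip(pred_age, gt, pred_seg, mask):
        if m:
            abs_err.append(abs(a - true_age))
            wrong.append(1 if g != s else 0)
    return pearson(abs_err, wrong)

# Contrived toy subject (true age 40): the two misclassified voxels have the largest age errors.
mask     = [1, 1, 1, 1, 1]
pred_age = [40.0, 41.0, 50.0, 39.0, 52.0]
gt       = [1, 1, 2, 2, 1]
pred_seg = [1, 1, 1, 2, 2]
r = age_error_vs_seg_error(pred_age, 40.0, gt, pred_seg, mask)
print(f"point-biserial r = {r:.3f}")
```

Because neighbouring voxels are not independent, such a coefficient is best read descriptively or aggregated per subject or region rather than tested with a voxel-count sample size.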
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The idea of using brain age prediction as a pretext task is novel and has great potential to improve segmentation.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The paper investigates voxel-based age prediction as a self-supervised pre-training strategy for segmentation. Results show that the proposed Swin UNETR model, pre-trained on voxel-based age prediction, outperforms models trained from scratch or pre-trained on inpainting (the usual pre-training task) when the dataset is sufficiently small.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- the paper addresses an important problem in medical image segmentation - self-supervised learning when limited data is available
- a carefully selected dataset is used
- the authors did an ablation study to select the best model architecture for voxel-based age prediction between UNET, UNETR and Swin UNETR.
- as a side note, the analysis also gives insight into the amount of training data needed when training from scratch, which is practically important
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- even though the idea of using voxel-based brain age in the segmentation context is new, the work is mainly incremental from a technological viewpoint; this is acceptable in this context.
- writing could be improved in some parts: the descriptions of the datasets (p. 5) and results (p. 6) are too long. A more concise presentation would be better, perhaps by enhancing Table 1 and Fig. 2.
- the methodology section would benefit from a diagram clarifying which layers are frozen and which are fine-tuned; this could be added to Fig. 1.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- WM/GM segmentation: what about other regions, for example subcortical structures?
- Fig. 2: the label text is too small.
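Extending the evaluation beyond WM/GM mostly amounts to reporting the Dice coefficient per label, Dice = 2|A ∩ B| / (|A| + |B|). A minimal sketch, assuming integer label maps flattened to lists; the label names and IDs (including a single pooled "subcortical" label) are placeholders rather than the paper's label set.

```python
def dice(gt, pred, label):
    """Dice = 2*|A & B| / (|A| + |B|) for one label; NaN if the label is absent from both maps."""
    a = sum(1 for g in gt if g == label)
    b = sum(1 for p in pred if p == label)
    inter = sum(1 for g, p in zip(gt, pred) if g == label and p == label)
    if a + b == 0:
        return float("nan")
    return 2.0 * inter / (a + b)

def per_label_dice(gt, pred, labels):
    """Dice for every named label in a {name: label_id} mapping."""
    return {name: dice(gt, pred, lab) for name, lab in labels.items()}

# Placeholder label IDs (hypothetical); a real evaluation would use the reference label map's IDs.
LABELS = {"GM": 1, "WM": 2, "subcortical": 3}
gt   = [0, 1, 1, 2, 2, 3, 3, 0]
pred = [0, 1, 2, 2, 2, 3, 0, 0]
scores = per_label_dice(gt, pred, LABELS)
print(scores)
```

Returning NaN for labels absent from both maps avoids silently scoring them as perfect (1.0) or failed (0.0).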
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper addresses an important topic in image segmentation and presents an interesting and valid solution.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank the reviewers for their comments. We would like to clarify some points that will be incorporated in the camera-ready version:

To support reproducibility, all code used in our experiments will be made available as part of the camera-ready submission.

The dataset of 1,710 samples is perfectly balanced between males and females. Ages range from 18 to 80 years. Ages were grouped into 5-year bins (i.e., [20,25), [25,30), …), except for the first bin, [18,20). The distribution is uniform across age bins except for the first one: since this bin covers only a 2-year span, its number of samples was scaled accordingly (i.e., by 2/5).
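The sampling scheme described above can be sketched as width-proportional quotas per age bin, each split evenly by sex. This is an illustration under assumptions: the quota of 100 per 5-year bin is arbitrary and does not reproduce the 1,710-sample total, and placing age 80 in the last bin is a guess, since the feedback does not say whether the upper edge is inclusive. Function names are hypothetical.

```python
def age_bin_edges(start=18, first_edge=20, stop=80, width=5):
    """Edges [18, 20, 25, ..., 80]: one narrow first bin, then fixed-width bins."""
    return [start] + list(range(first_edge, stop + 1, width))

def bin_quotas(edges, per_full_bin, width=5):
    """Samples per bin, proportional to bin width (a 2-year bin gets 2/5 of a 5-year bin's quota)."""
    return [round(per_full_bin * (hi - lo) / width) for lo, hi in zip(edges, edges[1:])]

def assign_bin(age, edges):
    """Index of the half-open bin [lo, hi) containing age; the last bin also includes its upper edge (assumption)."""
    if not edges[0] <= age <= edges[-1]:
        raise ValueError(f"age {age} outside [{edges[0]}, {edges[-1]}]")
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        if lo <= age < hi:
            return i
    return len(edges) - 2  # age equals the final edge

def per_sex_quota(quota):
    """Split a bin's quota between females and males; exactly equal when the quota is even."""
    return quota // 2, quota - quota // 2

edges = age_bin_edges()
quotas = bin_quotas(edges, per_full_bin=100)
print(quotas[:3], sum(quotas))            # first bin is scaled to 40
print(assign_bin(19.9, edges), assign_bin(80, edges))
```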

We only used the healthy samples from the datasets described.

The balanced dataset of 1,710 is after the exclusion of samples that failed during pre-processing.

We experimented with different fine-tuning approaches; fine-tuning all layers yielded the best results.
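The fine-tuning strategies compared here (and the frozen/fine-tuned distinction Reviewer #3 asks to see in a diagram) can be expressed as which parameter-name prefixes stay trainable. This is a hypothetical, framework-free sketch: the prefixes are modelled on the submodule names of MONAI's SwinUNETR (`swinViT`, `encoder*`, `decoder*`, `out`), which should be checked against the installed version, and the parameter names and shapes are made up. It also assumes the output layer cannot be transferred, since a voxel-wise age regression head presumably has one channel while a segmentation head has one per class.

```python
# Hypothetical strategies: which parameter-name prefixes receive gradients.
STRATEGIES = {
    "all": None,                             # every parameter is updated
    "decoder_and_head": ("decoder", "out"),  # pretrained encoder frozen
    "head_only": ("out",),                   # only the output layer is updated
}

def trainable_names(param_names, strategy):
    """Subset of parameter names that would be trained under a given strategy."""
    prefixes = STRATEGIES[strategy]
    if prefixes is None:
        return list(param_names)
    return [n for n in param_names if n.startswith(prefixes)]

def transferable(pretext_shapes, downstream_shapes):
    """Pretext weights whose name and shape match the downstream model; mismatches stay randomly initialised."""
    return {n: s for n, s in pretext_shapes.items() if downstream_shapes.get(n) == s}

# Illustrative parameter names and shapes (not read from a real checkpoint).
pretext = {
    "swinViT.layers1.weight": (48, 48),
    "encoder1.conv.weight": (48, 1, 3, 3, 3),
    "decoder1.conv.weight": (48, 96, 3, 3, 3),
    "out.conv.weight": (1, 48, 1, 1, 1),  # 1 channel: voxel-wise age regression
}
downstream = {**pretext, "out.conv.weight": (4, 48, 1, 1, 1)}  # e.g. 4 segmentation classes

print(sorted(transferable(pretext, downstream)))
print(trainable_names(downstream, "decoder_and_head"))
```

In PyTorch, the same selection would typically be applied by iterating over `model.named_parameters()` and setting `requires_grad = False` on the frozen ones, with the matching weights loaded through `load_state_dict(..., strict=False)`.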

We acknowledge many relevant comments from the reviewers, but due to space limitations, we are unable to add further analysis to the paper.

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A


Investigating Voxel-level Brain Age Prediction as a Pretext Task for Brain MRI Segmentation

Author(s):