Abstract

Lung cancer remains the leading cause of cancer-related mortality in the United States, despite the adoption of low-dose computed tomography (LDCT) and updated screening guidelines from the United States Preventive Services Task Force (USPSTF) [19]. Limited infrastructure and financial costs continue to hinder widespread LDCT adoption, while the increasing detection of indeterminate pulmonary nodules (4–20 mm) challenges accurate diagnosis and clinical decision-making. We address these limitations by pretraining masked autoencoders (MAE) on the COPDGene dataset, which captures chronic lung inflammatory disease features. Emphysema and airway disease, two distinct subtypes of COPD, are pathophysiological manifestations of chronic lung inflammation [4,15]. Incorporating these features may enhance the model’s ability to distinguish between malignant and benign pulmonary nodules. By exploring multiple masking strategies, we optimize network attention on parenchymal and perinodular features, improving the extraction of relevant image biomarkers. Our results demonstrate that pretraining on the COPDGene dataset using random masking (r-masking) achieves superior classification performance, with a sensitivity of 88.79%, specificity of 86.27%, and an AUC of 0.931, compared with self-pretraining on the National Lung Screening Trial (NLST) and supervised learning on NLST. This highlights the importance of leveraging chronic disease datasets for self-supervised learning and underscores the potential of MAE-based approaches to improve nodule classification in clinical settings. Code available at https://github.com/axemasquelin/RegionalMAE

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4708_paper.pdf

SharedIt Link: https://rdcu.be/eHxaE

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05169-1_47

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/axemasquelin/RegionalMAE

Link to the Dataset(s)

NLST dataset: https://www.cancerimagingarchive.net/collection/nlst/

COPDGene Dataset: Data Sharing Agreement Required

BibTex

@InProceedings{MasAxe_Pretraining_MICCAI2025,
        author = { Masquelin, Axel H. P. AND San José Estépar, Raúl},
        title = { { Pretraining on Chronic Lung Inflammatory Disease Datasets to Enhance Indeterminant Lung Cancer Classification using Masked Autoencoders } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15972},
        month = {September},
        pages = {485 -- 495}
}

Reviews

Review #1

Please describe the contribution of the paper

This manuscript presents a method for enhancing indeterminate pulmonary nodule classification by leveraging masked autoencoders (MAEs) pretrained on a chronic lung disease dataset (COPDGene). The authors explore various masking strategies to direct the network’s attention and improve feature learning. Their experiments demonstrate that MAEs pretrained on COPDGene with random masking improve classification performance compared to models pretrained on ImageNet, self-pretrained on the NLST dataset, and models trained from scratch.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors use COPDGene, which is a non-cancer dataset in self-supervised pretraining for cancer classification tasks. It is a novel use of chronic disease data.

The manuscript thoroughly compares multiple training strategies.

The manuscript reports consistent improvements in various performance measurements.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Since the base ViT model is an encoder-only architecture, it would be beneficial if the authors elaborated on the ViT architecture utilized in the paper and how the decoder is designed for the MAE framework.

Some references are cited as “[?]”, which should be corrected.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper involves a novel use of chronic disease data and reports consistent improvements in various performance measurements.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper proposes a novel application of Masked Autoencoder (MAE) pretraining on the COPDGene dataset to improve pulmonary nodule classification performance on the NLST dataset. The study explores three masking strategies (random, tumor-masking, and parenchyma-masking) and demonstrates that pretraining with random masking on COPDGene yields superior classification performance compared to pretraining on NLST or ImageNet/from-scratch initialization.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) The authors leverage COPDGene — a chronic disease dataset — for self-supervised learning, which addresses the domain mismatch issues typical of ImageNet pretraining.

2) The use of r-, t-, and p-masking strategies is well designed, and the results show that r-masking captures the most informative features for the downstream classification task.

3) There are comprehensive ablation studies to evaluate the importance of various parameters.

4) The proposed method achieves an AUC of 0.931, outperforming other strategies listed in the paper (with or without MAE).
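
Editorial illustration: the random masking (r-masking) strategy referred to in this review can be sketched as follows. This is only a minimal sketch of generic MAE-style random patch masking, not the authors' implementation. The patch size (8), the mask ratio (0.75), and the function name `random_patch_mask` are assumptions chosen for illustration; none of them is stated on this page.

```python
import numpy as np


def random_patch_mask(image_size=64, patch_size=8, mask_ratio=0.75, rng=None):
    """Return a boolean (image_size, image_size) array; True marks masked pixels.

    Hypothetical helper: patch_size and mask_ratio are illustrative defaults,
    not values reported by the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    grid = image_size // patch_size          # patches per side
    n_patches = grid * grid
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which patches to hide, uniformly at random without replacement.
    masked_ids = rng.choice(n_patches, size=n_masked, replace=False)
    patch_mask = np.zeros(n_patches, dtype=bool)
    patch_mask[masked_ids] = True

    # Expand the patch-level mask to pixel resolution.
    pixel_mask = (
        patch_mask.reshape(grid, grid)
        .repeat(patch_size, axis=0)
        .repeat(patch_size, axis=1)
    )
    return pixel_mask
```

Under this reading, the t- and p-masking strategies would differ only in how `masked_ids` is chosen (for example, restricted to patches overlapping the nodule or the parenchyma); that detail is an inference from the strategy names and is not confirmed here.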
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

1) As the authors have mentioned, COPDGene has a lot more data than NLST. If the authors used a subset of COPDGene comparable in size to the NLST dataset, would the result still be better? If so, this would make a stronger claim that diseases like COPD influence lung cancer detection.

2) There are limited technical innovations as the paper mainly utilizes existing AI/ML architectures.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors have attempted to address an important clinical problem and have shown good results. There are thorough ablation studies to investigate the effect of various parameters and the impact of MAE. However, there are limited technical innovations and it is also a bit unclear how much better this framework is compared to other similar state-of-the-art algorithms.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The paper tackles a problem of classification of lung nodules (of 4mm-20mm size) into malignant vs. benign based on centered 64x64x64 CT crops. To improve the results when training on limited labeled NLST dataset, authors propose to pre-train a ViT architecture using masked autoencoder (MAE) method. They show that MAE pre-training significantly improves classification quality compared with training from scratch. They also demonstrate that pre-training on a larger set of automatically detected nodules in COPDGene dataset (patients with chronic lung inflammation) is more beneficial compared with pre-training on the NLST dataset. Authors provide extensive experiments comparing different masking strategies and other MAE’s hyperparameters.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Paper tackles very important problem. Pre-training for this classification problem is a meaningful and promising idea and well-motivated research direction.
- Extensive experiments with MAE. Authors provide a comprehensive ablation study comparing different pre-training datasets, masking strategies, model size, fine-tuning vs. linear probing setups and other MAE’s hyperparameters.
- Strong empirical results. Authors show that MAE pre-training significantly improves the classification quality, especially when pre-training on COPDGene dataset.
- Thorough and honest discussion. I appreciate that authors explicitly mention that pre-training on COPD may be more beneficial just due to the larger dataset size and not to the inclusion of lung inflammatory patients.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Missing related work section. Authors do not overview many existing works on lung nodules malignancy prediction. Applying MAE to this task seems to be novel, but authors should explicitly comment on the position of their paper with respect to the related works.
- Method needs clarification. Do you apply ViT to 3D crops or 2D crop slices? From Section 2.1 it seems that you apply ViT to slices. In this case how do you make predictions for 64x64x64 crops? And why don’t you train a ViT on 3D crops? Aggregation of the 3D context seems to be very useful for malignancy prediction, doesn’t it?
- Experiments description also needs clarification. On the one hand, in the last paragraph of Section 2 authors say that COPDGene data is set to be equal to NLST in order for proper comparison of these pre-training distributions. On the other hand, in Discussion authors say that COPDGene dataset is much larger than NLST, which can be the main reason for the superior performance of MAE pre-trained on COPDGene.
- No experiments with other pre-training methods. I believe the contribution of the paper would be stronger, if authors included other pre-training methods, e.g. contrastive pre-training (SimCLR) or SOTA methods like DINOv2 in their experiments.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a meaningful, well-motivated, novel research and strong empirical results. However, authors should include Related work section or subsection and clarify several aspects of their method and experiments.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We are pleased that the reviewers recognized the novelty of pretraining on co-occurring diseases using masked autoencoders (MAEs) in relation to pulmonary nodule classification. The reviewers noted that the paper presented well-motivated, rigorous research and strong empirical results related to indeterminate (4mm-20mm in diameter) pulmonary nodule classification. We value the feedback provided by the reviewers and address the major points of critique below while providing clarifications where necessary.

RELATED WORKS: Multiple reviewers noted the absence of a related works section in the manuscript. We acknowledge this omission and concur that such a section would typically be included in a more complete publication. However, due to space limitations, we prioritized providing general citations to established MAE methodologies. In the final version, we will ensure that a subsection of the introduction highlights prior works related to the applications of MAEs as a form of pretraining.

METHODOLOGY: We thank the reviewers for identifying ambiguity in the manuscript regarding the use of 3D crops versus 2D crop slices in Section 2.1, and how predictions are made for a given 64×64×64 region. We will address this ambiguity in the methodology section of the final version. For clarity, although a 64×64×64 region of interest (ROI) is extracted around each nodule, the MAE processes only a 2D (64×64) axial slice of the nodule. During training, random sampling is utilized for data augmentation, while during inference, the central slice of the nodule is selected as it typically exhibits the largest nodule diameter.

DATASETS: R2 and R3 raised similar concerns regarding the lack of clarity in the methodology with respect to COPDGene and the contribution of its size when compared to the National Lung Screening Trial (NLST). We will elaborate in the datasets section that 10,000 individuals from the COPDGene phase 1 trial were screened for pulmonary nodules using an nnU-Net. This screening resulted in approximately 23,000 pulmonary nodules being detected, of which only a random sample of 5,000 was utilized for pretraining. This quantity was selected to balance training time against performance gain. As was noted in the discussion, further study will be necessary to properly evaluate whether the performance improvement stems from pretraining on a larger cohort or from the inclusion of pre-inflammatory markers associated with lung cancer.

EVALUATION: R3 noted that the contribution of our paper would be strengthened by including alternative pretraining methodologies as comparisons, such as contrastive pretraining (SimCLR) or state-of-the-art approaches like DINOv2. We had considered that comparative experiments between SimCLR or DINOv2 and MAE had been thoroughly documented in the literature; therefore, we did not deem such comparisons necessary for our experiment. However, we acknowledge that these methodologies should be further evaluated to continue exploring whether the incorporation of pathophysiologically relevant features associated with a disease improves network performance and generalizability.

INNOVATION: R2 and R3 expressed concern that the proposed methodology offers limited technical innovation. We acknowledge that no novel technological or technical AI/ML contributions are made through this paper. However, we contend that the use of COPDGene, a chronic obstructive pulmonary disease dataset, for pretraining represents an innovative approach, as the pulmonary nodules detected in these individuals cannot be definitively classified as malignant or benign. The paper’s innovation focuses primarily on the impact of incorporating pathophysiologically relevant features of lung cancer and its associated co-occurring diseases to improve model performance and, in future work, generalizability.
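
Editorial illustration: the slice selection described under METHODOLOGY and the nodule subsampling described under DATASETS in the author feedback above can be sketched as follows. This is a hedged sketch, not the authors' code. It assumes that axis 0 of the ROI array is the axial direction and that the "central slice" is index `depth // 2`; the function names `select_axial_slice` and `subsample_indices` are hypothetical.

```python
import numpy as np


def select_axial_slice(roi, training, rng=None):
    """Pick one 2D axial slice from a 3D nodule ROI.

    Assumption: axis 0 of `roi` is the axial (slice) axis.
    Training: a random slice is drawn (used as data augmentation).
    Inference: the central slice is returned.
    """
    if roi.ndim != 3:
        raise ValueError("expected a 3D ROI, e.g. shape (64, 64, 64)")
    depth = roi.shape[0]
    if training:
        rng = np.random.default_rng() if rng is None else rng
        index = int(rng.integers(depth))
    else:
        index = depth // 2  # assumed definition of the central slice
    return roi[index]


def subsample_indices(n_total, n_keep, rng=None):
    """Draw n_keep distinct indices out of n_total detected nodules."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(n_total, size=n_keep, replace=False)
```

For example, `subsample_indices(23000, 5000)` would mirror drawing 5,000 of the approximately 23,000 detected COPDGene nodules for pretraining, and `select_axial_slice(roi, training=False)` would return the 64×64 slice passed to the model at inference.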

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

Please fix the problems with the citations


Pretraining on Chronic Lung Inflammatory Disease Datasets to Enhance Indeterminant Lung Cancer Classification using Masked Autoencoders

Author(s):