Abstract
Large-scale pre-training holds the promise to advance 3D medical object detection, a crucial component of accurate computer-aided diagnosis. Yet, it remains underexplored compared to segmentation, where pre-training has already demonstrated significant benefits. Existing pre-training approaches for 3D object detection rely on 2D medical data or natural image pre-training, failing to fully leverage 3D volumetric information. In this work, we present the first systematic study of how existing pre-training methods can be integrated into state-of-the-art detection architectures, covering both CNNs and Transformers. Our results show that pre-training consistently improves detection performance across various tasks and datasets. Notably, reconstruction-based self-supervised pre-training outperforms supervised pre-training, while contrastive pre-training provides no clear benefit for 3D medical object detection. Our code is publicly available at: https://github.com/MIC-DKFZ/nnDetection-finetuning.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4555_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/MIC-DKFZ/nnDetection-finetuning
Link to the Dataset(s)
MSD Pancreas: https://drive.google.com/drive/folders/1HqEgzS8BV2c7xYNrZdEAnrHk7osJJ--2
RibFrac: https://ribfrac.grand-challenge.org/dataset/
KiTS21: https://github.com/neheller/kits21
LIDC: https://www.cancerimagingarchive.net/collection/lidc-idri/
DUKE Breast: https://www.cancerimagingarchive.net/collection/duke-breast-cancer-mri/
LUNA16: https://zenodo.org/records/3723295, https://zenodo.org/records/4121926
PN9: https://jiemei.xyz/publications/SANet
CTA-A: https://zenodo.org/records/6801398
BibTeX
@InProceedings{EckKat_The_MICCAI2025,
author = { Eckstein, Katharina and Ulrich, Constantin and Baumgartner, Michael and Kächele, Jessica and Bounias, Dimitrios and Wald, Tassilo and Floca, Ralf and Maier-Hein, Klaus H.},
title = { { The Missing Piece: A Case for Pre-Training in 3D Medical Object Detection } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15963},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper presents a comprehensive study of the impact of large-scale pre-training strategies on 3D medical object detection. The authors use both supervised and self-supervised pre-training approaches across different architecture combinations and evaluate them on eight different medical imaging datasets.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper systematically studies large-scale pre-training specifically for 3D medical object detection.
It evaluates multiple dimensions across eight different medical imaging datasets.
The paper finds that reconstruction-based self-supervised pre-training outperforms supervised pre-training.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The paper primarily applies existing techniques in a new context rather than proposing a new architecture or any methodological innovation.
The statistical analysis is insufficient.
Computational efficiency is not properly reported and is barely explained.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Lack of statistical analysis of the results, incomplete dataset descriptions, limited innovation, insufficient metrics on computational considerations, and no source code reference or any mention of how the different frameworks were integrated. Addressing these points could strengthen the paper.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper investigates the role of pre-training in 3D medical object detection by evaluating a range of supervised and self-supervised learning strategies across multiple model architectures. The experimental results indicate that self-supervised learning based on reconstruction consistently yields the best performance across diverse datasets in the context of 3D object detection.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper presents comprehensive experiments exploring various pre-training strategies across different model architectures. Additionally, ablation studies were conducted to determine the optimal hyperparameter configurations prior to comparing the methods.
- While the paper does not directly introduce a novel methodology, it provides valuable insights into training strategies for 3D medical object detection, an area that has been relatively underexplored compared to segmentation and classification tasks.
- The evaluation on multiple datasets involving different target objects for detection demonstrates the generalizability of the approach across diverse medical imaging applications.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- While dataset details are provided, the training, validation, and test splits are not clearly defined. Although Table 1 outlines the division between training and test sets, Section 2.3 references a validation set, which creates confusion regarding the actual data-split protocol.
- The methodology, though described in detail, can be difficult to follow due to the use of multiple models and pre-training strategies. It is not always clear which pre-trained weights are transferred to which model, potentially causing confusion for readers trying to understand the experimental setup.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
I recommend that the authors revise the paper to more clearly define the pre-training and fine-tuning procedures used for each model. Additionally, a clearer explanation of the train, validation, and test set splits would greatly enhance the clarity and reproducibility of the study.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I believe this paper offers valuable insights into pre-training strategies for 3D medical object detection, which can serve as a useful reference for future research in this underexplored area. With clearer organization and presentation, the paper has strong potential to be a solid candidate for publication.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The paper systematically compares how different pre-training paradigms (supervised vs. self-supervised) and methods (MultiTalent, Model Genesis, MAE, SparkMAE, VoCo) can improve the quality of Retina U-Net and Deformable DETR models for the 3D medical object detection task. The paper does not introduce a novel method, but it provides a comprehensive empirical study of different combinations of existing pre-training techniques and object detection architectures.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- I agree with the authors that, despite often being overlooked, 3D object detection in medical images is a very important task. The study of pre-training methods for this downstream task is well motivated.
- Comprehensive experiments and analysis. The authors provide experiments on eight datasets. They compare supervised pre-training with four self-supervised pre-training methods, covering both reconstruction-based and contrastive approaches. Baselines trained from scratch are also included. Experiments are provided for SOTA convolutional and SOTA transformer-based architectures: Retina U-Net and DETR. Figure 2 is very illustrative and shows that MAE pre-training of the DETR backbone yields the best results.
- Good organization of the paper.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Ambiguous results and conclusions. As can be seen in Table 3, and as the authors mention in the Discussion section, no single pre-training method outperforms all others on all datasets. Moreover, on some datasets, models trained from scratch outperform pre-trained models. Of course, this is not the authors' fault. However, I would expect some more definite conclusions or advice for practitioners and researchers in the Discussion section.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Although the paper does not propose a novel method and does not demonstrate impressive and actionable empirical results, it presents a novel benchmark: an excellent study of pre-training methods in the context of 3D medical object detection. I find this benchmark valuable, as it objectively shows the limited benefits of existing pre-training methods for 3D medical object detection. I believe it can motivate the research community to develop new pre-training methods and evaluate them on this benchmark, and it shows practitioners what they can expect from employing pre-trained models when developing medical object detectors.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We sincerely thank all reviewers for their thoughtful and constructive feedback. We are pleased that the reviewers recognized the value of our work as the first comprehensive study on the impact of pre-training for 3D medical object detection, an area that is underexplored compared to segmentation and classification. Due to the strict space limitations of the MICCAI paper format and the lack of an option to add supplementary materials, we had to make careful decisions about what to include in the main paper. We acknowledge that, as a result, some details had to be omitted, which we address and clarify in our responses below.
On novelty and contribution: Rather than introducing a new architecture or method, our study contributes a systematic benchmark across diverse detection tasks, datasets, and pre-training strategies. We believe the novelty lies in the scope and synthesis of our work: It is the first evaluation of both supervised and self-supervised pre-training approaches on multiple state-of-the-art 3D detection architectures (CNN-based and Transformer-based), spanning eight diverse detection datasets with distinct clinical targets. Multiple reviewers recognized the value of this approach: R3 emphasized the importance of 3D detection in medical imaging and the benchmark’s potential to motivate further research. R4 noted that, while not proposing a novel method, we offer valuable novel pre-training insights for 3D medical object detection and demonstrate generalizability across varied detection targets.
On experimental insights and statistical analysis: R2 raised concerns regarding the limited statistical testing. We applied bootstrapping with 1,000 iterations on the test set results to assess the robustness and variability of our findings, a method commonly used to analyze ranking stability [1]. While Fig. 2 in the main paper shows aggregated rankings, we will additionally provide per-dataset rankings in the supplementary material of an upcoming arXiv version. R3 noted the lack of a universally superior pre-training method across all datasets. We fully agree, and this observation reflects a key takeaway of our work: the benefits of pre-training for 3D medical object detection are highly context-dependent.
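To make the ranking-stability analysis concrete, the sketch below illustrates bootstrapping over test cases with re-ranking in each iteration. It is a minimal illustration only: the per-case metric values, method names, and data layout are hypothetical assumptions, not the authors' actual evaluation code.

```python
# Minimal sketch of bootstrap-based ranking stability (1,000 iterations),
# in the spirit of the procedure described above. All data and names are
# illustrative assumptions, not the authors' evaluation pipeline.
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical per-case test metric (e.g., per-scan AP) for each method.
results = {
    "from_scratch": np.array([0.61, 0.55, 0.70, 0.48, 0.66]),
    "mae":          np.array([0.68, 0.59, 0.72, 0.52, 0.71]),
    "supervised":   np.array([0.64, 0.57, 0.69, 0.50, 0.67]),
}

methods = list(results)
n_cases = len(next(iter(results.values())))
rank_counts = {m: np.zeros(len(methods), dtype=int) for m in methods}

for _ in range(1000):
    # Resample test cases with replacement, then re-rank methods by mean metric.
    idx = rng.integers(0, n_cases, size=n_cases)
    means = {m: results[m][idx].mean() for m in methods}
    for rank, m in enumerate(sorted(methods, key=means.get, reverse=True)):
        rank_counts[m][rank] += 1

# Empirical distribution over ranks 1..3 for each method.
for m in methods:
    print(m, rank_counts[m] / 1000.0)
```

A stable ranking concentrates most of a method's mass on a single rank; a spread-out distribution indicates that the ordering depends strongly on the sampled test cases.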
On clarity and reproducibility: R4 expressed confusion regarding the construction of the validation set. While we specified in Section 2.3 that we separated hold-out test sets from all datasets and split the remaining data 80/20 into training and validation sets, we understand that Table 1 may have caused confusion by only showing the combined (training+validation)/test split. We will clarify the exact training/validation/test splits in the camera-ready version. R2 and R4 raised concerns about reproducibility and requested a clearer explanation of the pre-training/fine-tuning setup. To address this, we intend to release all code and configuration files for both pre-training and fine-tuning experiments, ensuring full reproducibility. Additionally, we will include a detailed protocol for both pre-training and fine-tuning in the supplementary material of an upcoming arXiv version.
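As a concrete illustration of the split protocol described above (a hold-out test set separated first, then an 80/20 training/validation split of the remainder), here is a small sketch; the function name, case IDs, and seed are assumptions for illustration, not the authors' code.

```python
# Sketch of the described split protocol: separate a hold-out test set,
# then split the remaining cases 80/20 into training and validation.
# Names, IDs, and the seed are illustrative assumptions.
import random

def split_dataset(case_ids, n_test, seed=12345):
    rng = random.Random(seed)
    ids = list(case_ids)
    rng.shuffle(ids)
    test = ids[:n_test]                  # hold-out test set
    remaining = ids[n_test:]
    n_train = int(round(0.8 * len(remaining)))
    return remaining[:n_train], remaining[n_train:], test

train, val, test = split_dataset([f"case_{i:03d}" for i in range(100)], n_test=20)
print(len(train), len(val), len(test))  # -> 64 16 20
```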
On computational aspects: We acknowledge that computational efficiency and training time were not discussed in depth in the main submission. However, we would like to emphasize that pre-training does not impact the runtime of the models, as we perform full fine-tuning. As a result, the runtime remains comparable to that of the default nnDetection models. To clarify this, we will add a short discussion on computational efficiency in the camera-ready version.
In summary, we are grateful for the feedback and encouraged by the reviewers’ recognition of the study’s value. We are committed to improving clarity and releasing all code and experimental details to support reproducibility and future research.
[1] Maier-Hein, L., Reinke, A., Godau, P. et al. Metrics reloaded: recommendations for image analysis validation. Nat Methods 21, 195–212 (2024).
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A