Abstract
Intraoperative X-ray imaging represents a key technology for guiding orthopedic interventions. Recent advancements in deep learning have enabled automated image analysis in this field, thereby streamlining clinical workflows and enhancing patient outcomes. However, many existing approaches depend on task-specific models and are constrained by the limited availability of annotated data. In contrast, self-supervised foundation models have exhibited remarkable potential to learn robust feature representations without label annotations. In this paper, we introduce DINO Adapted to X-ray (DAX), a novel framework that adapts DINO for training foundational feature extraction backbones tailored to intraoperative X-ray imaging. Our approach involves pre-training on a novel dataset comprising over 632,000 image samples, which surpasses other publicly available datasets in both size and feature diversity. To validate the successful incorporation of relevant domain knowledge into our DAX models, we conduct an extensive evaluation of all backbones on three distinct downstream tasks and demonstrate that small head networks can be trained on top of our frozen foundation models to successfully solve applications regarding (1) body region classification, (2) metal implant segmentation, and (3) screw object detection. The results of our study underscore the potential of the DAX framework to facilitate the development of robust, scalable, and clinically impactful deep learning solutions for intraoperative X-ray image analysis. Source code and model checkpoints are available at https://github.com/JoshuaScheuplein/DAX.
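To make the frozen-backbone-plus-head recipe described in the abstract concrete, here is a minimal PyTorch sketch of that training pattern. The `load_dax_backbone` stub, the feature dimension, and the 11-class head are illustrative assumptions, not the authors' actual API; the real checkpoints are in the linked repository.

```python
import torch
import torch.nn as nn

def load_dax_backbone() -> nn.Module:
    # Placeholder: in practice this would load a pretrained DAX ViT/ResNet
    # checkpoint from the authors' repository.
    return nn.Sequential(nn.Conv2d(1, 384, kernel_size=16, stride=16),
                         nn.Flatten(2), nn.AdaptiveAvgPool1d(1), nn.Flatten(1))

backbone = load_dax_backbone()
for p in backbone.parameters():          # freeze the foundation model
    p.requires_grad = False
backbone.eval()

head = nn.Linear(384, 11)                # small task head, e.g. body regions

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 1, 224, 224)          # dummy batch of X-ray images
labels = torch.randint(0, 11, (8,))

with torch.no_grad():                    # backbone stays frozen
    feats = backbone(x)                  # (8, 384) feature vectors
optimizer.zero_grad()
loss = criterion(head(feats), labels)
loss.backward()
optimizer.step()
```

Only the head's parameters receive gradients, which is what keeps the per-task training cost small relative to fine-tuning the full backbone.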
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0521_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/JoshuaScheuplein/DAX
Link to the Dataset(s)
N/A
BibTex
@InProceedings{SchJos_DINO_MICCAI2025,
author = { Scheuplein, Joshua and Rohleder, Maximilian and Maier, Andreas and Kreher, Björn},
title = { { DINO Adapted to X-Ray (DAX): Foundation Models for Intraoperative X-Ray Imaging } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15969},
month = {September},
pages = {137 -- 147}
}
Reviews
Review #1
- Please describe the contribution of the paper
The main contribution is a methodological adaptation of the DINO self-supervised learning framework with a large-scale, diverse dataset of over 632,000 intraoperative X-ray images.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
DAX represents a novel adaptation of the DINO framework incorporating domain-specific preprocessing and tailored augmentation strategies, significantly enhancing applicability to clinical intraoperative data. The dataset used also significantly surpasses publicly available medical X-ray datasets such as MIMIC-CXR and CheXpert in size and feature diversity. Finally, the results demonstrate clear advantages over baseline methods, particularly notable in ViT-based models for classification and ResNet-based models in segmentation tasks.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
First, the study restricts its evaluation to three downstream tasks, which may not fully represent the breadth of real-world clinical scenarios. Second, the screw object detection task utilizes a synthetic dataset, potentially limiting real-world applicability, as synthetic images might not fully capture the variability and complexity present in real clinical imaging. Lastly, and most importantly, the manuscript lacks direct comparative evaluation with other existing foundation models adapted for medical imaging.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Unfortunately, I feel the weaknesses greatly outweigh the strengths presented in this work, in particular the lack of comparison to any baseline foundation models, such as MedSAM, even though these are mentioned in the introduction.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
- application of DINO to medical imaging
- a very large pretraining dataset for orthopedic X-rays (632,000 images)
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- work is well laid out and contributions are clear
- the three chosen downstream tasks are representative of the diversity of orthopedic X-ray imaging and its tasks.
- The work is clear and would aid a great deal of downstream research on bone/orthopedic imaging.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Section 2 - Dataset - the dataset needs a more detailed breakdown than just general statements; this is important for reproducibility or extension of the work. How many images are there for each body part? There are many different types of "extremities" imaged.
- Related to the above - you mention an imbalanced dataset in your limitations; this would be visible if you discussed your dataset further in your methods. A table may be best suited to present your pretraining dataset.
- Section 2 - body region classification - what are the 11 classes? I see them in Figure 2, but please reference them in the text.
- Downstream tasks - you mention the size of each task-specific dataset, but there is no mention of how many samples within each set were used for training/testing of the downstream task. I see in the table descriptions that you mention 5-fold cross-validation - did that use the entire dataset? Is there no held-out test set on which you applied the final model? These details are important to contextualize your results (see the split sketch after this list).
- You started a good discussion on the difference in performance of the ViT and ResNet backbones for the downstream tasks. I think it would be good to discuss this a bit further: how does one know which backbone to use for a given downstream task? If someone has to make a pretraining decision based on their desired downstream task, to what extent can the pretrained backbone be considered generalizable or task-agnostic?
- Writing note - you mention "intraoperative" throughout the text and in the title. I understand that orthopedic imaging is often performed intraoperatively, but none of the work or downstream tasks really applies to true intraoperative tasks. I think the word might be misleading / not reach the desired audience; "orthopedic" may be better?
- Reproducibility - will you be releasing code/data? Given the page limitations of this submission and the number of experiments described, a code release could significantly improve the reproducibility of this work.
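For illustration, here is a hypothetical scikit-learn sketch of the split protocol this list asks about: a held-out test set is carved off first, and 5-fold cross-validation runs only on the remainder. The dataset sizes, feature/label arrays, and the commented-out training helpers are placeholders, not the paper's actual data or code.

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 384))      # stand-in for backbone features
labels = rng.integers(0, 11, size=1000)      # e.g. 11 body-region classes

# Carve out a held-out test set before any model selection.
X_dev, X_test, y_dev, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0)

# 5-fold cross-validation on the remaining development set only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr, va) in enumerate(cv.split(X_dev, y_dev)):
    # train_head(X_dev[tr], y_dev[tr]); validate on X_dev[va], y_dev[va]
    print(f"fold {fold}: {len(tr)} train / {len(va)} val samples")

# The final model would then be evaluated once on (X_test, y_test).
```

Whether the paper's 5-fold cross-validation consumed the entire task dataset or left such a test set untouched is exactly the detail the reviewer is asking the authors to report.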
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Section 2 - implementation details - it sounds like the pretraining on medical images was done on the architectures without their natural-image pretrained weights. Is this correct? It might be helpful to add a clarifying statement: most papers using the methodologies you discuss load the trained models available from natural images, so it would be good to have the distinction here (see the initialization sketch after this list).
- I know there is a page/word limit, but it would be interesting for future work to consider applying the model to a new, unseen orthopedic body part (e.g., hip) in one of the downstream tasks and seeing how the pretraining handles it.
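To illustrate the initialization distinction raised above, here is a short sketch using the timm library as a stand-in; the paper's own training code may construct its backbones differently.

```python
import timm

# From-scratch setting (as the rebuttal later confirms): random weights,
# single-channel input for grayscale X-rays.
vit_scratch = timm.create_model(
    'vit_small_patch16_224', pretrained=False, in_chans=1)

# Common alternative in prior work: start from ImageNet weights
# (timm adapts the patch embedding when in_chans differs from 3).
vit_imagenet = timm.create_model(
    'vit_small_patch16_224', pretrained=True, in_chans=1)
```

The single `pretrained` flag is the entire distinction the reviewer wants stated explicitly in the paper.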
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I think this work is very relevant to what is occurring in the literature and is well written and clear. I think with dataset details added and code released, it would be very useful to other researchers in the same space. The pretraining dataset is not irrelevant to the performance of the downstream tasks, so readers need to know what it contains so they can contextualize the results of the three tasks.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Accept, provided the authors complete all that is promised in the rebuttal (open-source code, more specific dataset details, etc.). I still think the word "intraoperative" is not appropriate for this work, on the sole basis that only a portion of the pretraining data may come from a surgical context. All three downstream tasks explored (and the example given to R3) exist in a diagnostic or presurgical setting, and although the work may at some point have an intraoperative application, this is not a specific feature of this work.
Review #3
- Please describe the contribution of the paper
The main contribution of this paper is DINO Adapted to X-ray (DAX), an adaptation of DINO to orthopedic imaging for classification and segmentation of body regions, metal implants, and screw objects. This application extends to large sets of unclassified DICOM images that can be rapidly classified and segmented for analysis.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Strength: Application of DINO and Transformer models to large-scale data, with comparisons to ResNet showing superiority.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Weaknesses: Can the authors describe some real-world examples of use cases of this? How will it change clinical management or enable analyses specifically?
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Overall good contribution and adaptation of DINO to orthopedic imaging.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have adequately addressed this reviewer's concerns and provided context for their model for clinical use.
Author Feedback
We sincerely thank all reviewers for the time and effort dedicated to evaluating our work and for the thoughtful, constructive feedback. We are especially grateful for the recognition of the paper’s strengths: the adaptation of DINO to orthopedic imaging was highlighted as a novel contribution (R1, R2, R3); our large and diverse pretraining dataset of over 632,000 X-ray images was considered a major asset, surpassing existing public datasets in scale and diversity (R1, R2, R3); and the selected downstream tasks were found to be representative of the domain of orthopedic X-ray imaging (R2). Reviewers also commended the manuscript’s clarity and organization (R2, R3), as well as its value for future orthopedic imaging research (R2).

We appreciate the suggestion to compare with existing foundation models (R1) and agree that such evaluations would further enrich our work. However, our main objective was to develop task-agnostic, general-purpose feature extractors that can be fine-tuned for various applications. In contrast, models like MedSAM are typically optimized for specific tasks such as segmentation, making direct comparisons across our broader set of tasks - including classification, segmentation, and detection - nontrivial. Moreover, space constraints did not allow for an in-depth benchmarking study, which we plan to address in a follow-up publication. As R1’s critique centers on this missing comparison, while also recognizing our “novel adaptation of the DINO framework” and its “significantly enhanced applicability to clinical intraoperative data,” we respectfully believe this should not “outweigh the strengths presented in this work” (R1).

We agree that additional downstream tasks would add even more value (R1, R2). However, the three tasks were deliberately selected to reflect distinct levels of visual understanding - from image-level to pixel-level - providing a focused yet balanced evaluation of model generalizability. For the third task, we used a simulated dataset (R1) due to the current lack of clinically annotated data for implant detection. We are actively working with clinical partners to curate such a dataset for future validation.

More detailed dataset statistics and split information would indeed help better contextualize our results (R2). We will include the requested details (e.g., train/test split) for the downstream datasets in the final version. For the pretraining dataset, detailed metadata such as patient age or gender is not available, as data was provided in anonymized form by multiple clinical partners. Nevertheless, we will include additional information about the body region distribution.

We thank R2 for pointing out the potential confusion around the term “intraoperative”. While our downstream tasks focus on orthopedic imaging, the pretraining data also includes images from other surgical contexts, such as pulmonology. Thus, we consciously chose “intraoperative” to distinguish our setting from diagnostic imaging. However, we will clarify this terminology and mention that all pretraining was performed from scratch (R2), without ImageNet initialization, to avoid domain bias.

To address R3’s request for real-world use cases: DAX models could serve as a foundation for convenient AI applications in mobile C-arm systems (e.g., automatic selection of organ programs). For each acquired image, feature maps could be computed by default, enabling surgeons to perform specific downstream tasks via lightweight head networks. This setup reduces not only development time but also the required training data, computational load, and memory compared to separate task-specific models.

Finally, we will release the full source code for pretraining and downstream tasks upon acceptance, with a repository link added to the final paper. Thank you again for your constructive and encouraging reviews. We hope our clarifications address the concerns raised and look forward to the opportunity to present our work at MICCAI 2025.
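To illustrate the deployment pattern the rebuttal describes (one frozen backbone computing feature maps per acquired image, with several lightweight task heads consuming them), here is a hypothetical PyTorch sketch; the module names, feature dimensions, and head designs are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SharedBackboneModel(nn.Module):
    """One frozen feature extractor shared by several lightweight heads."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 384):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # frozen foundation model
            p.requires_grad = False
        # Lightweight per-task heads on top of the shared feature maps.
        self.classify = nn.Linear(feat_dim, 11)               # body region
        self.segment = nn.Conv2d(feat_dim, 1, kernel_size=1)  # implant mask
        self.detect = nn.Conv2d(feat_dim, 5, kernel_size=1)   # box + score

    def forward(self, x: torch.Tensor):
        with torch.no_grad():
            fmap = self.backbone(x)            # (B, C, H', W') feature maps
        pooled = fmap.mean(dim=(2, 3))         # global features for cls
        return {
            "region_logits": self.classify(pooled),
            "implant_mask": self.segment(fmap),
            "screw_preds": self.detect(fmap),
        }

# Usage with a dummy conv layer standing in for a DAX checkpoint:
backbone = nn.Conv2d(1, 384, kernel_size=16, stride=16)
model = SharedBackboneModel(backbone)
out = model(torch.randn(1, 1, 224, 224))
print({k: tuple(v.shape) for k, v in out.items()})
```

The design choice is that the expensive backbone forward pass runs once per image, so adding a new clinical application amounts to training and shipping only a small head.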
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper presents an adaptation of DINO to a large-scale X-ray dataset. All reviewers acknowledge the significance of the contribution, particularly the scale of the dataset and the thoughtful adaptation of the methodology to the clinical imaging domain. However, concerns were raised regarding the lack of comparison with existing medical foundation models such as MedSAM. The rebuttal addresses these points convincingly. Overall, the paper shows strong potential for broad adoption, and I recommend acceptance.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper proposes adapting the DINO framework to develop a foundation model for X-ray imaging. After reviewing the paper, the reviewer feedback, and the authors’ rebuttal, this Area Chair has two primary concerns that inform the current recommendation:
- Limited Technical Novelty: The work involves a direct adaptation of DINO, which has already been applied to various datasets. The methodological contribution appears incremental, lacking sufficient innovation.
- Insufficient Comparison to Other Foundation Models: For a foundation model claim, comparison with other approaches is essential. DINO is known to emphasize global feature matching, which often results in suboptimal performance on segmentation tasks. Including comparisons with alternative strategies, such as Masked Image Modeling, would significantly strengthen the paper and clarify its advantages and limitations.