Abstract

Digital training simulators play a growing role in orthopedic surgery, offering realistic, standardized, and risk-free learning environments without the need for constant expert supervision. To enable simulators with realistic tactile feedback and haptic sensations, accurate tracking of surgical tools and anatomical structures in real-time is required. However, existing object tracking solutions are often expensive, difficult to integrate into training workflows, or lack robustness. To address these limitations, we propose a novel visual-inertial 6D object pose tracking system for orthopedic surgical training. Our approach features a custom fiducial object that combines multiple ArUco markers with an Inertial Measurement Unit, a dual-camera setup to improve occlusion robustness, and a sensor fusion algorithm that integrates high-frequency IMU data with vision-based tracking while ensuring precise coordinate and time synchronization. In our evaluation, we achieve a fiducial object pose accuracy of 0.9 mm/0.5° and extract drill hole metrics in a mock surgical procedure with average position, angle, and length errors of 1.7 mm, 2.0°, and 1.0 mm, respectively, while demonstrating low occlusion rates. Our cost-effective and easily integrated solution meets clinical training requirements and marks a step towards scalable and widely accessible digital orthopedic simulators. The tracking code is available at https://github.com/MountainCoot/fusionpose.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4160_paper.pdf

SharedIt Link: https://rdcu.be/eHw0S

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05114-1_2

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/MountainCoot/fusionpose

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HogMaa_6D_MICCAI2025,
        author = { Hogenkamp, Maarten AND Stauffer, Tobias AND Lohmeyer, Quentin AND Meboldt, Mirko},
        title = { { 6D Object Pose Tracking for Orthopedic Surgical Training using Visual-Inertial Sensor Fusion } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15968},
        month = {September},
        pages = {13--23}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper describes the design and testing of a hybrid tracking system that could be used for surgical training and skills assessment. The system combines low cost tracking equipment (ArUco tags and inertial measurement units) to create a lower cost system that has accuracy comparable with commercial tracking systems. Accuracy is compared by tracking the same object with the proposed system and an existing commercial optical tracking system.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Development of lower cost tracking systems is an important tool to improve access to state of the art training and research methods. The authors demonstrate that such a system can be built and provides acceptable tracking accuracy for some applications.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

None of the presented work appears to be novel. Sensor fusion itself has been around for many years. The system described in the paper combines existing open-source software libraries for object tracking (OpenCV, DodecaPen) and sensor fusion/filtering (GTSAM). If the system described in the paper were itself an open-source application, allowing researchers to easily build and use similar systems, then the lack of novelty would probably be OK. But I don’t think that is the case; at least, there is neither a link nor a reference to the software in the paper.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
This is a well-designed system and well-written paper that can help address a real-world problem. I encourage the authors to make their code available as an open-source application to allow researchers to apply it to their own tracking problems. As written, I think it lacks the novelty to make it into MICCAI, but it would certainly be publishable elsewhere if accompanied by software. I’ve got a few small questions arising from the paper.
- In Figure 1 you show the raw sensor data (from optical tracking and IMU) splitting, with some going into sensor fusion and pose estimation and some appearing to go nowhere. Where is the raw data going? I’m guessing it is being saved to disc for possible later analysis, but this is not clear.
- You describe an iteration to remove individual markers to reduce reprojection errors. Can you expand on the logic of how this design decision was reached? My recollection of the ArUco library is that it provides functionality to estimate tracking quality for individual markers. Also, depending on what thresholds you choose, you may be better off with more markers but a higher reprojection error. Phrasing it in the mathematically well-established terms of fiducial localisation error, fiducial registration error, and target registration error (see Fitzpatrick et al. 1998): as you reduce the number of markers used, the fiducial registration error (reprojection error) will reduce, all the way down to using only one marker. But that does not correlate with a reduced target registration error (actual object tracking accuracy).
- Why do you only sample the cameras at 20 Hz? Their advertised frame rate is 75 Hz.
- Figure 4 may be very hard to interpret for colour blind readers.
- The inclusion of the drilling simulation is of limited value, as most of the errors appear to come from other (non-tracking) sources. You could omit it to allow more space for technical discussion of your system.
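
The registration-error argument in the marker-removal question can be made concrete with the expected-value results usually attributed to Fitzpatrick et al. (1998). They are derived for rigid registration of N three-dimensional point fiducials with independent, isotropic localisation error, so carrying them over to ArUco corner reprojection is an analogy rather than an exact model. The symbols N, d_k, and f_k are introduced here for illustration only.

```latex
\begin{align}
\langle \mathrm{FRE}^2 \rangle &= \left(1 - \frac{2}{N}\right) \langle \mathrm{FLE}^2 \rangle, \\
\langle \mathrm{TRE}^2(\mathbf{r}) \rangle &\approx \frac{\langle \mathrm{FLE}^2 \rangle}{N}
  \left(1 + \frac{1}{3} \sum_{k=1}^{3} \frac{d_k^2}{f_k^2}\right).
\end{align}
```

Here d_k is the distance of the target point r from principal axis k of the fiducial configuration, and f_k is the RMS distance of the fiducials from that axis. For a fixed localisation error, removing fiducials lowers the expected FRE through the factor (1 - 2/N) while the 1/N factor raises the expected TRE, which is the mismatch the reviewer describes.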
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

There’s a lack of novelty in the system as described; it appears to be an assembly of existing methods. If the authors supported the paper with a full open-source implementation, then it would be of interest to the MICCAI audience, enabling other researchers to build their own affordable tracking systems.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have committed to publishing their software which addresses a significant weakness of the original submission. The rebuttal also addresses some of my concerns about novelty. The authors should ensure that the revised submission highlights the novel parts of their approach.

Review #2

Please describe the contribution of the paper

The paper proposes a real-time tracking system based on visual and inertial cues that consists of two cameras and uses multi-marker fiducials with IMU sensors attached to objects. An algorithm for integrating camera and IMU data is also proposed.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The idea of integrating data from camera and IMU sensors is interesting and allows for faster tracking of the fiducials when compared to solely using cameras.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- This is more a systems integration paper than a medical paper. Without showing any real application in the medical field, I do not consider this submission completely appropriate for MICCAI.
- It is not evident that the errors in the hole drilling experiment are sufficiently low for an orthopedic application.
- An alternative experiment that could be executed to assess the system’s precision that does not depend on optical tracking is to rigidly attach two or more fiducial objects and measure how their relative pose changes.
- It would also be interesting to know the system’s performance when considering only the two cameras.
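
The relative-pose check suggested in the list above can be sketched as follows. This is a minimal illustration with made-up poses, not the authors' evaluation code: the function names, the 50 mm offset, and the injected 1 mm error are all hypothetical. If two fiducials are rigidly attached, the pose of one expressed in the frame of the other should stay constant, so any change in it measures tracking error without an external reference system.

```python
import numpy as np


def rot_z(angle_deg):
    """4x4 homogeneous transform for a rotation about the z-axis."""
    a = np.radians(angle_deg)
    T = np.eye(4)
    T[:2, :2] = [[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]
    return T


def relative_pose(T_world_a, T_world_b):
    """Pose of fiducial B expressed in the frame of fiducial A."""
    return np.linalg.inv(T_world_a) @ T_world_b


def pose_deviation(T_ref, T):
    """Translation (input units) and rotation (degrees) separating two poses."""
    delta = np.linalg.inv(T_ref) @ T
    translation = np.linalg.norm(delta[:3, 3])
    cos_angle = np.clip((np.trace(delta[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return translation, np.degrees(np.arccos(cos_angle))


# Hypothetical rigid attachment: B sits 50 mm along A's x-axis.
T_a_b_true = np.eye(4)
T_a_b_true[:3, 3] = [50.0, 0.0, 0.0]

# Frame 0: A at the world origin, both fiducials tracked perfectly.
T_a0 = np.eye(4)
T_b0 = T_a0 @ T_a_b_true

# Frame 1: the assembly has moved rigidly, and the tracked position of B
# carries a simulated 1 mm error along the world z-axis.
T_a1 = rot_z(30.0)
T_a1[:3, 3] = [10.0, 20.0, 30.0]
T_b1 = T_a1 @ T_a_b_true
T_b1[2, 3] += 1.0

rel0 = relative_pose(T_a0, T_b0)
rel1 = relative_pose(T_a1, T_b1)
trans_dev_mm, rot_dev_deg = pose_deviation(rel0, rel1)
print(f"relative pose drift: {trans_dev_mm:.3f} mm, {rot_dev_deg:.3f} deg")
```

In a real recording, the deviation would be computed for every frame against the mean relative pose and summarized over the whole trajectory.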
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Minor comments:
- Subsection “Digital Twin”: demonstrated quantitatively in Table 1b
- Table 1: Typically, errors are provided as RMS or as mean ± std. RMS already takes the std into account.
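
The remark about RMS and standard deviation rests on the identity RMS² = mean² + std² (with the population standard deviation). A short check on hypothetical error values, which are not taken from the paper:

```python
import numpy as np

# Hypothetical error samples in mm, for illustration only.
errors = np.array([0.5, 1.0, 1.5, 2.0])

rms = np.sqrt(np.mean(errors ** 2))
mean = errors.mean()
std = errors.std()  # population standard deviation (ddof=0)

# RMS combines the systematic part (mean) and the spread (std) of the error.
identity_holds = np.isclose(rms ** 2, mean ** 2 + std ** 2)
print(f"rms={rms:.4f}, mean={mean:.4f}, std={std:.4f}, identity holds: {identity_holds}")
```

Reporting "RMS ± std" therefore counts the spread twice, whereas "mean ± std" or RMS alone each describe the error distribution without overlap.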
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although I acknowledge the practical relevance of the proposed system, I have some concerns regarding the experimental section that I would like the authors to comment on in the rebuttal.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

All reviewers raise relevant points about this submission, e.g., lack of novelty (R1) and lack of clarity on whether the proposed system’s accuracy meets the medical requirements (R2). As initially stated, I share these concerns and thus I am keeping my original recommendation.

Review #3

Please describe the contribution of the paper

The main contribution of the paper is a fusion framework that integrates visual camera data and inertial measurements for 6D pose estimation of a tool. The authors propose the method in the context of surgical simulation, specifically for orthopedics, and the approach can be further generalized to image-guided interventions where tracking systems are used. The proposed method leverages the complementary strengths of vision-based tracking (high spatial accuracy) and IMU data (high frame rate and resilience to occlusions).
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is well written and organized. The approach leverages existing approaches, including DodecaPen for vision-based tracking, and a robust fusion algorithm to combine visual and inertial data for 6D pose estimation. The methods section included sufficient mathematical formulations to support the proposed fusion method.

The experimental design utilizes an optical tracking system to provide ground truth for accuracy assessment. The experiments involve both an ideal setup with a tracked stylus and a more clinically relevant setup with a drill.

Results are clearly presented, including quantitative accuracy comparisons against ground truth as well as partial system components, which serves as an ablation study, and the results are encouraging for clinical applications.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The following weaknesses are noted:
1. The paper did not define an acceptable range for pose accuracy or frame rate in surgical simulation. The accuracy results are compelling, but it is unclear whether the reported results meet clinical requirements.
2. The specifications of the ground truth system (optical tracker) are not provided, including tracking accuracy, spatial resolution of the tracked instrument, and refresh rate. These numbers are important to interpret the measurable errors in the components and in the overall system.
3. When using the tracked stylus, the performance is evaluated over a relatively large spatial volume (200 mm × 200 mm × 300 mm; Figure 4b), especially compared to the drill experiment (Figure 5b). It is unclear from the paper what the practical workspace of typical orthopedic surgical procedures might be and the authors should include this information and discuss the clinical relevance to provide context.
4. The temporal resolution of the system should be further clarified. For instance, Figure 4b shows a value of 2.9 s that is not explained in the paper, and the latency between real-world motion and the corresponding VR representation, which is different from the frame rate, is not presented. Additionally, with the fusion algorithm it may be implied that the system performance matches the higher frame rate of the IMU (200 Hz) rather than the slower camera frame rate (20 Hz), but this important detail was not explicitly described in the paper. A direct comparison against the ground truth frame rate from the optical tracking system should also be included.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

On page 7 under Digital Twin, “qualitatively” seems to be a typo. Table 1b is quantitative.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The overall score is primarily due to clinical significance, technical contribution, experimental design, and overall performance achievements of the paper. The paper contributes towards practical, low-cost 6D tool tracking without relying on optical or electromagnetic tracking systems. The approach builds on existing works, leveraging advantages from different systems to improve the accuracy and efficiency of the proposed system. The results are promising when compared to gold standard optical tracking. While the paper could benefit from further clarification and discussion, the weaknesses can be addressed through rebuttal.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Rebuttal is satisfactory

Author Feedback

We would like to thank all reviewers for their constructive comments. We are pleased that our contribution in addressing affordable object tracking in orthopedic surgical training is recognized (R1-R3) with encouraging results (R1, R2), and we value the comments on the paper’s clarity and writing (R1, R2). Below, we address major concerns, followed by specific points. Abbreviation: Inertial Measurement Unit (IMU)

(R1, R3) Limited reproducibility: We will release our fiducial tracking code for research use on GitHub before the conference and will provide a link.

(R1) Lack of novelty: We fully agree that ArUco tracking and sensor fusion are established techniques. Our novelty lies specifically in combining ArUco and IMU for pose estimation in orthopedic training while tackling unaddressed practical limitations: (1) Our pipeline directly incorporates spatial-temporal IMU-camera calibration, which is key for accurate fusion. (2) We mitigate IMU drift (see Fig. 4b) with factor-graph-based smoothing (compared to filtering as in Enayati et al. 2015). (3) Our system supports multi-fiducial tracking, enabling object interaction. We will highlight the novelty more explicitly.

(R3) Concern about applications: We acknowledge that our paper focuses on system integration of the proposed method. However, our evaluation demonstrates the ability to easily track tool-bone interactions in a relevant surgical setup with robustness to occlusions. Notably, we can readily track additional tools and more realistic anatomical models to enable applications such as AR guidance, skill assessment, or data analysis of poses and IMU/video streams.

(R2, R3) No reference values for tracking requirements: We agree that our results require discussion within a clinical context. Varying accuracy bounds for orthopedic surgery are suggested in literature (e.g., up to 5° angular offset for hip implants, 3° for knee arthroplasty (DiGioia et al. 2004) and 1mm/5° for pedicle screw placements (Rampersaud et al. 2001)). With a RMSE of ≈1mm/0.5° (fiducial marker) and ≈1-2mm/2° (mock setup), our system is suitable for training of many orthopedic surgery cases. Moreover, our camera field of view (≈59°x50°) and working distance (0.4-0.7 m) result in a tracking volume that fits a general phantom setup (Fig. 1 and 4b). Lastly, we achieve a frame rate comparable to commercial systems to capture fast events (e.g., tissue penetration during drilling). We will incorporate a brief discussion about these points.

(R2) Ground truth specifications: We will add the specifications (RMSE of 80 µm up to 2 m @ 335 Hz).

(R2) Temporal resolution and offset: We provide poses at 200 Hz (IMU frequency) with a latency equal to the smoother lag (100 ms), which can be reduced for lower latency at the cost of accuracy. We will clarify the “2.9 s” in Fig. 4b, which is the snippet duration of the plotted poses.

(R1) Mock surgery relevance: While we acknowledge that some errors are not tracking-related, we argue that the overall error, with our system error at its source, allows for comparison to clinical context.

(R1) Marker removal strategy: We agree that lower reprojection error does not guarantee better poses. With this step we reject clear outliers that can occur due to partially occluded markers with wrong corner detection. We will clarify this paragraph.

(R1) Camera sampling rate: By detecting ArUco markers at 20 Hz, we limit the most CPU-heavy step, which is beneficial for lower-end PCs. Moreover, we did not observe better fusion results at higher camera frame rates.

(R3) Additional experiments: (1) A dual-camera evaluation might be interesting but would change our approach significantly, with changes to the state vector (no bias) and temporal synchronization (currently IMU-reliant). (2) We acknowledge the validity of the relative transform test for a future evaluation.

(R1-R3): We thank the reviewers for pointing out small issues in figures, tables, and typos, all of which will be addressed.
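
As a rough cross-check of the tracking-volume statement in the rebuttal, the field of view (≈59° × 50°) and working distance (0.4-0.7 m) can be turned into the size of the visible plane at each distance. This sketch assumes an ideal pinhole camera without lens distortion, treats 59° as the horizontal and 50° as the vertical angle, and considers a single camera only; the function name `footprint` is hypothetical.

```python
import math


def footprint(distance_m, fov_h_deg=59.0, fov_v_deg=50.0):
    """Width and height (m) of the plane seen by an ideal pinhole camera."""
    width = 2.0 * distance_m * math.tan(math.radians(fov_h_deg) / 2.0)
    height = 2.0 * distance_m * math.tan(math.radians(fov_v_deg) / 2.0)
    return width, height


# Near and far ends of the working distance quoted in the rebuttal.
near_w, near_h = footprint(0.4)
far_w, far_h = footprint(0.7)

print(f"at 0.4 m: {near_w:.2f} m x {near_h:.2f} m")
print(f"at 0.7 m: {far_w:.2f} m x {far_h:.2f} m")
```

Under these assumptions the visible plane grows from about 0.45 m × 0.37 m at 0.4 m to about 0.79 m × 0.65 m at 0.7 m. The volume seen by both cameras at once is smaller and depends on their relative placement, which this estimate does not model.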

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Reject
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

Despite its relevance to CAI and possible practical application, the novelty and experimentations are limited.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The authors have provided a strong rebuttal that adequately addresses concerns around novelty, clinical relevance, and reproducibility. Their commitment to open-sourcing the software improves the value of the work to the MICCAI community, and the clarified positioning within the CAI category strengthens the case for acceptance. While the work is more integrative than groundbreaking, it is well-executed, clinically motivated, and demonstrates practical utility—justifying acceptance.

6D Object Pose Tracking for Orthopedic Surgical Training using Visual-Inertial Sensor Fusion

Author(s):