Abstract
The number of kidney stone cases in the U.S. has tripled since 1980. Unfortunately, 23% of kidney stone removal surgeries require repeat procedures within 20 months. The high repeat surgery rate is often due to surgeons missing stones in the initial treatment. Effective training can reduce the need for re-operation, yet learning opportunities in the operating room (OR) are limited, as patient care must take priority. Augmented reality (AR) can improve the effectiveness of training in the OR by providing real-time visual feedback, but the interface must be carefully designed so as not to interfere with clinical workflow. Building on prior design guidelines and AR training tools, we design three AR gaze markers and evaluate their effectiveness in enhancing training. Our AR training system tracks the expert’s eye gaze and projects the marker onto the trainee’s head-mounted display to provide visual guidance. Eight trainees performed a simulated ureteroscopy task of identifying kidney stones in high-fidelity kidney phantoms while guided by an expert. We recorded the number of stones they found, task completion time, and eye-gaze metrics. At the end of each trial, trainees provided subjective feedback on task load and performance through the NASA-TLX questionnaire. Results show that while some gaze markers increased perceived mental demand, they enhanced engagement and performance. Gaze metrics revealed that marker shape affects cognitive load, as measured by the fixation-to-saccade ratio. By translating prior design principles into an AR-based guidance system, this work supports intraoperative training and highlights AR’s potential in surgical education.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4731_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: https://papers.miccai.org/miccai-2025/supp/4731_supp.zip
Link to the Code Repository
https://github.com/li-fangjie/Kidney-Hololens-Gaze-Track
Link to the Dataset(s)
N/A
BibTex
@InProceedings{AtoJum_From_MICCAI2025,
author = { Atoum, Jumanh and Li, Fangjie and Acar, Ayberk and Kavoussi, Nicholas L. and Wu, Jie Ying},
title = { { From Sight to Skill: A Surgeon-Centered Augmented Reality System for Ureteroscopy Training } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15970},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
Addressing the need to improve surgical training due to its complex and time-intensive nature, the authors present an Augmented Reality (AR)-based system designed for intraoperative ureteroscopy training. The motivation stems from the fact that surgical training for certain procedures is primarily restricted to real surgeries in the operating room, where feedback for trainees is limited and typically delivered through verbal communication with expert surgeons. To enhance the learning process, the authors propose incorporating visual holographic aids using the Microsoft HoloLens 2. Specifically, they investigate the potential benefits of three different holographic gaze markers that provide visual guidance to surgical trainees during kidney treatment. The experimental setup involved one expert surgeon and eight surgical trainees. ArUco markers were attached to a monitor displaying the ureteroscopy live video feed, allowing the HoloLens 2 to register the monitor and create a shared holographic gaze marker experience between the expert and trainees. A custom-built kidney phantom with artificial kidney stones was used in the study, where surgical trainees performed simulated ureteroscopies under expert guidance. Using different holographic gaze marker conditions, the trainees were tasked with identifying kidney stones within the phantom. To assess the effectiveness of the approach, the study employed both subjective and objective measures. Subjective feedback was gathered using the NASA-TLX questionnaire, while objective metrics included eye gaze data and task completion times. The results suggest that holographic eye gaze markers can potentially reduce cognitive load and improve task engagement, with outcomes varying based on the specific gaze marker design.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Overall, the paper is well thought out, with a relevant clinical motivation and a noteworthy contribution to the growing — yet still underrepresented — field of AR-assisted surgical training.
The Methods section clearly outlines the proposed concepts, and both the experimental setup and results are presented in a structured and transparent manner.
The inclusion of one expert surgeon and eight surgical trainees in the user study adds credibility and demonstrates the practical utility of the proposed AR-assisted training approach.
The authors also developed four high-fidelity custom kidney phantoms for the study, validated by an expert surgeon, which highlights their hands-on approach, attention to realism, and effective interdisciplinary collaboration. Furthermore, the design of novel eye gaze markers in collaboration with a certified urologist demonstrates thoughtful integration of clinical expertise into the technical design process.
The authors provide a thoughtful self-assessment of their work in the Discussion and Limitations section, which adds to the overall credibility and scientific rigor of the study.
It is also clearly stated that the user study received official approval, which appropriately addresses ethical considerations.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Despite the positive aspects of this paper, several weaknesses should be addressed:
Overall, the work appears to be a system design paper that does not emphasize novel software-side technical contributions. In this context, a more in-depth technical description of the system’s components would be beneficial. For instance, in Figure 2, three coordinate system transformations are referenced, but not explicitly defined. Providing clear mathematical formulations would help readers better understand the underlying computational steps.
A clear definition of the user study task is missing. For example, how is task completion time measured when a participant identifies a kidney stone? What is the exact protocol followed in such cases? Is verbal confirmation from the participant sufficient to mark the stone as “found”? Clarifying these aspects — including how task completion time is consistently measured — would strengthen the methodological transparency of the study.
The conclusion that M2 is the most effective marker, based on the results in Tables 1 and 2, appears somewhat vague and potentially subjective. For example, while M2 shows favorable values in terms of Total Distance and Fixation/Saccades in Table 1, and scores highest in the “Performance” dimension of the NASA-TLX in Table 2, the relationship between these metrics is not clearly explained. Is the combination of these factors sufficient to justify M2 as the best marker, or is there a risk of confirmation bias? A more in-depth analysis of the relevance and weight of each gaze metric and NASA-TLX component would strengthen the argument and help avoid overinterpretation.
Providing a link to the source code would certainly enhance the reproducibility of the work. If the authors do not intend to make the code publicly available, a brief statement explaining this decision would be appreciated. Alternatively, an anonymized URL to a project page with additional technical details could also help improve transparency and reproducibility.
Additionally, as an optional comment, a supplementary video illustrating the collaborative HoloLens 2 view could be highly beneficial. It would help readers better visualize the participant experience and gain a clearer understanding of the AR setup and interaction during the study.
Finally, the paper would benefit from a clearer description of the specific task performed during the exploration of the kidney phantoms. A more detailed explanation of what participants were instructed to do — and how task success was measured — would help contextualize the results.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Dear Authors,
This is a well-motivated and timely contribution to the still emerging and underrepresented field of AR-assisted surgical training. While your system may not introduce novel technical methods per se, it presents a creative integration of existing tools and software components to address an important application area.
To strengthen the paper, I would suggest including more technical detail, even if only in supplementary material such as a video or through a URL to a project page, to clarify design decisions and system architecture. In addition, the description and interpretation of the experimental results would benefit from further elaboration to more convincingly support your claims.
At this stage, I lean toward a weak reject, primarily due to the current level of technical depth and the limited clarity in parts of the evaluation. In any case, further clarification on these points would be helpful. Regardless of the outcome of this MICCAI submission, I believe the work has the potential to serve as a strong foundation for an extended journal publication.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Despite its overall solid structure and meaningful contribution to a niche field, I have two major concerns regarding this paper:
1.) The lack of a novel technical contribution, and
2.) The focus, design, and evaluation of the clinical application study.
While the study does provide indications of potential benefits from using AR-based eye gaze markers, the description of the study design and the interpretation of results could be improved. For more details, please refer to my comments under the Weaknesses section.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper presents a user study on using augmented reality (AR) to help train young surgeons in the context of ureteroscopy. During a training session, both participants, a trainer and a trainee, wear a head-mounted eye-tracking/AR system. With this system, the trainee can follow the gaze path of the trainer. The hypothesis is that this solution improves training compared to classic feedback (vocal feedback only). Three different AR markers are evaluated and compared to the classic feedback on four different kidney phantoms. The study involved one expert trainer and eight trainees. The use of the markers is evaluated with the NASA-TLX scale and visual metrics.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The major strengths of the paper are:
- Innovative method to support training
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The major weaknesses of the paper are:
- The paper does not present statistical analysis to support the findings.
- The study builds upon the authors’ previous study, which is never referenced. To preserve the double-blind review, this study should be referenced anonymously.
- As for each trainee “the order of visual guidance markers and phantom numbers was randomized”, the performance metrics cannot be compared based on the markers alone. Even if the difficulty of each phantom is judged similar by an expert surgeon, differences between phantoms could affect the performance metrics.
- The different levels of expertise of the trainees could have a high impact on the performance metrics: a PGY-4 trainee has more experience than a PGY-1.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- The number of kidney stones per phantom is not mentioned. This information is important for interpreting the “number of stones” metric.
- The description of M3 is not very clear. What does each circle represent?
- It would be good to have participants rate the difficulty of each kidney phantom to confirm that all phantoms are similar.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The absence of statistical analysis.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors’ response is thorough and convincing, addressing my previous concerns in a satisfactory manner. I appreciate the additional statistical analysis they have conducted, even if the claim of M2’s superiority should be tempered. Furthermore, their plan to release a link to the source code and provide additional technical documentation demonstrates a commitment to transparency and reproducibility. My concerns about trainee expertise level and phantom differences have been properly addressed. Overall, I am pleased with the authors’ response and believe that it significantly enhances the quality of their submission.
Review #3
- Please describe the contribution of the paper
The paper investigates how the design of AR markers impacts cognitive load and task engagement in the context of surgical training using Head-Mounted Displays (HMDs). The authors:
- Develop three AR marker designs based on existing guidelines, refined through iteration with a clinical expert.
- Implement an AR-based system that projects expert eye gaze onto a trainee’s HMD during simulated kidney stone surgery.
- Conduct a user study combining eye-tracking metrics and NASA-TLX responses to evaluate the markers’ impact on visual attention and perceived workload. The primary contribution is the empirical evaluation of visual marker design in an intraoperative AR training scenario, showing that marker choice can influence user performance and experience.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The evaluation includes both subjective and objective metrics, which strengthens the methodological rigor and significance of results. The clinical involvement in the marker design process adds realism and relevance to the study.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) Lack of clarity in system description: While the paper’s main contribution is the evaluation of marker design, the system used to conduct this evaluation is not clearly described. This creates difficulty in fully understanding how the markers are perceived and interpreted by users. There is an extensive focus on marker design justification (already covered in previous work), but less attention is given to explaining how the overall system operates, step by step. For example,
- the surgeon looks at the laparoscopic screen,
- eye gaze is captured and projected onto the screen,
- it is rendered for the trainee via HMD, adapting to their viewpoint.
2) While the paper mentions projecting the gaze “onto the tracked monitor plane and sharing it with surgical trainees,” it is unclear how the adaptation to the trainee’s position is managed, especially given the use of ArUco markers and external displays. Readers must rely on a figure that lacks clarity and does not explicitly show the trainee’s visual experience. Overall, even if the application itself is not the core contribution, it is a necessary foundation for understanding the marker evaluation, and its incomplete description weakens the overall impact of the paper.
3) Unclear if system limitations influenced marker preference: The preference for M2 may be affected by system setup. For example, changes in viewing angle could impact marker visibility. Without discussing such limitations, it’s hard to isolate marker performance from system influence.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The reason for my recommendation is that the paper is partially clear, but the methodology and system architecture require further elaboration. Descriptions of how the gaze data is captured, processed, and projected are vague. In particular, the figures should better illustrate the trainee’s point of view and how the AR markers appear during the task. More clarity would help readers assess the validity and generalizability of the results.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
Even though the authors provided a more detailed description of the system in the rebuttal, the paper still lacks a clear technical contribution. The system relies on existing tools (HoloLens, MRTK2, PUN) and standard projection methods, without introducing a novel technical approach.
Author Feedback
We thank the reviewers for recognizing the relevance and timeliness of our work in AR-assisted surgical training. R3 describes the paper as “well thought out” and “a noteworthy contribution to the growing—yet still underrepresented—field of AR-assisted surgical training.” R3 and R4 highlight the novelty of using eye gaze markers designed by trainees for real-time guidance, while R1 and R3 commend the use of both subjective and objective evaluation metrics. In this rebuttal, we address key concerns raised by the reviewers, including statistical and analytical limitations, technical clarity of the AR system, and ambiguities in study design.
To address the lack of system clarity (R1) and provide a detailed technical description (R3), we outline the system workflow and gaze-capture steps:
1. Both users wear HoloLenses and perform the built-in eye gaze calibration.
2. Eye gaze is automatically captured using the HMD sensors and MRTK2’s eye-tracking APIs.
3. Users stand side by side facing a surgical monitor with four ArUco markers placed on its corners for localization.
4. Each HoloLens creates a virtual screen matching the monitor’s position and size. Eye gaze is captured and projected onto this virtual screen. The projected expert gaze (T_expGaze) and the expert-to-screen transform (T_e2s) are sent via PUN (TCP client). The trainee’s device computes the expert’s gaze on the trainee’s screen by applying T_t2s × inv(T_e2s) × T_expGaze (R3). This ensures all markers remain visible from all directions and avoids marker bias by using a 2D projection (R1).
5. The expert’s gaze appears in the trainee’s view as the marker under test.
We will clarify the system workflow and add a video capturing the user experience with our AR application in the camera-ready version upon acceptance (R1, R3). To clarify the experimental setup (R4), each phantom contains five stones. For M3, the largest circle represents the expert’s current gaze, while the smaller circles indicate its history.
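The transform composition described in the rebuttal (T_t2s × inv(T_e2s) × T_expGaze) can be sketched with homogeneous 4×4 matrices. This is a minimal illustration, not the authors' implementation: the function name, NumPy representation, and exact frame conventions (which direction each transform maps) are assumptions.

```python
import numpy as np

def project_expert_gaze(T_t2s, T_e2s, expert_gaze):
    """Map the expert's gaze point into the trainee's frame via the
    shared, ArUco-tracked screen frame, following the composition
    T_t2s x inv(T_e2s) x T_expGaze stated in the rebuttal.

    T_t2s, T_e2s : 4x4 homogeneous transforms relating the trainee's and
                   expert's HMD frames to the screen frame (the exact
                   direction depends on the authors' convention).
    expert_gaze  : 3-vector gaze point in the expert's frame.
    """
    p = np.append(expert_gaze, 1.0)        # homogeneous coordinates
    q = T_t2s @ np.linalg.inv(T_e2s) @ p   # chain through the screen frame
    return q[:3]
```

With identity transforms (both devices sharing one frame), the gaze point passes through unchanged, which is a quick sanity check for the registration step.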
Regarding the concern that phantom difficulty may affect performance (R4), we agree that variability exists. However, all phantoms were based on real kidneys from patients who underwent surgery, excluding those with anatomic variations that would complicate endoscopy. These phantoms reflect typical anatomy, and our analysis of users’ performance on each phantom showed no significant differences. On trainee expertise variation (R4), we note that all participants received all three levels of guidance. Since phantom difficulty was consistent, performance differences are attributed to marker effectiveness rather than expertise. To address task clarity concerns (R3), the task began once users validated the eye gaze projection. Users decided when they were done exploring the phantom, defining completion. They navigated the ureteroscope with expert guidance and identified stones verbally, confirmed by both the expert and the in-room researcher. We recorded the number of stones found and the time taken to complete the task, defined as task completion time (R3). While our t-test did not show significant differences across the three markers, we observed that M2 showed the shortest total distance and the lowest fixation-to-saccade ratio (R4). This supports the users’ subjective preference for M2, but a larger cohort is required to validate these findings (R3). Our results can inform power analyses for future studies. We highlight trends in how markers impact objective eye gaze and subjective NASA-TLX metrics. This approach aligns with prior work [1,2] linking eye movement features to NASA-TLX scores in surgical contexts (R3).
References: [1] Subjective and objective quantification of physicians’ workload and performance during radiation therapy planning tasks (2013). [2] Towards Cognitive Load Assessment Using Electrooculography Measures (2023).
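The fixation-to-saccade ratio discussed in the rebuttal can be computed from raw gaze samples in several ways; the paper does not specify the exact method. Below is a minimal sketch using a simple velocity-threshold (I-VT) classification. The function name, the 30 deg/s threshold, and the per-sample counting are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def fixation_saccade_ratio(gaze_xy, timestamps, vel_threshold=30.0):
    """Classify gaze samples as fixation or saccade with a velocity
    threshold (I-VT) and return the fixation/saccade sample ratio.

    gaze_xy       : (N, 2) gaze positions in degrees of visual angle.
    timestamps    : (N,) sample times in seconds.
    vel_threshold : deg/s; faster inter-sample motion counts as saccade.
    """
    speed = np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1) / np.diff(timestamps)
    saccade = speed > vel_threshold
    n_sacc = int(saccade.sum())
    n_fix = int((~saccade).sum())
    return n_fix / max(n_sacc, 1)   # higher ratio => more dwelling, less searching
```

Under this reading, a lower ratio (as reported for M2) would indicate relatively more saccadic search activity per unit of fixation, which is why interpretation of the metric as a cognitive-load proxy warrants the larger cohort the authors mention.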
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
This paper presents an AR system for surgical training that evaluates how different gaze marker designs affect cognitive load and task engagement during a simulated kidney stone removal procedure using an HMD.
The reviewers agree that the paper addresses a relevant application and that the study is clinically motivated with a practical user evaluation. Reviewers noted that:
- The design of the markers was informed by existing guidelines and refined in consultation with a clinical expert.
- The use of both subjective and objective measures and the inclusion of surgical trainees and an expert surgeon lend credibility to the study.
- The use of high-fidelity kidney phantoms and a collaborative HoloLens-based setup was also seen as positive.
However, the reviewers also raise concerns regarding the clarity and completeness of the system description, including:
- The manuscript lacks sufficient detail on how the AR system functions—especially how the gaze is captured, tracked, and projected to align with the trainee’s point of view.
- Figures do not clearly show what the trainee sees, and system limitations that may influence marker visibility or preference are not fully discussed.
- The reviewers also point out that coordinate transformations and system components (e.g., monitor tracking with ArUco markers, HoloLens calibration) are mentioned but not sufficiently explained, reducing reproducibility.
- Task definitions (e.g., how stone identification is confirmed) and the basis for concluding M2 is the best marker are seen as somewhat vague.
- A deeper interpretation of the results, clarification of evaluation metrics, and justification for marker ranking are needed.
While the work was appreciated for its interdisciplinary approach and practical relevance, reviewers recommend including additional technical details and a clearer explanation of task protocols and system behaviour. These improvements would support transparency and reproducibility and better justify the conclusions drawn.
The work seems promising and potentially impactful but further clarification and additional detail are needed, thus we invite the authors for a rebuttal.
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A