Abstract

In human-centered assurance, an emerging field in technology-assisted surgery, humans assess algorithmic outputs by interpreting the provided information. Focusing on image-based registration, we investigate whether gaze patterns can predict the efficacy of human-machine collaboration. Gaze data are collected during a user study in which participants assess 2D/3D registration results under different visualization paradigms. We then comprehensively examine four gaze metrics (fixation count, fixation duration, stationary gaze entropy, and gaze transition entropy) and their relationship with assessment error, and test the effect of visualization paradigm on each metric. Assessment error correlates significantly and negatively with both fixation count and fixation duration: more or longer fixations are associated with lower assessment errors. Neither stationary gaze entropy nor gaze transition entropy shows a significant relationship with assessment error. Notably, the visualization paradigm has a significant effect on all four gaze metrics. Gaze metrics therefore hold potential as predictors of human-machine performance, though the importance and impact of individual gaze metrics require further task-specific exploration. Our analyses emphasize that the presentation of visual information crucially influences user perception.
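As context for the four metrics named in the abstract, the minimal Python sketch below shows one common way to compute fixation count, total fixation duration, stationary gaze entropy, and gaze transition entropy from a fixation sequence labeled by area of interest (AOI), following standard definitions from the eye-tracking literature. It is an illustrative reconstruction under those assumptions, not the paper's exact implementation.

```python
import numpy as np
from collections import Counter

def gaze_metrics(aoi_seq, durations):
    """Compute common gaze metrics from an AOI-labeled fixation sequence.

    aoi_seq   : list of AOI labels, one per fixation, in temporal order
    durations : list of fixation durations in seconds, same length
    """
    n = len(aoi_seq)
    fixation_count = n
    total_duration = float(np.sum(durations))

    # Stationary gaze entropy: Shannon entropy of the AOI distribution.
    counts = Counter(aoi_seq)
    p = np.array([c / n for c in counts.values()])
    stationary_entropy = float(-np.sum(p * np.log2(p)))

    # Gaze transition entropy: conditional entropy of AOI-to-AOI
    # transitions, weighted by the stationary probability of each AOI.
    labels = list(counts.keys())
    idx = {a: i for i, a in enumerate(labels)}
    trans = np.zeros((len(labels), len(labels)))
    for a, b in zip(aoi_seq[:-1], aoi_seq[1:]):
        trans[idx[a], idx[b]] += 1
    transition_entropy = 0.0
    for i, row in enumerate(trans):
        if row.sum() == 0:
            continue
        p_i = counts[labels[i]] / n        # stationary probability of AOI i
        cond = row / row.sum()             # transition probabilities from AOI i
        cond = cond[cond > 0]
        transition_entropy += p_i * float(-np.sum(cond * np.log2(cond)))

    return fixation_count, total_duration, stationary_entropy, transition_entropy
```

Intuitively, low stationary entropy means gaze concentrates on few regions, while low transition entropy means gaze moves between regions in a predictable pattern.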

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3852_paper.pdf

SharedIt Link: https://rdcu.be/dV5xt

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72089-5_38

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3852_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Cho_Misjudging_MICCAI2024,
        author = { Cho, Sue Min and Taylor, Russell H. and Unberath, Mathias},
        title = { { Misjudging the Machine: Gaze May Forecast Human-Machine Team Performance in Surgery } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        pages = {401 -- 410}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors focus on understanding how users perceive and interact with content provided by a machine (human-machine interaction), analyzing how users’ gaze patterns can predict and evaluate the performance of image-based registration tasks. The authors consider several gaze metrics: fixation count, fixation duration, stationary gaze entropy, and gaze transition entropy. They hypothesize that gaze metrics can indicate human assessment errors and that the visualization paradigm can affect the gaze metrics. In a preliminary experiment that collected gaze data during 2D/3D fluoroscopy registration assessment, the authors determined that gaze metrics could potentially predict human-machine performance, though they suggest that further task-specific experiments are required.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Well structured, clear formulation of the problem and description of the relevant metrics for assessment in the user study.
    • Rigorous experimental evaluation in a large participant population (22 participants, though 12 were excluded) across three different task paradigms involving different visualization strategies for displaying simulated 2D/3D fluoroscopy registration results.
    • Interesting application of gaze data to explore whether it can reliably predict outcomes of 2D/3D registration.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lack of information regarding the participants in the study (e.g., their level of familiarity with fluoroscopy data, 2D/3D registration assessment, etc.).
    • Lacking details on the strict inclusion criteria that led to over half of the participant data being discarded (level of participant movement, amount of noise). Would a head-worn gaze tracker mitigate the impact of these issues?
    • Justification for the limited dataset, and discussion of its effect on the results, is missing.
    • The authors should suggest a plan for future studies to avoid these data-contamination challenges.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • This work is interesting, and I can imagine how it could be expanded to different surgical/medically relevant tasks to better understand how users perceive information for key tasks (e.g. intraoperative guidance).
    • For future studies that involve surgical actions based on information provided to a user, recording additional quantitative metrics like task completion time or accuracy (if performing a specific task) could provide further insight.
    • Combining the eye tracking dataset with EEG could offer additional information as to anticipatory potentials/stimulus response.
    • Also, it would be interesting to investigate a range of surgeons/physicians with varying experience levels to see if the findings here are consistent: do novice and expert surgeons also exhibit prolonged and deliberate gaze on areas where there could be information mismatch?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The major limiting factor of this work is its lack of justification for the limited dataset and the exclusion of the majority of participants in the study, along with missing details on how the results may have been impacted.
    • The introduction and review of prior literature/findings related to gaze patterns (fixation count, duration, etc.) in similar/tangential areas could have been covered in more detail.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision
    • In the rebuttal, the authors mention that information regarding the participants in the study will be updated (e.g., what was their level of familiarity with fluoroscopy data, 2D/3D registration assessment, etc.).
    • The authors also mention that details on the strict inclusion criteria that led to a substantial portion of participant data being discarded will be provided.
    • Further to this, the authors state that they will lay out suggestions for future research methods to avoid the impact of these challenges with participant gaze noise.



Review #2

  • Please describe the contribution of the paper

    The paper explores the use of gaze metrics as predictors of human-machine team performance in image-guided surgery. It investigates how different visualization paradigms influence gaze patterns and their subsequent impact on the accuracy of assessing 2D/3D registration errors in surgical settings. This study is significant as it provides empirical evidence linking gaze metrics to performance, suggesting that focused visual attention can improve the accuracy of technological assessments in surgery.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novel Application. The study extends gaze analysis to the assessment of 2D/3D registration errors in surgery, a novel application of gaze metrics that bridges human cognitive processes with machine output in a clinical context.

    Strong Evaluation Methodology. The use of linear mixed models to analyze the relationship between gaze metrics and assessment errors allows for a robust statistical approach that accounts for individual variability among participants.
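    As an aside on the methodology noted above: a linear mixed model of this kind can be sketched with statsmodels, regressing assessment error on a gaze metric with a random intercept per participant to absorb individual variability. The file and column names below are hypothetical, not the authors' actual setup.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per trial.
# Assumed columns: participant, fixation_count, assessment_error
df = pd.read_csv("gaze_trials.csv")

# Random intercept per participant accounts for individual variability;
# the fixed-effect coefficient estimates the metric/error relationship.
model = smf.mixedlm("assessment_error ~ fixation_count",
                    df, groups=df["participant"])
result = model.fit()
print(result.summary())
```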

    Demonstration of Clinical Feasibility. By showing a significant correlation between certain gaze metrics (fixation count and duration) and lower assessment errors, the study provides a foundation for using gaze tracking to enhance human-machine interfaces in surgical systems.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Limited Novelty in Data Use. The gaze metrics used, such as fixation count and duration, are standard in eye-tracking research and their application here, while innovative in context, does not introduce new methods of capturing or analyzing gaze data.

    Potential Bias in Sample. The exclusion of a significant number of participants due to technical issues with the eye tracker could introduce bias, as it may affect the generalizability of the findings.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper benefits from a detailed methods section and a reproducibility checklist. However, the exclusion of data and issues related to the eye-tracking equipment used might pose challenges for replicating the study exactly. Providing access to the raw data or more detailed parameters of the eye-tracking setup could enhance reproducibility. Future replication efforts might benefit from using more robust eye-tracking systems or alternative methods to handle participants with eyeglasses, as these were major sources of data exclusion.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Data Exclusion. Consider exploring methods to include participants who wear glasses, as this is a common condition that could affect the external validity of the results.

    Technical Details. Provide more technical detail to aid in the reproducibility of the study. Details about the specific settings and limitations of the Gazepoint GP3 could be beneficial.

    Further Exploration of Non-significant Metrics. Investigate why certain gaze metrics did not correlate significantly with assessment errors and explore whether different analytical approaches could uncover hidden patterns.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors influencing my rating include the novel application of gaze metrics in a clinical setting, robust statistical analysis, and clear demonstration of clinical feasibility. The findings are promising and could lead to practical applications in enhancing human-machine collaboration in surgery.

    However, the weaknesses are the lack of novelty in data usage and the high rate of data exclusion.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper investigates the correlation of gaze data with the assessment of 2D/3D registrations, with the intent of using this to inform the presentation of visual information for image-guided surgery. The paper examined four different gaze metrics to determine whether they may be used as predictors of assessment performance, and investigated three different visualization paradigms.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Analyses: This paper included a solid, thorough analysis, including explicit inclusion of hypotheses and identification of limitations. 2) Interest: The way that users interact with the technology and visual information introduced to image-guided interventions is a very important topic, pertinent to the MICCAI community. 3) Clarity: This paper was well-written and clear with good inclusion of figures to enhance understanding.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Application: This paper could have been strengthened for this particular venue with a little more information about the application investigated and a stronger tie-back to image-guided surgery specifically. 2) Missing details: Given the space restrictions, some details (e.g., detailed descriptions of the paradigms investigated) and the bulk of the results were not included, but adequate referencing was provided and detailed results were given in the supplemental material. 3) Aims: The specific aims and novel contributions of this paper could have been highlighted more explicitly.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    This paper was not algorithm based and did not mention access to the dataset or the user interface used. Other details of the experiments and analysis were provided at a good level of detail.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Major:

    1. Page 3, Methods, 2.1, Paragraph 1: Given that the authors highlight in the Introduction that different experience levels can contribute to different gaze patterns, it is critical that they identify the experience level of their participants or the range of levels included.

    Minor:

    2. Introduction: I would recommend changing the flow slightly so that the last paragraph highlights the focus of this particular paper and what its aims and novel contributions are, as this is currently a bit buried in the second-to-last paragraph.
    3. Figure 2: Some more labels or a more comprehensive caption would help in understanding the images; in particular, the caption says “highlighting key features,” but it wasn’t immediately clear what those features were referring to.
    4. Page 5, Results: Details about the models and analysis, particularly the first paragraph, may have been better suited to the preceding Methods section.
    5. Page 7, Results, 3.2, Paragraph 1: Is LLM a typo here?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, I feel that this paper was very well done and written with thorough analyses and thoughtful discussion. I believe that this paper is unique and of distinct interest to MICCAI’s image-guidance community.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all the reviewers for their valuable feedback. We appreciate the recognition of our “rigorous experimental evaluation” (R3), “solid, thorough analysis” (R4), and “strong evaluation methodology” (R5). We feel encouraged by the acknowledgment of our “interesting application of gaze data” (R3), “clear demonstration of clinical feasibility” (R5), and the relevance of our study as a “very important topic, pertinent to the MICCAI community” (R4). Below, we summarize and address the points raised.

Data Usage: We agree that the gaze metrics we use are standard in eye-tracking research and consider this a strength because the measures are well-established. The innovation of this manuscript derives from tailoring these metrics to 2D/3D registration assessment by weighting them with image similarity metrics, such as NCC, in the areas of interest (cf. Sec. 2.2 and 2.3). Analyzing these weighted metrics offers a perspective on gaze patterns during human assessment of spatial alignment that has not previously been documented. We believe this contributes new knowledge to the field, and we will revise the Introduction to clearly highlight the aims and novel contributions (R4, R5).
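The rebuttal does not spell out the weighting formula, but the general idea of an NCC-weighted gaze metric can be sketched as follows: each fixation's duration is weighted by the normalized cross-correlation between the fluoroscopy image and the projected overlay in a patch around the fixation point. The patch size, boundary handling, and aggregation below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def local_ncc(a, b, eps=1e-8):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def ncc_weighted_fixation_duration(fixations, durations, fluoro, overlay, half=16):
    """Fixation durations, each weighted by local NCC (hypothetical form).

    fixations : list of (row, col) fixation centers in image coordinates
    durations : fixation durations in seconds, same length
    fluoro    : 2D array, the fluoroscopy image
    overlay   : 2D array, the projected 3D model rendered at the same size
    half      : assumed half-width of the square patch around each fixation
    """
    total = 0.0
    for (r, c), d in zip(fixations, durations):
        # Clip the patch to the image bounds.
        r0, r1 = max(r - half, 0), min(r + half, fluoro.shape[0])
        c0, c1 = max(c - half, 0), min(c + half, fluoro.shape[1])
        w = local_ncc(fluoro[r0:r1, c0:c1], overlay[r0:r1, c0:c1])
        total += w * d
    return total
```

Under this reading, fixations on well-aligned regions (high local NCC) contribute more to the metric than fixations on misaligned or uninformative regions.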

Prior Work and Application: We appreciate the importance of incorporating relevant literature, and we will update the Introduction to include relevant studies on gaze analysis around fluoroscopy data and related fields (R3). Additionally, we will expand on the application and its relevance to image-guided surgery in the Introduction (R4).

Participant Details: We agree that participant details were insufficiently described. The recruited participants had a background in medical imaging, image processing, or both. We will update 3.1 Gaze Data Collection to state this explicitly (R3, R4).

Data Exclusion Criteria: Despite progress in gaze-tracking hardware, reliable data collection remains challenging for various reasons. Upon inspection of our sample, we had to exclude data from participants for whom significant portions of the gaze data were incomplete, to ensure data quality. We will modify 3.1 Gaze Data Collection to clarify the exclusion criteria (R3, R5). We chose a stationary screen-based eye tracker over a head-worn gaze tracker for its anticipated accuracy in our tasks (i.e., inspecting an image and overlays on a screen) (R3). Despite this, we faced challenges with gaze tracking, mainly due to participants altering their initial position or interference from eyeglasses. We will enhance our Discussion and Conclusions with potential future research methods to mitigate these factors (R3, R5).

Justification for Limited Dataset: We acknowledge the limited size of our dataset resulting from the exclusion criteria. However, even with this constraint, we observed significant effects in our analyses with substantial effect sizes. We believe this study informs future research by identifying possible effects between gaze patterns and user responses, allowing proper power analyses for subsequent studies. We will update the Discussion and Conclusion to clarify this (R3, R5).

Discussion of Non-significant Results: Stationary gaze entropy and gaze transition entropy showed no significant correlation, likely because misalignment cues were presented statically, leaving users to decide where and when to look. Future research could explore dynamic paradigms that sequentially reveal misalignment cues, potentially making entropy calculations more controlled across individuals. We will ensure this discussion is clear and comprehensive (R5).

Discussion of Future Studies: We agree that investigating additional quantitative metrics, various medically relevant tasks, and surgeons with different experience levels is intriguing. We will update the Discussion and Conclusion to reflect these suggestions (R3).

Other Minor Points: We will improve the caption of Fig. 2, move the models and analysis details to the Methods, and correct the typo in the Results (R4).




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I would argue for accepting the paper and encourage the authors to address the concerns of the reviewers.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    I would argue for accepting the paper and encourage the authors to address the concerns of the reviewers.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Although limitations remain in obtaining experimental data and conducting quantitative analysis, the rebuttal has adequately addressed the major issues.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Although limitations remain in obtaining experimental data and conducting quantitative analysis, the rebuttal has adequately addressed the major issues.


