Abstract

Accurate detection of vulvovaginal candidiasis is critical for women’s health, yet its sparse distribution and visually ambiguous characteristics pose significant challenges for accurate identification by pathologists and neural networks alike. Our eye-tracking data reveals that areas garnering sustained attention - yet not marked by experts after deliberation - are often aligned with false positives of neural networks. Leveraging this finding, we introduce Gaze-DETR, a pioneering method that integrates gaze data to enhance neural network precision by diminishing false positives. Gaze-DETR incorporates a universal gaze-guided warm-up protocol applicable across various detection methods and a gaze-guided rectification strategy specifically designed for DETR-based models. Our comprehensive tests confirm that Gaze-DETR surpasses existing leading methods, showcasing remarkable improvements in detection accuracy and generalizability. Our code is available at https://github.com/YanKong0408/Gaze-DETR.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0974_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/YanKong0408/Gaze-DETR

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Kon_GazeDETR_MICCAI2024,
        author = { Kong, Yan and Wang, Sheng and Cai, Jiangdong and Zhao, Zihao and Shen, Zhenrong and Li, Yonghao and Fei, Manman and Wang, Qian},
        title = { { Gaze-DETR: Using Expert Gaze to Reduce False Positives in Vulvovaginal Candidiasis Screening } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a DETR-based candida detection system that uses an expert’s gaze information to reduce false positives. The experimental evaluation shows that incorporating the gaze information can improve detection performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well written and the presentation is well organized.
    2. Incorporating gaze information into object detection systems is relatively new.
    3. Experimental evaluation is comprehensive and includes an ablation study.
    4. The author claimed to make the implementation publicly available.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The gaze data are noisy and temporal. How do you generate the gaze points and gaze-only bounding boxes from the gaze data? Is any preprocessing applied to the gaze data? Rather than simply providing a reference, it would be better to provide more details (maybe in the supplementary material), as the gaze information plays a significant role in your work.

    2. My understanding is that the gaze data was collected during annotation by the pathologist. Is that correct? How many years of experience does the pathologist have? Please clarify.

    3. Can you provide more detail about “The gaze information is not required during inference.”? It would be helpful to detail the inputs/outputs for the training and inference process.

    4. Do you compare your work with other detection works that incorporate gaze information?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the weakness section. In addition, I would suggest providing a more comprehensive description about Fig.2 (B) to make this figure self-contained.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper’s method section is well organized and well written. The experiments are reasonable.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper finds that areas of sustained attention not marked by experts are often aligned with false positives of predictive neural networks. Using this finding, the paper introduces a method that integrates gaze data to enhance a model’s precision by diminishing false positives. The approach entails a gaze-guided warm-up protocol applicable across various detection methods as well as a gaze-guided rectification strategy designed for DETR-based models. The gaze-guided warm-up strategy allows the detector to gain insights into confounding instances similar to candida, thus addressing the issue of imbalanced candida quantities. The method shows significant improvement in detection performance while maintaining generalizability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is the first one to present a method to leverage eye gaze data to improve the detection of candida regions.

    1. The paper is overall well written and organized, apart from minor writing errors.

    2. The paper provides adequate visualization when explaining a concept or finding, which makes it easy to read and comprehend.

    3. Experiments are well designed and the evaluation of the proposed model is good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It is not clear what the sources of error are in this sentence: “Due to eye-tracking device errors and the gaze processing method, the generated ‘gaze only’ boxes tend to be offset from the corresponding objects..”. What is the error margin in the eye-tracking device? What is meant by “the gaze processing method” - what type of error could be introduced due to the processing method? This is not clear.

    2. At what magnification are the crops made in the WSI? (“Our in-house dataset consists of whole slide images from patients with candida, which are cropped into patches of 1024 × 1024..”)

    3. In Fig. 1, it is not clear what model is used to obtain the predicted bounding boxes. In the text, the authors mention “RetinaNet [12]”. However, this information should be there in the image caption as well.

    4. Some failure cases of the proposed model should be included to better understand the scope for improvement.

    5. The eye gaze tracker details are missing, e.g., the sampling frequency.

    6. Does the temporal sequence in which different image regions are visited (in other words the scanpath trajectory) impact the model performance (rather than just the spatial heatmap of eye gaze)?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper has some writing flaws which need to be addressed. In the rebuttal, I would like the authors to address the concerns I mentioned in the weakness section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the proposed method is novel for this task and the paper is well organized. However, some details are missing in the paper which should be included for better reproducibility and comprehension. With these changes included, I would lean towards acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose to utilize pathologists’ gaze data as auxiliary information when training vulvovaginal candidiasis (VVC) detection models. Their objective is to encourage the model to focus on areas that pathologists pay attention to, but go unannotated for valid reasons (e.g., false positives). They utilize a two-fold training process: (1) gaze-guided warm-up, and (2) gaze-guided rectification. The authors evaluate the model’s predictions, and find their method improves VVC detection performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well-written, with detailed figures and captions.
    • The paper provides detailed explanations of the methods used.
    • The evaluation and ablation studies quantify the effect of each component proposed.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper lacks adequate reference to related work on VVC prediction and gaze data-based guidance, and to support claims such as “Despite notable progress, the existing methods still struggle with the precise identification of candida” and “The reasons include a) the imbalance between candida and background cells, and b) the thin, faint appearance and frequent occlusions”.
    • Section 2.1 lacks essential information for reproducibility. The authors mention that “… small-sized regions are eliminated, and gaze boxes are extracted by detecting contours within the remaining regions.” What steps were involved in detecting contours? Were the images binarized? How were the contour detection parameters chosen? How tight are the contour bounds?
    • In Section 2.3, the authors mention “…to make ‘gaze only’ boxes better reflect the confounding areas, we replicate the ‘gaze only’ boxes multiple times (four times in our model)”. What was the rationale behind this choice? Why 4 times? What if instead, the boxes were simply enlarged by a factor?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • I recommend that the authors describe (or illustrate) the process of gaze bounding box creation.
    • I recommend adding citations to prior work on gaze data fusion.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The proposed method is technically sound.
    • The paper is well-written, with detailed figures and captions.
    • The paper provides detailed explanations of the methods used.
    • The evaluation and ablation studies quantify the effect of each component proposed.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Q1: The AC mentioned a lack of essential implementation information. Here are some implementation details. 1. Gaze preprocessing details: the Gaussian kernel used to generate the gaze heatmap is 119 × 119; after normalization, we convert the heatmap into a binary mask with a threshold of 0.1; to reduce potential noise in the gaze data caused by the expert’s distraction, bboxes generated from the binary gaze mask with both height and width less than 128 pixels are removed. 2. Magnification of the crops made in the WSI: 200. 3. Eye gaze tracker details: we collected the eye-tracking data with a Tobii 4C eye-tracker, which records binocular gaze data at 90 Hz. To support replication of our study, we have released the source code on GitHub now that the paper has been accepted.
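
The preprocessing steps listed in Q1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function names are hypothetical, the Gaussian sigma (kernel size / 6) is not given in the rebuttal, and a 4-connected flood fill stands in for the contour detection step, whose exact parameters are not described. The image is assumed to be larger than the 119-pixel kernel in both dimensions.

```python
import numpy as np

def gaussian_kernel_1d(size=119, sigma=None):
    """1-D Gaussian; sigma = size / 6 is an assumption, not from the rebuttal."""
    if sigma is None:
        sigma = size / 6.0
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaze_heatmap(points, shape, size=119):
    """Accumulate (x, y) gaze points, blur with a size x size kernel, normalize to [0, 1]."""
    h, w = shape
    hits = np.zeros(shape, dtype=float)
    for x, y in points:
        if 0 <= x < w and 0 <= y < h:
            hits[int(y), int(x)] += 1.0
    k = gaussian_kernel_1d(size)
    # A 2-D Gaussian is separable: blur the rows, then the columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, hits)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    peak = blurred.max()
    return blurred / peak if peak > 0 else blurred

def mask_to_boxes(mask):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected regions in a binary mask."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        stack = [(y0, x0)]
        x_min = x_max = x0
        y_min = y_max = y0
        while stack:
            y, x = stack.pop()
            x_min, x_max = min(x_min, x), max(x_max, x)
            y_min, y_max = min(y_min, y), max(y_max, y)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    stack.append((ny, nx))
        boxes.append((int(x_min), int(y_min), int(x_max) + 1, int(y_max) + 1))
    return boxes

def gaze_boxes(points, shape, threshold=0.1, min_side=128):
    """Threshold the heatmap, then drop boxes whose width AND height are both < min_side."""
    mask = gaze_heatmap(points, shape) > threshold
    return [b for b in mask_to_boxes(mask)
            if (b[2] - b[0]) >= min_side or (b[3] - b[1]) >= min_side]
```

Note that the size filter removes a box only when both sides fall below 128 pixels, as stated in the rebuttal, so an elongated gaze region that is long in one direction is kept.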

Q2: R3 asked about the workflow. The data was collected by a pathologist with three years of experience, who had been briefed on our idea prior to collection. The gaze data was collected during annotation. In the training phase, Gaze-DETR takes the image, the corresponding candida bbox(es), and the gaze-only bbox(es) as input, whereas in the inference stage it takes only the image. The information from the gaze-only bboxes seen during training is encoded into the gaze queries, which can be considered a kind of learned prior knowledge at inference.
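
The difference between training and inference inputs described in Q2 can be sketched as below. This is an illustrative assumption rather than the authors' interface: the helper names are hypothetical, and treating a gaze box as "gaze only" when it has no overlap with any annotated candida box (`max_iou=0.0`) is one plausible reading of "attended but not marked", not a rule stated in the rebuttal.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def select_gaze_only(gaze_boxes, candida_boxes, max_iou=0.0):
    """Keep gaze boxes that do not match any annotated box (threshold is an assumption)."""
    return [g for g in gaze_boxes
            if all(iou(g, c) <= max_iou for c in candida_boxes)]

def training_sample(image, candida_boxes, gaze_boxes):
    """Training input: image plus both kinds of boxes."""
    return {
        "image": image,
        "candida_boxes": list(candida_boxes),
        "gaze_only_boxes": select_gaze_only(gaze_boxes, candida_boxes),
    }

def inference_sample(image):
    """Inference input: the image alone; no gaze data is needed."""
    return {"image": image}
```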

Q3: R3 mentioned the lack of comparisons with other detection works that incorporate gaze. To the best of our knowledge, our work is the first to incorporate gaze information into object detection. It is indeed possible to extend gaze-based methods from classification and segmentation tasks to object detection as well. Nevertheless, our gaze-only boxes can still serve as a complementary component that can be used in conjunction with these methods.

Q4: R4 was concerned about the sources of gaze data error. Error due to the eye-tracking device: according to the manufacturer’s website, the Tobii 4C eye-tracker has an accuracy of 97%. However, based on practical experience, the accuracy tends to decrease further during data collection because calibration deteriorates as the pathologist moves. Error due to the gaze processing method: this primarily arises from the thresholding step during heatmap processing and can be attributed to the non-uniform distribution of the pathologist’s gaze points across an object. Consequently, the “gaze only” box obtained after thresholding may not be accurately aligned with the object.

Q5: R4 wondered whether the scanpath trajectory impacts model performance. In our model, all the “gaze only” boxes are input and processed in parallel, so the scanpath trajectory does not impact model performance. Although the scanpath trajectory is an intriguing aspect to explore, it is not within the scope of our work.

Q6: R5 was concerned about the rationale behind replicating the boxes 4 times in gaze-guided rectification. If the number of replications is too low, the constructed “gaze only” queries may learn only low-quality representations compared to the other input queries, resulting in limited enhancement. If it is too high, the “gaze only” queries will dominate the query set, which can impair the model’s ability to learn background classes beyond the eye gaze focus and those around the ground truth, and thus degrade model performance. During experimentation, we observed that values of 4, 5, and 6 yielded similarly optimal performance. Additionally, choosing 4 aligns with the 4-scale setting used in the denoising detection models DN-DETR and DINO, which served as a partial motivation.
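
The trade-off in Q6 can be illustrated with a small sketch. The function names are hypothetical, and the random jitter applied to each copy is an assumption made here for illustration (in the spirit of denoising queries); the rebuttal only states that each “gaze only” box is replicated, four times in the authors' model. The second helper shows how the share of “gaze only” queries in the query set grows with the replication factor, which is the "dominating the query set" effect the authors describe.

```python
import random

def replicate_boxes(boxes, times=4, jitter=0.1, seed=0):
    """Replicate each (x0, y0, x1, y1) box `times` times with a small random shift.

    The shift (a fraction of the box size) is an illustrative assumption.
    """
    rng = random.Random(seed)
    replicated = []
    for x0, y0, x1, y1 in boxes:
        w, h = x1 - x0, y1 - y0
        for _ in range(times):
            dx = rng.uniform(-jitter, jitter) * w
            dy = rng.uniform(-jitter, jitter) * h
            replicated.append((x0 + dx, y0 + dy, x1 + dx, y1 + dy))
    return replicated

def gaze_query_share(n_gaze_boxes, times, n_other_queries):
    """Fraction of the query set taken up by replicated gaze-only queries."""
    n_gaze = n_gaze_boxes * times
    return n_gaze / (n_gaze + n_other_queries)
```

For example, with 3 gaze-only boxes and 100 other queries, `gaze_query_share(3, 4, 100)` is 12/112, and the share rises steadily as `times` increases.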

Q7: Other weaknesses. We acknowledge the valid concerns regarding the missing model description in the Fig. 1 caption, the lack of failure cases, and the insufficient references. We will thoroughly revise the manuscript based on your comments before submitting the camera-ready version.




Meta-Review

Meta-review not available, early accepted paper.


