Abstract
Lesion detection in medical images is crucial for computer-aided diagnostic systems, enabling early disease identification and enhancing clinical decision-making. Existing lesion detection models primarily rely on bounding boxes for supervision, which overemphasize lesion boundaries while neglecting critical internal features, potentially resulting in misdetections. In contrast, clinicians’ gaze, which reflects their visual focus during diagnosis, captures internal semantic patterns of lesions and provides a more informative supervisory signal than conventional annotations. Inspired by this insight, we propose a gaze-driven detection framework for enhancing lesion identification accuracy. Specifically, our framework introduces three key gaze-prioritized innovations: 1) an adaptive gaze kernel that prioritizes diagnostically significant high-magnification regions, 2) a gaze-guided assignment module that establishes query-level gaze-region correspondence, and 3) a query-level consistency loss that aligns detection model attention with clinicians’ gaze patterns. By incorporating clinicians’ expertise through gaze data, our method improves lesion detection accuracy and clinical interpretability. In addition, our method can be packaged as a plug-and-play module that maintains compatibility with mainstream object detectors. To validate its effectiveness, we employ two public datasets and one private dataset, and extensive experiments demonstrate its superiority over existing approaches. Furthermore, we contribute a pioneering gaze-tracking dataset with 1,669 precise gaze annotations, establishing a new benchmark for gaze-driven research in object detection. The dataset and code are available at https://github.com/YanKong0408/GAA-DETR.
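To make the first innovation concrete, here is a minimal sketch (not the authors' released code) of how an adaptive gaze kernel could render fixations into a supervision heatmap. The magnification-dependent kernel width and weighting are assumptions based on the abstract's description; see the repository for the actual implementation.

```python
# Minimal sketch, assuming each fixation carries a viewing magnification.
# Not the authors' implementation.
import numpy as np

def gaze_heatmap(fixations, image_hw, base_sigma=16.0):
    """fixations: iterable of (x, y, magnification); returns an HxW heatmap."""
    h, w = image_hw
    ys, xs = np.mgrid[0:h, 0:w]
    heatmap = np.zeros((h, w), dtype=np.float32)
    for x, y, mag in fixations:
        # Assumed behavior: high magnification means close inspection of a
        # small region, so shrink the kernel and increase its weight.
        sigma = base_sigma / max(mag, 1.0)
        heatmap += mag * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    if heatmap.max() > 0:
        heatmap /= heatmap.max()  # normalize to [0, 1] for use as supervision
    return heatmap
```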
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1266_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/YanKong0408/GAA-DETR
Link to the Dataset(s)
https://github.com/YanKong0408/GAA-DETR
BibTex
@InProceedings{KonYan_QueryLevel_MICCAI2025,
author = { Kong, Yan and Peng, Zhixiang and Yin, Yuan and Li, Yonghao and Cai, Jiangdong and Wang, Sheng and Wang, Qian and Fang, Yuqi and Shan, Caifeng},
title = {{Query-Level Alignment for End-to-End Lesion Detection with Human Gaze}},
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15972},
month = {September},
pages = {498--508}
}
Reviews
Review #1
- Please describe the contribution of the paper
- The paper identifies that traditional lesion detection models tend to focus on lesion boundaries while neglecting internal features, which can lead to errors. In contrast, clinicians’ gaze naturally captures these critical, decision-relevant features, providing a more effective supervisory signal.
- The authors propose a novel detection framework that incorporates clinical gaze priors through query-level attention alignment. This approach enhances lesion detection accuracy by aligning the model’s focus with the areas that clinicians naturally pay attention to during diagnosis.
- The paper introduces the first large-scale gaze-tracking dataset specifically for lesion detection, featuring comprehensive annotations. This dataset sets a new benchmark for gaze-driven research in object detection and will be made publicly available to support further research in this area.
- Through extensive experiments on both public and private datasets, the paper demonstrates that the proposed method not only outperforms existing approaches but also offers improved clinical interpretability and compatibility with mainstream object detectors.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Inspired by lesion identification in medical diagnosis, this paper proposes a novel approach based on DETR: guiding the lesion detector to focus more on the center of the lesion rather than its boundaries. The authors claim that this model design leads to more reasonable, natural, and accurate results.
- The authors contribute a large detection dataset comprising 2,450 images with eye-tracking annotations, the first gaze dataset for lesion detection.
- Extensive experiments across different domains and detectors demonstrate the validity of both the method and the data.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The paper describes this as a plug-and-play approach. Is there any additional evidence or supplementary proof to demonstrate the benefits of this method on center-based detectors?
- It would be even better if there were a way to quantify the relative magnitudes of the gains from the model improvement and the data contribution.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The reason I recommend this paper is that it has a strong motivation and the end-to-end experiments are quite solid.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper proposes GAA-DETR, a gaze-aligned detection framework that integrates clinicians’ gaze patterns into lesion detection via three innovations: (1) adaptive gaze kernels for magnification-aware heatmap generation, (2) gaze-guided Hungarian matching for query-region alignment, and (3) a query-level consistency loss to align model attention with diagnostic focus. It also contributes the first public gaze-tracking dataset (2,450 images, 1,669 annotations) for lesion detection, enabling gaze-driven research in medical AI.
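As a reading aid for innovation (2), gaze-guided Hungarian matching can be pictured as the standard DETR assignment with an extra gaze term added to the cost matrix. The sketch below is a simplification under stated assumptions, not the paper's exact cost formulation; the gaze term and its weight `w_gaze` are illustrative.

```python
# Sketch of gaze-guided matching: augment the usual DETR matching cost with
# a bonus for queries whose predicted boxes cover high gaze density.
from scipy.optimize import linear_sum_assignment

def gaze_guided_match(cost_cls, cost_box, gaze_score, w_gaze=1.0):
    """cost_cls, cost_box: (num_queries, num_targets) standard matching costs.
    gaze_score: (num_queries,) mean gaze-heatmap value inside each predicted box."""
    cost = cost_cls + cost_box - w_gaze * gaze_score[:, None]  # broadcast per query
    query_idx, target_idx = linear_sum_assignment(cost)        # optimal assignment
    return query_idx, target_idx
```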
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novel Gaze Integration Framework: Introduces query-level alignment (Fig. 2D) to bridge model attention with clinicians’ diagnostic gaze patterns, addressing boundary-overfocus issues (Table 1: the AD metric improves by 30-40%). This biologically inspired design is novel for lesion detection.
- Clinically Validated Dataset: Publishes a multi-institutional dataset (breast/cervical images) with raw gaze sequences and magnification data (Fig. 3), addressing data scarcity in gaze-driven research (vs. Reflacx [3], which is limited to chest X-rays).
- Compatibility & Scalability: Demonstrates plug-and-play adaptability across DETR variants (Table 2: +5-12% AP gains on DINO/RT-DETR) without inference-time gaze dependency, enhancing practical utility.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Limited Differentiation from Gaze-DETR: The proposed method’s key innovation (adaptive gaze kernel) appears incremental compared to Gaze-DETR [12], primarily adding magnification-based weighting. Notably, on the Breast Dataset (Table 1), Gaze-DETR outperforms GAA-DETR in AR10 (0.453 vs. 0.434) and AR100 (0.507 vs. 0.483), raising questions about the necessity of the magnification component.
- Unclear Loss Weighting Mechanism: The GAGD loss (Eq. 3) uses fixed coefficients, but the paper lacks justification for their values or experiments on adaptive weighting (e.g., learned via gradient-based optimization). Prior work [16] demonstrates the impact of dynamic loss balancing in gaze integration, which is overlooked here.
- Unaddressed Clinician-Specific Variability: The study assumes uniform gaze patterns across clinicians but ignores individual diagnostic biases (e.g., radiologists vs. pathologists) or calibration drift in eye-tracking (Sec. 3: “periodic recalibration” frequency is arbitrary). No analysis of inter-rater gaze consistency is provided, despite known variability in medical gaze behavior [23].
- Presentation Errors: Table 1 erroneously highlights 0.093 (In-house Dataset, Gaze-DETR’s AP0.75) as superior to GAA-DETR’s 0.123, contradicting the narrative. Such inconsistencies undermine credibility.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Method novelty and open-source release for the community.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The authors present a novel lesion detection dataset and method that leverages gaze as additional supervision on top of traditional bounding boxes. Gaze is used to produce a Gaussian kernel-based heatmap, which provides denser supervision inside bounding boxes for better classification. The experiments suggest that their method outperforms other gaze-based detection algorithms across multiple metrics, that the plug-and-play gaze-aligned attention module improves the performance of various models, and that the individual components of the method contribute meaningfully, as shown in the ablation study.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The idea of using gaze as additional supervision in segmentation/detection has been rising in popularity in the community, so the authors are commended for producing a meaningful dataset for other researchers to use. Overall, the idea seems promising and the manuscript is well written. The problem is well motivated (shown in Fig. 1), and the proposed method seems to address the limitations of existing traditional object detection methods. The experiments are promising and convey that the method is useful for radiological applications.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The only concerns are writing-oriented: the loss function section (which seems to be the main high-level difference between the proposed method and other gaze-based detection methods, since that is where the heatmaps are incorporated for supervision) is a bit convoluted, with many terms introduced and not enough exposition to explain their purpose and motivation. However, the loss functions L_{match} and L_{GAGD} do make sense at a high level. In addition, some more details, such as the λ_i values, would be useful.
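For readers wondering what the requested λ_i values control, here is a hedged illustration of a fixed-weight combination in the spirit of L_{GAGD}. The weights below are placeholders, not the paper's settings, and the term structure is an assumption based on the reviews' descriptions.

```python
# Illustration only: one plausible shape for a fixed-weight loss combining a
# box term with a query-level attention-consistency term. Placeholder lambdas.
import torch.nn.functional as F

def gagd_style_loss(model_attn, gaze_attn, pred_boxes, gt_boxes,
                    lam_box=5.0, lam_align=2.0):
    """model_attn, gaze_attn: (num_matched, H, W); *_boxes: (num_matched, 4)."""
    l_box = F.l1_loss(pred_boxes, gt_boxes)       # standard box regression term
    l_align = F.mse_loss(model_attn, gaze_attn)   # align attention with gaze heatmap
    return lam_box * l_box + lam_align * l_align
```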
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is well-written, well-motivated, and proposes a dataset and method that overcomes the limitations of existing approaches. The GAA module does improve results meaningfully, and the authors present sufficient experiments (comparisons with other studies + ablation). Therefore, my recommendation is accept.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
Response to Reviewer #2
- Limited Differentiation from Gaze-DETR: We acknowledge the observation that our GAA-DETR underperforms Gaze-DETR on certain metrics for the Breast Dataset (e.g., AR10 and AR100). However, when considering all datasets and metrics, GAA-DETR consistently demonstrates superior overall performance. Moreover, our method’s key innovations (e.g., adaptive gaze kernel and query-level alignment) are complementary to Gaze-DETR, and integrating our approach into Gaze-DETR could further enhance its performance (AP from 0.188 to 0.204, AR10 from 0.453 to 0.559). This compatibility highlights the practical and generalizable effectiveness of our method.
- Unclear Loss Weighting Mechanism: We appreciate this insightful reminder. The weighting coefficients in the GAGD loss should be chosen based on the quality of the gaze data and the task’s dependency on boundary information. For higher-quality gaze data, or for tasks where gaze plays a crucial role (e.g., distinguishing subtle lesion features), larger coefficients yield better performance. However, the overall impact of coefficient changes is relatively small, as they remain within the same order of magnitude. To simplify implementation and experimentation, we used fixed coefficients. We agree that an ablation study on loss weights could be explored in future work for further improvement.
- Unaddressed Clinician-Specific Variability: Thank you for pointing out this limitation, which provides a valuable perspective. We acknowledge that clinician-specific variability (e.g., diagnostic biases and inter-rater consistency) may influence gaze patterns. While this was outside the scope of the current study, we plan to address this in future work by analyzing inter-rater gaze consistency.
- Presentation Errors in Table 1: We sincerely apologize for the oversight in Table 1 and will correct the erroneous highlighting in the camera-ready version to ensure clarity and credibility.
Response to Reviewer #1
- Plug-and-play Applicability for Center-based Detectors: Thank you for this reminder. Our method is indeed plug-and-play, requiring only prediction boxes, model attention, and gaze attention (a minimal interface sketch in this spirit appears after these responses). DETR, which is itself a center-based detector (with predictions structured as (c_x, c_y, w, h)), partially demonstrates the generalizability of our approach. However, we agree that testing on additional center-based detectors (e.g., CenterNet, FCOS) would provide more comprehensive evidence. Due to space constraints at MICCAI, we may explore these experiments in a future journal version.
- Quantifying Model Improvement vs. Data Contribution: Table 1 compares various gaze-based methods, demonstrating the architectural improvement of GAA-DETR. Table 2 further shows gains from applying our framework and dataset to mainstream detection models. While these results partially address the contributions of architecture and data, we agree that more granular experiments (e.g., incrementally adding gaze data to isolate its impact) would better quantify them. We may include such experiments in future work.
Response to Reviewer #3
- Writing and Loss Function Clarity: Thank you for pointing out the writing-oriented concerns. We will carefully refine the loss function section in the camera-ready version to improve clarity. Specifically, we will provide detailed values of these hyperparameters in our open-source code files to enhance reproducibility.
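Taking the rebuttal's plug-and-play claim at face value (only prediction boxes, model attention, and gaze attention are needed, and gaze is unused at inference), here is a minimal interface sketch. The names `detector.compute_loss` and the `"attn"` output key are hypothetical, not the authors' API.

```python
# Minimal sketch of attaching a gaze-alignment term to an arbitrary detector's
# training step. Gaze is consumed only during training; inference is unchanged.
import torch.nn.functional as F

def training_step(detector, images, targets, gaze_attn=None, lam_gaze=1.0):
    outputs = detector(images)                       # forward pass
    loss = detector.compute_loss(outputs, targets)   # detector's native loss
    if gaze_attn is not None:                        # training-time only
        # align per-query model attention with the clinician gaze heatmap
        loss = loss + lam_gaze * F.mse_loss(outputs["attn"], gaze_attn)
    return loss
```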
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A