Abstract
Weakly supervised semantic segmentation (WSSS) in medical imaging struggles with effectively using sparse annotations. One promising direction for WSSS leverages gaze annotations, captured via eye trackers that record regions of interest during diagnostic procedures. However, existing gaze-based methods, such as GazeMedSeg, do not fully exploit the rich information embedded in gaze data. In this paper, we propose GradTrack, a framework that utilizes physicians’ gaze track, including fixation points, durations, and temporal order, to enhance WSSS performance. GradTrack comprises two key components: (1) the Gaze Track Map Generation module for creating hierarchical attention maps, and (2) the Track Attention module for integrating attention features, which collaboratively enable progressive feature refinement through multi-level gaze supervision during the decoding process. Experiments on the Kvasir-SEG and NCI-ISBI datasets demonstrate that our GradTrack consistently outperforms existing gaze-based methods, achieving Dice score improvements of 3.21% and 2.61%, respectively. Moreover, GradTrack significantly narrows the performance gap with fully supervised models, such as nnUNet.
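The abstract does not say how fixation points, durations, and temporal order are combined into an attention map. As a rough illustration only, the sketch below assumes that each fixation contributes an isotropic Gaussian scaled by its duration, and that later fixations receive linearly larger weights. The function name `gaze_track_map`, the Gaussian width `sigma`, and the linear order weighting are all hypothetical; this is not GradTrack's Gaze Track Map Generation module.

```python
import numpy as np


def gaze_track_map(fixations, height, width, sigma=10.0, order_weighting=True):
    """Hypothetical sketch: turn an ordered scanpath into a soft attention map.

    fixations: list of (x, y, duration) tuples in temporal order,
               with x as the column and y as the row coordinate.
    Each fixation adds an isotropic Gaussian scaled by its duration. If
    order_weighting is True, the i-th of n fixations is also scaled by
    (i + 1) / n, so later fixations count more (an assumption, not the
    paper's scheme).
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row and column index grids
    heat = np.zeros((height, width), dtype=np.float64)
    n = len(fixations)
    for i, (x, y, duration) in enumerate(fixations):
        w_order = (i + 1) / n if order_weighting else 1.0
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heat += duration * w_order * g
    if heat.max() > 0:
        heat /= heat.max()  # normalise to [0, 1]
    return heat
```

For example, `gaze_track_map([(10, 12, 0.3), (40, 30, 0.8)], 64, 64)` peaks at row 30, column 40, because the second fixation is both longer and later in the track.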
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1072_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{WanZhi_Enjoying_MICCAI2025,
author = { Wang, Zhisong and Ye, Yiwen and Chen, Ziyang and Xia, Yong},
title = { { Enjoying Information Dividend: Gaze Track-based Medical Weakly Supervised Segmentation } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15969},
month = {September},
pages = {201--211}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes GradTrack, a weakly supervised segmentation method that uses not just where doctors look, but the order and timing of their gaze. It introduces two modules to turn gaze patterns into useful attention maps, helping the model learn more effectively. GradTrack outperforms existing gaze-based methods and gets close to fully supervised performance, showing that gaze dynamics can reduce the need for detailed annotations.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Leverages full gaze dynamics (fixation, duration, order)
- Novel GTMG and TA modules for effective supervision
- Strong performance close to fully supervised models
- Outperforms existing gaze-based methods
- Reduces annotation cost significantly
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Relies on access to eye-tracking hardware
- May be sensitive to gaze noise
- Added model complexity from multiple modules (a drawback for real-time applications)
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The idea of leveraging full gaze dynamics—including fixation order and duration—for weakly supervised segmentation is novel and well-motivated. The proposed modules (GTMG and TA) are thoughtfully designed and lead to notable performance gains, narrowing the gap with fully supervised models. However, the paper is limited by its reliance on eye-tracking hardware. Despite this limitation, the method shows strong potential and contributes meaningfully to gaze-based medical image analysis.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper presents GradTrack, a gaze-guided weakly supervised segmentation framework that incorporates fixation locations, durations, and temporal order. It introduces two components, the Gaze Track Map Generation and Track Attention modules, to progressively refine features using multi-level gaze supervision. Evaluated on Kvasir-SEG and NCI-ISBI, GradTrack outperforms prior gaze-based methods, showing promise in reducing annotation costs while maintaining segmentation accuracy.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) Well written and organized. 2) Using doctors' eye-tracking data to improve segmentation performance is an important task. 3) The method achieves SOTA performance.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Major Comments
1) Ablation on NCI-ISBI Dataset: In Figure 4, the ablation study is shown only for the Kvasir-SEG dataset. It is unclear whether the same findings hold for the NCI-ISBI dataset. The authors should consider including ablation results on this dataset or clarifying if the trends are consistent across datasets.
2) Evaluation Metrics: While the Dice score is a widely used segmentation metric, it may not fully capture performance nuances. Have the authors evaluated the model using additional metrics such as IoU (Jaccard Index), Hausdorff Distance, or Average Surface Distance (ASD)? Including these would provide a more comprehensive assessment of segmentation performance.
3) Loss Function Explanation (Eq. 3): The loss function presented in Equation 3 requires further elaboration. Specifically, the intuition behind its components, how it integrates with the overall training pipeline, and any weighting strategies (if applicable) should be clearly described.
4) Choice of “Reverse Track Truncation”: The rationale for using reverse track truncation over forward truncation is not clearly explained. Please clarify why this choice was made and whether it offers specific advantages for scanpath modeling or segmentation performance.
5) Visualization of Scanpath-to-Attention Mapping: Figure 3 shows qualitative segmentation results, but the paper would benefit from a clearer illustration of how scanpaths translate into attention maps. A visual depiction of scanpaths overlaid on images and how they guide the soft attention mechanism during training would help readers better understand the proposed framework.
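The second major comment asks for metrics beyond Dice. A minimal NumPy sketch of Dice, IoU, and a symmetric Hausdorff distance for binary masks follows; it is a generic illustration, not the paper's evaluation code. For a single mask pair, IoU is a monotone function of Dice, IoU = Dice / (2 - Dice), whereas the Hausdorff distance is sensitive to outlying errors that overlap scores largely ignore. The Hausdorff function here is computed on all foreground pixels rather than on extracted boundaries, and it scales quadratically, so it is only suitable for small masks.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU (Jaccard index) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (total + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance (in pixels) between foreground pixel sets.

    Brute force over all pixel pairs, O(N * M) memory and time.
    """
    a = np.argwhere(np.asarray(pred, dtype=bool)).astype(float)
    b = np.argwhere(np.asarray(target, dtype=bool)).astype(float)
    if len(a) == 0 or len(b) == 0:
        return float("inf")  # undefined when either mask is empty
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    # Largest distance from any point in one set to its nearest point in the other.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

For two 2×2 squares offset by one column, Dice is 0.5, IoU is 1/3, and the Hausdorff distance is 1 pixel.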
Minor Comments
1) Figure 1 Caption: The phrase “for overly under-activation activation maps” is unclear and likely a typographical error. Please revise for clarity and correctness.
2) Redundant Statement on Scanpath Truncation: The sentence “By truncating the gaze sequence from the last fixation point…” may be unnecessary, as scanpaths are typically processed in this manner. Consider removing or rephrasing to avoid redundancy.
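The fourth major comment and the second minor comment concern reverse track truncation. The review does not define the operation precisely, so the sketch below encodes one hypothetical reading: nested sub-tracks that all end at the last fixation (reverse) versus nested sub-tracks that all start at the first fixation (forward), with one sub-track per decoder level. The function name `truncate_track` and the choice of three evenly spaced levels are assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")  # e.g. an (x, y, duration) fixation tuple


def truncate_track(track: Sequence[T], levels: int = 3, reverse: bool = True) -> List[List[T]]:
    """Hypothetical sketch: split one scanpath into `levels` nested sub-tracks.

    Level i (1-based) keeps ceil(n * i / levels) fixations, at least one.
    reverse=True keeps suffixes anchored at the last fixation;
    reverse=False keeps prefixes anchored at the first fixation.
    """
    n = len(track)
    subs = []
    for i in range(1, levels + 1):
        k = max(1, -(-n * i // levels))  # integer ceil(n * i / levels)
        subs.append(list(track[n - k:]) if reverse else list(track[:k]))
    return subs
```

With six fixations indexed 0 to 5, `truncate_track(list(range(6)))` returns `[[4, 5], [2, 3, 4, 5], [0, 1, 2, 3, 4, 5]]`, while `reverse=False` returns `[[0, 1], [0, 1, 2, 3], [0, 1, 2, 3, 4, 5]]`.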
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is well written overall, and the methodology appears novel. However, several questions remain (see weaknesses); addressing them would improve the quality of the paper.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The article introduces GradTrack, a novel framework leveraging physicians’ gaze tracking data to enhance segmentation accuracy. It incorporates two key components: Gaze Track Map Generation (GTMG) for generating hierarchical attention maps, and Track Attention (TA) modules for multi-level gaze supervision. Experiments on the Kvasir-SEG and NCI-ISBI datasets demonstrate that GradTrack improves Dice scores by 3.21% and 2.61%, respectively, achieving up to 97.36% of the performance of a fully supervised nnU-Net model. This method significantly reduces annotation costs while maintaining clinical precision. Notably, the integration of temporal information plays a crucial role.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The article is well-organized and clearly structured.
- The introduction of novel modules, such as Gaze Track Map Generation and Track Attention, enhances the framework.
- Comprehensive comparisons are made with other weakly supervised methods, demonstrating the effectiveness of the proposed approach.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The dimensions of $F_h$ and $P^h_{fg}$ are inconsistent. $F_h$ is downsized, while $P^h_{fg}$ should retain the same dimensions as the original image. The handling of this discrepancy is not clearly explained.
- The article mentions that the channel dimension of $P^h$ is extended from 1 to match the size of $F^h$, but the specific operation used for this extension is not described.
- The loss hyperparameters $\lambda_1$ and $\lambda_2$ are used for blocks $h=2$ and $h=4$, but for the last block, $h=6$, the value of $\lambda$ is set to 1 by default. The rationale behind this choice is not adequately explained.
- Only the Dice Similarity Coefficient (DSC) is used as the evaluation metric. It would be helpful to include additional metrics, such as distance metrics, to provide a more comprehensive performance analysis.
- For training, 900 images (prostate) and 789 images (polyp) were used. However, there is no mention of an internal validation dataset—how was validation performed?
- Some of the comparison results are directly taken from [24]. However, it is unclear whether the training and testing datasets used in [24] are the same as those in this study. Clarification is needed.
- In Table 2, the results without truncation show a Dice score of 80.28, which is lower than 81.01. Considering the size of the test set, is this improvement statistically significant? Also, why is this experiment only performed on the Kvasir dataset?
- In the ablation study, the reverse-truncated track-weighted maps only result in a minimal improvement in Dice score, suggesting that reverse-truncation may not significantly contribute to the performance gain.
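The first three points above concern how a full-resolution map is matched to a downsampled feature map, how a single-channel map is extended across channels, and how per-level losses are weighted. The NumPy sketch below shows one common way to do each, under stated assumptions: block averaging for resizing (image size divisible by feature size), broadcasting for the channel extension, and binary cross-entropy summed with weights $(\lambda_1, \lambda_2, 1)$ over three decoder levels. None of these choices is confirmed by the review, and all function names and the example weight values are hypothetical.

```python
import numpy as np


def avg_pool_to(mask, out_h, out_w):
    """Downsample a 2-D map to (out_h, out_w) by block averaging.

    Assumes the input size is an integer multiple of the output size.
    """
    h, w = mask.shape
    fh, fw = h // out_h, w // out_w
    assert fh * out_h == h and fw * out_w == w, "size must be divisible"
    return mask.reshape(out_h, fh, out_w, fw).mean(axis=(1, 3))


def align_attention(p_full, feat):
    """Resize a 1-channel map p_full (H, W) to the spatial size of a
    feature map feat (C, h, w), then expand it over channels."""
    c, h, w = feat.shape
    p = avg_pool_to(p_full, h, w)               # (h, w)
    return np.broadcast_to(p[None], (c, h, w))  # (C, h, w), read-only view


def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy; target may be soft (values in [0, 1])."""
    prob = np.clip(prob, eps, 1 - eps)
    return float(-(target * np.log(prob) + (1 - target) * np.log(1 - prob)).mean())


def multilevel_loss(preds, target_full, weights):
    """Deep-supervision style loss: downsample the full-resolution target to
    each prediction's resolution and sum the weighted per-level BCE terms."""
    total = 0.0
    for pred, w in zip(preds, weights):
        t = avg_pool_to(target_full, *pred.shape)
        total += w * bce(pred, t)
    return total
```

Multiplying `feat * align_attention(p_full, feat)` gives a channel-wise gated feature map with the same shape as `feat`, and a call such as `multilevel_loss(preds, target, (lam1, lam2, 1.0))` mirrors the weighting the reviewer describes. A framework implementation would typically use its own interpolation or pooling operator; block averaging is used here only because it is short and exact for integer scale factors.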
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The study presents innovative modules, such as Gaze Track Map Generation (GTMG) and Track Attention (TA), to enhance weakly supervised segmentation. However, there are several critical issues that need further clarification, particularly regarding the methodology, the use of a single metric (DSC), and the handling of dataset dimensions. More comprehensive experiments and explanations are needed to strengthen the claims.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank the reviewers for their thoughtful and constructive feedback. Their insights will help us further refine the manuscript in the final version.
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A