Abstract

Tissue tracking plays a critical role in various surgical navigation and extended reality (XR) applications. While current methods trained on large synthetic datasets achieve high tracking accuracy and generalize well to endoscopic scenes, their runtime performance fails to meet the low-latency requirements of real-time surgical applications. To address this limitation, we propose LiteTracker, a low-latency method for tissue tracking in endoscopic video streams. LiteTracker builds on a state-of-the-art long-term point tracking method and introduces a set of training-free runtime optimizations. These optimizations enable online, frame-by-frame tracking by leveraging a temporal memory buffer for efficient feature reuse and utilizing prior motion for accurate track initialization. LiteTracker demonstrates significant runtime improvements, running around 7× faster than its predecessor and 2× faster than the state of the art. Beyond its primary focus on efficiency, LiteTracker delivers high-accuracy tracking and occlusion prediction, performing competitively on both the STIR and SuPer datasets. We believe LiteTracker is an important step toward low-latency tissue tracking for real-time surgical applications in the operating room. Our code is publicly available at https://github.com/ImFusionGmbH/lite-tracker.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2578_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: https://papers.miccai.org/miccai-2025/supp/2578_supp.zip

Link to the Code Repository

https://github.com/ImFusionGmbH/lite-tracker

Link to the Dataset(s)

STIR Challenge 2024 dataset: https://zenodo.org/records/14803158

SuPer dataset: https://sites.google.com/ucsd.edu/super-framework/

BibTex

@InProceedings{KarMer_LiteTracker_MICCAI2025,
        author = { Karaoglu, Mert Asim and Ji, Wenbo and Abbas, Ahmed and Navab, Nassir and Busam, Benjamin and Ladikos, Alexander},
        title = { { LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15969},
        month = {September},
        pages = {307 -- 316}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The primary contribution of this paper is the reduction of latency in the tissue tracking pipeline.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The key strengths of this paper include:

    1. Achieving an impressive runtime in tissue tracking.
    2. Maintaining accuracy comparable to SOTA methods.
    3. Critically evaluating the previous pipeline and successfully integrating data more efficiently.
    4. Showcasing the clinical feasibility of real-time tissue tracking.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The primary weakness of the paper is that, while the runtime is improved, the proposed pipeline updates lack a robust methodological foundation and are more aligned with an engineering enhancement than a methodological advancement.

    It is also worth noting that, while the exponential moving average filter used is simple and fast, it does not necessarily yield an acceptable estimation. Additionally, the choice of the smoothing factor makes the solution dependent on the selected dataset.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The primary reason for this decision is the lack of methodological advancement in the proposed pipeline.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    While the effort to avoid retraining is appreciated and of practical relevance, the methodological contribution does not stand out as a prominent or novel aspect of the work.



Review #2

  • Please describe the contribution of the paper

    This paper contributes runtime optimizations to an existing point-tracking method (CoTracker3), enabling real-time use. These runtime optimizations consist of a temporal caching mechanism to avoid recomputing expensive correlation features and an exponential moving average flow module to initialize points on an incoming frame, reducing the number of refinement iterations needed. The authors suggest that this contribution would allow point tracking to be used in real-time surgical applications such as endoscopy.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper achieves an impressive 7x speedup in model inference time, reducing latency from 200ms to 30ms while maintaining accuracy on endoscopic surgery video datasets. This significant performance improvement demonstrates strong technical execution.
    2. The implementation details are well-documented, providing sufficient technical information for reproducibility. This transparency strengthens the paper’s contribution to the field.
    3. The authors successfully demonstrate that their optimizations preserve model accuracy across multiple public endoscopic surgery video datasets, showing the reliability of their approach.
    4. The sub-50ms latency achieved (30ms) represents a meaningful technical threshold for real-time applications, potentially enabling new use cases in surgical settings.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Hardware evaluation is limited to a single moderately high-end GPU configuration (RTX 3090). This restricts understanding of how these optimizations would perform across different computational environments that might be found in actual clinical settings. Given that the primary contribution of this paper is improved runtime performance, evaluation on a variety of hardware configurations would be beneficial to better characterize the contributions made.
    2. The optimization techniques employed (ring buffer caching of feature computations and exponential moving average initialization) lack significant novelty on their own. Using caching to avoid recomputing features is a common optimization strategy used in algorithm design, particularly with a simple FIFO buffer as described here, and the EMA initialization is borrowed from optical flow models without substantial modification or innovation.
    3. The paper lacks specific clinical context for the claimed “real-time use in surgical settings.” Without identifying specific intraoperative applications where 30ms vs 200ms latency makes a meaningful difference, the clinical value remains theoretical rather than demonstrated.
    4. There is no validation with surgical teams or in surgical workflow contexts. While using clinically acquired datasets is positive, this falls short of demonstrating that the reduced latency enables new applications in practice.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The primary factor supporting acceptance is the substantial 7x speedup achieved, reducing latency from 200ms to 30ms while maintaining accuracy across multiple endoscopic surgery datasets. This technical achievement is impressive and well-documented with sufficient implementation details for reproducibility.

    However, several factors prevent a stronger recommendation. The optimization techniques themselves (ring buffer caching and EMA initialization) represent incremental adaptations rather than novel methodological contributions. The evaluation is limited to a single high-end GPU configuration, raising questions about performance in realistic clinical environments with more constrained hardware. Most critically, while the authors claim their optimizations enable “real-time use in surgical settings,” the paper lacks specific intraoperative applications where the latency reduction would be clinically meaningful.

    Overall, the paper presents a technically solid optimization with potential clinical relevance, justifying acceptance, but the limitations in novelty and clinical validation prevent a stronger recommendation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    After reviewing the rebuttal, I am inclined to support acceptance of this paper. The authors have adequately addressed the primary concerns raised in my initial review and those echoed by Reviewer 3, specifically regarding hardware generalizability and the extent of methodological novelty. The core contribution—runtime optimization enabling real-time long-term point tracking—is technically sound and well-executed. While the optimization strategies employed (temporal caching, EMA-based initialization) are not individually novel, the rebuttal effectively argues that their integration into a fully causal, training-free, long-term tracking pipeline—while preserving or improving performance—is both non-trivial and impactful. I agree with this interpretation and acknowledge that practical contributions of this nature can be appropriate for MICCAI when they substantially enhance usability, robustness, or efficiency in clinically relevant contexts. The authors provided reasonable extrapolation of latency to alternative hardware configurations. While empirical tests on diverse hardware would strengthen the evaluation, the clarification provided is sufficient given the rebuttal constraints. Although no direct user studies or intraoperative evaluations are included, the authors appropriately position LiteTracker as an enabling component for real-time surgical navigation, citing related work and justifying their choice of task-specific quantitative metrics. The experimental design is thorough and consistent with standard practice in this domain. In summary, this paper provides a well-engineered, practically impactful solution to a long-standing bottleneck in surgical point tracking. While the methodological innovations are incremental, their combined effect is significant, and the system shows strong potential for adoption and further development. I now recommend acceptance.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a real-time tissue tracking method that is robust to occlusion and has long-term stability.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Long-term stability: The results suggest that the proposed method achieves good long-term stability. When revisiting occurs, such as when the camera moves back or a tool- or tissue-induced occlusion clears, the previously tracked points can almost instantly be tracked again.
    2. Improved runtime performance. Online and realtime.
    3. Training-free.
    4. Good experimental design that demonstrates both the tracking accuracy and runtime performance.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Unable to add newly appearing points, though this may not be the focus of the current study. Some points may be initially occluded; it would be nice if these points could be added to the tracking set.
    2. It is unclear how occlusion affects the tracking performance. The metric used in the experiments mixes the results of the occluded and non-occluded cases. It is possible to separate these two cases automatically by using datasets with instrument masks or by performing segmentation on your own. The difficulty would be in distinguishing occlusion induced by the tissue itself.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The logic of this article is clear. The methodology explains well the limitations of the previous approach and the contribution of the proposed one. The experiments demonstrate the method well.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Thanks for the authors’ response to my question.

    Still, the mechanism for adding new points cannot be claimed, even though there is a sentence saying that “it allows addition of new query points during the tracking process”. Since the authors claim that the correlation features are cached in the temporal memory buffer, when a new point arrives there is no cached information for it in the buffer. To initialize the new point for tracking, the data from previous frames may need to be computed again, which may compromise the real-time performance.

    However, if such an online updating mechanism really works, I believe it is of the same or higher value than the speed-up, given the scenario in laparoscopic surgery, where new points always appear. Unfortunately, this point is not shown in either the experiments or the methodology. Highlighting this point may help improve the method’s clinical meaning, which may help to resolve the concerns from R1, and may make your method look more different from CoTracker3, which may help resolve the major concern of R3; these concerns are good and may also be raised by other reviewers.




Author Feedback

Dear Meta-Reviewer and Reviewers,

We sincerely appreciate the time and effort dedicated to reviewing our manuscript.

We thank the reviewers for highlighting our technical contributions. We agree that LiteTracker “achieves an impressive 7× speedup in model inference time” (R1) and “good long-term stability” (R2). We believe that the practical relevance of “when revisiting occurs […] the previously tracked points can almost instantly be tracked again” (R2) is of utmost importance in the surgical domain. Moreover, LiteTracker constitutes “an impressive runtime in tissue tracking. Maintaining accuracy comparable to SOTA methods” (R3) while being computationally much more efficient. We appreciate the reviewers’ summaries, which highlight the key technical contributions of our work as well as the thoroughness and transparency of our evaluations.

R1: Lower-end GPUs. Restricted by the rebuttal guide, we can only provide an estimate by linearly scaling latency based on TFLOPS. For instance, LiteTracker’s latency would increase from the reported 29.67 ms (RTX 3090) to ~82.86 ms on an RTX 3060. Conversely, latency would significantly decrease to ~11.59 ms on an operating-room-grade NVIDIA IGX ORIN with RTX 6000 Ada.
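
The linear scaling described above can be sketched as follows. This is a minimal illustration, not the authors' calculation: the rebuttal does not say which TFLOPS figures were used, so the nominal FP32 values below are assumptions taken from public spec sheets, and the function name `scale_latency_ms` is hypothetical.

```python
# Nominal FP32 throughput in TFLOPS. These values are assumptions; the
# rebuttal does not state which figures were used for the estimate.
ASSUMED_FP32_TFLOPS = {
    "RTX 3090": 35.58,
    "RTX 3060": 12.74,
    "RTX 6000 Ada": 91.06,
}


def scale_latency_ms(measured_ms, measured_gpu, target_gpu,
                     tflops=ASSUMED_FP32_TFLOPS):
    """Estimate latency on target_gpu, assuming it is inversely
    proportional to the GPU's nominal TFLOPS."""
    return measured_ms * tflops[measured_gpu] / tflops[target_gpu]


# Reported measurement from the rebuttal: 29.67 ms on an RTX 3090.
est_3060 = scale_latency_ms(29.67, "RTX 3090", "RTX 3060")
est_ada = scale_latency_ms(29.67, "RTX 3090", "RTX 6000 Ada")
print(f"RTX 3060: ~{est_3060:.2f} ms, RTX 6000 Ada: ~{est_ada:.2f} ms")
```

With these assumed throughput values, the estimates land within rounding of the ~82.86 ms and ~11.59 ms quoted above. Such a model ignores memory bandwidth and kernel launch overhead, so it is best read as a rough extrapolation.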

R1, R3: Technical novelty. LiteTracker’s core contribution lies in leveraging causal information for track initialization and feature caching, which is key to addressing the high computational demands of long-term point tracking, as shown in our experiments. Importantly, as highlighted by R2, this adaptation is entirely training-free, enabling direct reuse of the CoTracker3 weights, whose original training required substantial resources (32 NVIDIA A100 GPUs). Capitalizing on individual “optimization strategy used in algorithm design” (R1) such as FIFO and EMA allows us to design a very effective pipeline well-suited for practical use in tissue tracking. LiteTracker therefore offers a solid foundation for future research and downstream applications.
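
To make the FIFO feature caching idea concrete, here is a minimal sketch of a fixed-capacity buffer that runs an expensive per-frame extractor once and reuses the cached results across the temporal window. The class `FeatureRingBuffer`, its capacity, and the stand-in extractor are all hypothetical; the actual LiteTracker buffer may cache different quantities and use a different layout.

```python
from collections import deque


class FeatureRingBuffer:
    """Fixed-capacity FIFO cache of per-frame features (hypothetical helper)."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest entry automatically when full.
        self._frames = deque(maxlen=capacity)
        self.compute_calls = 0

    def push(self, frame, extract):
        """Run the expensive extractor once for the new frame and cache it."""
        self.compute_calls += 1
        self._frames.append(extract(frame))

    def window(self):
        """Return the cached features for the current window, oldest first."""
        return list(self._frames)


# Stand-in extractor: a real tracker would run a feature network here.
buf = FeatureRingBuffer(capacity=3)
for frame_index in range(5):
    buf.push(frame_index, extract=lambda f: f * 10)

cached = buf.window()       # features of the three most recent frames
calls = buf.compute_calls   # one extractor call per incoming frame
```

The point of the design is that each incoming frame triggers exactly one extractor call, whereas recomputing the whole window at every step would cost one call per frame in the window.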

R3: EMA initialization. EMA provides a warm-start that LiteTracker refines during tracking. Despite potential noise from sudden motion or occlusions, LiteTracker effectively recovers using multi-scale features and spatial attention. As shown in our ablation (Fig. 4), EMA improves both localization accuracy and convergence speed. Regarding the smoothing factor, we observed minimal sensitivity across datasets and used a fixed value of 0.8 in all experiments.
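
A minimal sketch of an EMA-based warm start is given below. The smoothing factor 0.8 comes from the rebuttal; everything else is an assumption for illustration, including the class name `EmaFlowInitializer` and the convention that the factor weights the newest displacement rather than the running average.

```python
class EmaFlowInitializer:
    """Warm-starts a track from an exponential moving average of past motion."""

    def __init__(self, alpha=0.8):
        # Assumption: alpha weights the newest displacement. The source only
        # states that a fixed smoothing factor of 0.8 was used.
        self.alpha = alpha
        self.flow = (0.0, 0.0)
        self.prev = None

    def update(self, refined_xy):
        """Feed the refined (x, y) position of the current frame."""
        if self.prev is not None:
            dx = refined_xy[0] - self.prev[0]
            dy = refined_xy[1] - self.prev[1]
            a = self.alpha
            self.flow = (a * dx + (1 - a) * self.flow[0],
                         a * dy + (1 - a) * self.flow[1])
        self.prev = refined_xy

    def predict(self):
        """Initial guess for the next frame: last position plus smoothed flow."""
        if self.prev is None:
            raise ValueError("call update() with at least one position first")
        return (self.prev[0] + self.flow[0], self.prev[1] + self.flow[1])


# A point moving by (2, 1) pixels per frame.
init = EmaFlowInitializer(alpha=0.8)
for xy in [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0)]:
    init.update(xy)
guess = init.predict()  # close to the true next position (6, 3)
```

In this sketch the guess approaches the true next position as the motion history grows, so a refinement stage starting from it has less distance to cover than one starting from the last known position.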

R2: Appending new points. Contrary to the concern, LiteTracker does support this functionality, as mentioned in methodology: “The temporal memory buffer is implemented in a way that it allows addition of new query points during the tracking process.” We appreciate the opportunity to clarify this point. When new points are queried in later frames, we expand the memory buffer with proxy features. Our modified transformer architecture dynamically masks these, enabling seamless integration while preserving fully causal tracking.
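
One way to picture the buffer expansion described above is sketched here. It is an illustration under stated assumptions, not the released implementation: the class `MaskedTrackMemory`, the use of zero vectors as proxy features, and the boolean validity mask are all hypothetical choices.

```python
from collections import deque


class MaskedTrackMemory:
    """Temporal buffer that back-fills new query points with masked proxies."""

    def __init__(self, capacity, feat_dim):
        self.feat_dim = feat_dim
        self.num_points = 0
        # Each slot holds (features per point, validity flag per point).
        self.slots = deque(maxlen=capacity)

    def add_point(self):
        """Register a new query point and pad older slots with proxies."""
        for feats, valid in self.slots:
            feats.append([0.0] * self.feat_dim)  # assumed zero-vector proxy
            valid.append(False)  # downstream attention should skip this entry
        self.num_points += 1
        return self.num_points - 1

    def push(self, feats_per_point):
        """Cache real features for every tracked point in a new frame."""
        if len(feats_per_point) != self.num_points:
            raise ValueError("expected one feature vector per tracked point")
        feats = [list(f) for f in feats_per_point]
        self.slots.append((feats, [True] * self.num_points))

    def valid_mask(self):
        """Rows are time slots (oldest first), columns are points."""
        return [list(valid) for _, valid in self.slots]


mem = MaskedTrackMemory(capacity=4, feat_dim=2)
mem.add_point()                        # point 0 queried at the first frame
mem.push([[1.0, 1.0]])
mem.push([[2.0, 2.0]])
mem.add_point()                        # point 1 queried two frames later
mem.push([[3.0, 3.0], [9.0, 9.0]])
mask = mem.valid_mask()
```

Here no past frame is reprocessed when the second point is added; its history is simply marked invalid, which is the property the reviewer's concern about recomputation hinges on.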

R1: Clinical context. As briefly mentioned in the introduction, tissue tracking is crucial for navigation and XR in surgery [19, 22]. For instance, OneSLAM [22] demonstrates that inefficient point tracking can significantly hinder real-time usability in navigation tasks. In line with pioneering works [4, 18], we opted for objective evaluation through task-specific metrics. LiteTracker constitutes a foundational layer for tissue tracking, which we believe will directly impact multiple downstream surgical applications.

R2: Splitting accuracy by visibility. While this is not feasible for STIR due to temporally limited annotations, it is possible for SuPer, which provides denser labels and visibility. In this work, we followed prior studies in reporting average Jaccard and occlusion accuracy, which, as noted by R2, jointly reflect tracking and visibility performance, both critical for downstream tasks.
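
For readers unfamiliar with the two metrics named above, the sketch below shows how they couple position and visibility. It assumes the TAP-Vid-style definitions commonly used in point tracking; the pixel thresholds, the function names, and the handling of an empty denominator are assumptions and may differ from the paper's evaluation code.

```python
import math


def occlusion_accuracy(gt_visible, pred_visible):
    """Fraction of points whose visibility flag is predicted correctly."""
    correct = sum(g == p for g, p in zip(gt_visible, pred_visible))
    return correct / len(gt_visible)


def average_jaccard(gt_xy, pred_xy, gt_visible, pred_visible,
                    thresholds=(1, 2, 4, 8, 16)):
    """Mean over pixel thresholds of TP / (TP + FP + FN)."""
    scores = []
    for thr in thresholds:
        tp = fp = fn = 0
        for g, p, gv, pv in zip(gt_xy, pred_xy, gt_visible, pred_visible):
            within = math.dist(g, p) < thr
            if gv and pv and within:
                tp += 1
            else:
                if pv:
                    fp += 1  # predicted visible, but occluded or too far off
                if gv:
                    fn += 1  # truly visible, but missed or too far off
        denom = tp + fp + fn
        # Undefined when nothing is visible; 0.0 is an arbitrary choice here.
        scores.append(tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gt_xy = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
pred_xy = [(0.5, 0.0), (13.0, 10.0), (5.0, 5.0)]
gt_vis = [True, True, False]
pred_vis = [True, True, False]

oa = occlusion_accuracy(gt_vis, pred_vis)
aj = average_jaccard(gt_xy, pred_xy, gt_vis, pred_vis)
```

Because a visible point predicted too far away counts as both a false positive and a false negative, the Jaccard score penalizes localization and visibility errors together, which is why a separate per-visibility breakdown needs dense visibility labels.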

R2: Open sourcing. Contrary to the remark, our abstract announces the release of the source code.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents LiteTracker, a practical and efficient solution for long-term point tracking in surgical contexts, with a strong emphasis on runtime optimization and clinical usability. After carefully considering the author rebuttal and the thoughtful post-rebuttal review, I am pleased to support its acceptance.

    While the underlying techniques (e.g., temporal caching, EMA-based initialization) are not individually novel, their integration into a cohesive and high-performing pipeline is both technically sound and non-trivial. The system’s ability to operate in real time across extended durations and without retraining makes it highly relevant to the MICCAI community, especially for deployment in resource-constrained clinical environments.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


