Abstract
Fourteen million colonoscopies are performed annually in the U.S. alone. However, the videos from these colonoscopies are not saved due to storage constraints (each video from a high-definition colonoscope camera can run to tens of gigabytes). Instead, a few relevant individual frames are saved for documentation/reporting purposes, and these are the frames on which most current colonoscopy AI models are trained. While developing new unsupervised domain translation methods for colonoscopy (e.g. to translate between real optical and virtual/CT colonoscopy), it is thus typical to start with approaches that initially work for individual frames without temporal consistency. Once an individual-frame model has been finalized, additional contiguous frames are added with a modified deep learning architecture to train a new model from scratch for temporal consistency. This transition to temporally-consistent deep learning models, however, requires significantly more computational and memory resources for training. In this paper, we present a lightweight solution with a tunable temporal parameter, RT-GAN (Recurrent Temporal GAN), for adding temporal consistency to individual frame-based approaches that reduces training requirements by a factor of 5. We demonstrate the effectiveness of our approach on two challenging use cases in colonoscopy: haustral fold segmentation (indicative of missed surface) and realistic colonoscopy simulator video generation. We also release a first-of-its-kind temporal dataset for colonoscopy for the above use cases. The datasets, accompanying code, and pretrained models will be made available on our Computational Endoscopy Platform GitHub (https://github.com/nadeemlab/CEP).
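As a rough illustration of the objective the abstract describes (a frame-level translation loss combined with a temporal-consistency term weighted by a tunable parameter), the sketch below uses a toy stand-in "generator" that conditions on the previous translated frame. All names here are illustrative assumptions, not the authors' actual RT-GAN implementation:

```python
import numpy as np

def translate(frame, prev_output):
    # Toy stand-in generator: conditions on the previous translated frame
    # so outputs vary smoothly over time (the real model is a CNN).
    return 0.5 * frame + 0.5 * prev_output

def temporal_loss(outputs):
    # Penalize change between consecutive translated frames.
    diffs = [np.mean((a - b) ** 2) for a, b in zip(outputs, outputs[1:])]
    return float(np.mean(diffs))

def total_loss(frames, lam=1.0):
    # Translate each frame recurrently, then combine a per-frame
    # reconstruction loss with the temporal term, weighted by lam.
    outputs, prev = [], np.zeros_like(frames[0])
    for f in frames:
        prev = translate(f, prev)
        outputs.append(prev)
    frame_loss = float(np.mean([np.mean((o - f) ** 2)
                                for o, f in zip(outputs, frames)]))
    return frame_loss + lam * temporal_loss(outputs)

# Larger lam trades per-frame fidelity for smoother video output.
frames = [np.full((4, 4), v) for v in (0.0, 1.0, 0.0)]
print(total_loss(frames, lam=0.0), total_loss(frames, lam=1.0))
```

Setting `lam=0` recovers a purely frame-based objective; increasing it enforces smoother transitions at the cost of per-frame accuracy, which is the trade-off the tunable temporal parameter controls.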
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0692_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: https://papers.miccai.org/miccai-2025/supp/0692_supp.zip
Link to the Code Repository
https://github.com/nadeemlab/CEP
Link to the Dataset(s)
https://zenodo.org/records/15460791
BibTex
@InProceedings{MatSha_RTGAN_MICCAI2025,
author = { Mathew, Shawn and Nadeem, Saad and Kaufman, Arie},
title = { { RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15969},
month = {September},
pages = {445--454}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper presents a novel architecture Recurrent Temporal GAN (RT-GAN) that adds temporal consistency to frame-based unsupervised domain translation models without significantly increasing computational or memory demands. The model is evaluated on two tasks in colonoscopy, demonstrating competitive performance with fewer learnable parameters and training time.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper addresses a real-world problem in colonoscopy AI systems: the lack of video data and the challenge of achieving temporal consistency in video models.
- A new temporal colonoscopy dataset is introduced and will be made publicly available.
- Quantitative and qualitative results are substantial, demonstrating improvements over baseline models in terms of Dice, IoU, and visual stability.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The proposed method lacks long-term consistency. The model only looks at the previous frame, lacking a mechanism (e.g., transformer memory) to incorporate longer temporal context.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Including quantitative ablations on the impact of the λ parameter across tasks in more detail could be helpful.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper introduces a computationally efficient framework that reduces training requirements. With further improvements in long-term modeling, it could contribute to real-world clinical applications.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The manuscript presents an efficient architecture called RT-GAN for enforcing temporal consistency in domain translation of intra-operative frames. The architecture consists of a single generator and two discriminators, introducing a temporal consistency loss based on three consecutive frames.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The method is technically sound and supplementary material provide a visual proof of the results achieved.
- The method is understandable and clearly explained
- The data will be released upon acceptance.
- Focuses on efficiency by introducing a domain-adaptation method that uses fewer parameters and reduces complexity.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The quality of the results and the extent of genuine improvement are questionable, considering the visual results presented in the supplementary materials. In haustral fold segmentation, the proposed method appears to generate inaccurate results near the frame borders, detecting folds that do not exist. In the video, the authors highlight the issue of folds being added or removed by FoldIt and TempCycleGAN due to the lack of temporal context. While the proposed method does appear to perform some degree of smoothing by averaging content between frames, I anticipate that other methods would segment the folds intermittently, whereas RT-GAN maintains stability. However, I observe that folds appear to be "visible" only when darker and away from the borders, suggesting there are other underlying issues beyond temporal context.
- In the texturization of the synthetic colon, the limitation of the short temporal context, as highlighted by the authors themselves, and consequently the method's weakness, is more pronounced. The generated video appears to transition periodically between the learned styles. While RT-GAN clearly outperforms CLTS-GAN, which exhibits significant temporal inconsistency in its results, it does not outperform OF-GAN. Furthermore, artefacts in the texture are evident, as shown in Figure 4.
- The metrics provided in Table 2 appear to be averages of the IoU/Dice scores. Integrating a statistical test to ascertain the significance of differences between methods would help address the evaluation uncertainty.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- The losses should be discussed in more detail, and the term "stationary" should be defined or a reference provided; otherwise it is difficult to understand how the losses between frames are aggregated.
- A minor suggestion is to organize the placement of tables and figures in a more readable way. Furthermore, the use of italic text is somewhat confusing during reading.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The method can be an additional brick that paves the way toward generating realistic images from synthetic models via an efficient consistency mechanism. However, despite the effort, the results shown negatively impact the quality of the work.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This article studies the temporal consistency issue in domain translation approaches. Building on an individual-frame model, it proposes RT-GAN, a lightweight model that adds temporal consistency. The method achieves more consistent results both quantitatively and visually.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The topic is interesting and applicable. Much work has been done on image-based translation models, and improving the temporal consistency of their results is a worthwhile problem to tackle.
- The method is simple yet effective. With its two discriminators, it achieves consistent estimation results over temporal triplets.
- Quantitative and qualitative experiments show that the method improves the consistency of the results while also shortening the training time.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- In Fig. 2, the significant advantage of RT-GAN over FoldIt is not apparent. Perhaps adding a comparison with ground truth would help.
- In equation (2), please clarify whether y_t and x_t are unpaired.
- Since RT-GAN requires an image-based frame model, the training time should include both the frame model’s training time and RT-GAN’s training time. In Table 1, why is RT-GAN’s training time shorter than the frame model’s training time? Shouldn’t the training time of RT-GAN be longer due to the added frame model?
- The quantitative experiments are not sufficient. FID and KID are commonly used metrics in image translation tasks, but these are not included in the current manuscript.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper focuses on the temporal consistency of translation models. It adds a lightweight RT-GAN module in a simple and effective way, building upon existing methods. The experimental results show that RT-GAN improves the temporal consistency of the test results. Although there are some shortcomings in the experimental section, I believe the paper can be accepted after revisions.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
N/A
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A