Abstract

Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It adopts a T-shape model architecture to capture global structural information using low-resolution images and gradually recover the details in subsequent denoising steps. We further prune the model to significantly reduce the model size while retaining performance. While discarding certain downsampling operations to save parameters leads to instability and low efficiency in convergence during the training, we introduce a Temporal Light Unit (TLU), a plug-and-play module, for more stable training and better performance. TLU associates time steps with denoised image features, establishing temporal dependencies of the denoising steps and improving denoising outcomes. Moreover, while recovering images using the diffusion model, potential spectral shifts were noted. We further introduce a Chroma Balancer (CB) to mitigate this issue. Our LighTDiff outperforms many competitive LLIE methods with exceptional computational efficiency.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0771_paper.pdf

SharedIt Link: https://rdcu.be/dV5xq

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72089-5_35

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0771_supp.pdf

Link to the Code Repository

https://github.com/DavisMeee/LighTDiff

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{Che_LighTDiff_MICCAI2024,
        author = { Chen, Tong and Lyu, Qingcheng and Bai, Long and Guo, Erjian and Gao, Huxin and Yang, Xiaoxiao and Ren, Hongliang and Zhou, Luping},
        title = { { LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        pages = {369--379}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper “LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion” presents a new method and architecture for recovering illumination, and therefore content, from images captured in low-light settings. The method works by training a diffusion model to generate high-resolution detail from low-resolution input. A Temporal Light Unit (TLU) performs denoising on low-light input while a Chroma Balancer (CB) restores appropriate color and prevents drift. Additional optimizations increase computational performance while maintaining convergence during training.

    The method is evaluated on EndoVis17, EndoVis18, and a new real world dataset against several competing approaches, demonstrating promising performance both quantitatively and computationally.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of the paper include:

    1. A novel architecture and approach for solving the low-light issue, simultaneously improving both recovered image quality and computational performance.

    2. Comprehensive evaluation on both open datasets and locally collected real-world data against a wide array of competing approaches, together with an ablation study, provides greater context for the present work.

    3. The paper is well written and organized, allowing the reader to follow the methodology, rationale, and results. It is well supported by references to recent work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Weaknesses of the paper include:

    1. The clinical application is unclear. Non-clinical applications are readily envisioned, but clinical applications are not as obvious to a non-clinician audience and should be described.

    2. The clinical rationale for the work is disconnected from the experimental data used. The rationale derives from applications in which lighting can be an issue such as capsule endoscopy and laryngeal surgery, whereas the experimental data is sourced from the EndoVis challenges (da Vinci surgeries) in which lighting is not a major challenge.

    3. The paper does not discuss limitations and potential weaknesses of the approach, such as with regard to motion blur, motion artifacts, and other specific challenges that arise in clinical situations.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The present paper builds upon prior work to successfully recover lighting in low-light images with good quantitative performance and computational speed. Results are demonstrated on surgical video achieving superior performance versus competing approaches.

    On the other hand, it is unclear how this achievement would be beneficial in the medical domain, especially in light of the experimental data that is used. The “insufficient brightness” problem (Section 1, “Introduction”, first paragraph) is not one that is commonly brought up in the minimally invasive surgery domain, so one must be specific about the clinical relevance.

    The paper can be made much stronger by adopting a specific clinical application and demonstrating how some aspect of that procedure is improved, perhaps via studies with clinicians, phantom studies, and so forth. Such exercises would help clarify how the technical achievement translates into clinical value. They would also allow any limitations and weaknesses of the approach to come to light, be explained, and be overcome.

    The paper makes reference to prior work involving laryngeal surgery and capsule endoscopy. In both cases a lighting issue is apparent because lighting may be difficult to control in real time. In the former case, the endoscope is inserted into tight natural orifices so there may be angles in which the light source does not illuminate regions of interest. In the case of capsule endoscopy, likewise, the device may have limitations on projecting light onto regions of interest.

    However, the EndoVis datasets used in the paper are of robotic da Vinci surgeries in which lighting is not typically an issue. The statement in Section 3.1 (“Dataset”), “Given the challenge of obtaining real paired endoscopic low-light and normal-light images”, is tied to the possibility that the problem is uncommon.

    There is mention of an ESD surgery dataset on porcine models, but it is unclear what the original intent of the data collection was, so we are unable to judge whether it constitutes a compelling result. It is also unclear which portions of the results reflect this dataset, as they appear to show robotic surgery data exclusively.

    It is understood that diffusion models can use Gaussian noise as part of the development process, but it would be helpful to the MICCAI audience to explain how well the model applies to surgical scenarios as opposed to everyday camera images. There also seems to be a leap between restoring the low-resolution images and restoring low-light images. Perhaps one can imagine that these processes are similar based on related work, but from the text the connection is not apparent.

    With regards to the Chroma Balancer (page 5), it is a reasonable effort to restore colors. However depending on the application, subtle color differences can have a large impact on clinician decision making as color or color change may be used to judge the condition of the tissue. This reiterates the importance of understanding the clinical context.

    The labels in Figs. 1 and 2 are difficult to read relative to the body text size. In the Abstract, the acronym “LLIE” is used before it is defined. The term “low-light image enhancement” is indeed used beforehand but the acronym is not associated with it.

    Performing an evaluation based on downstream tasks, in this case instrument segmentation, is an interesting approach. However, there are a few caveats that can weaken these results. Existing segmentation algorithms are often robust within an illumination range, as they are trained on overall appearance and structure, so good instrument segmentation performance may not be indicative of restored lighting. Additionally, robotic surgical instruments may feature depths and appearances that are not relevant to clinical scenarios in which low light is a common problem.

    Reiterating on the importance of addressing a clinically relevant problem, it would be interesting to explain to the reader what the clinical ramifications of the technical achievement might be. Perhaps these can manifest in shorter procedure times, better patient outcomes, higher clinician satisfaction, etc.?

    Typo: “ResBlcok” should presumably read “ResBlock”.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The technical achievement of recovering low-light images with superior performance and speed is counterbalanced by the unclear clinical value of such an achievement. It is already possible to do this, so it is unclear whether the present improvement pushes the technology any closer to a clinical application.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper addresses the important problem of low-light image enhancement in endoscopic surgery, which can significantly impact the surgeon’s ability to perform precise procedures. The proposed LighTDiff model introduces several novel components that contribute to its computational efficiency and image enhancement performance, making it a promising solution for improving surgical visualization and guidance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. Novel formulation of a lightweight diffusion model for low-light image enhancement.

    2. The authors evaluate LighTDiff on two synthetic datasets (EndoVis17 and EndoVis18) and a real-world endoscopic surgery dataset, demonstrating its effectiveness in various scenarios.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. A more in-depth discussion of the specific challenges and requirements of LLIE in endoscopic surgery, and how existing methods fail to address them adequately, would strengthen the motivation for the proposed LighTDiff model.

    2. The authors mention that they evaluated LighTDiff on an in-house real-world dataset collected from 20 endoscopic submucosal dissection (ESD) surgery videos on pigs, but they do not provide sufficient details on the dataset’s characteristics.

    3. While the authors compare LighTDiff with several state-of-the-art LLIE methods, including both traditional and deep learning-based approaches, they do not specifically compare their method with other lightweight LLIE solutions.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Strengths:

    The paper tackles a relevant and challenging problem in endoscopic surgery, where low-light conditions can hinder the surgeon’s ability to perform precise procedures. The motivation for developing an efficient and effective low-light image enhancement method is well-established. The proposed LighTDiff model introduces several novel components, such as the T-shape architecture, Temporal Light Unit (TLU), and Chroma Balancer (CB), which contribute to its computational efficiency and image enhancement performance. These innovations are well-described and justified, making the paper a significant contribution to the field of medical image computing (MIC).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors have successfully addressed an important problem in endoscopic surgery and presented a promising solution that advances the state-of-the-art in low-light image enhancement.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This study proposes a lightweight Denoising Diffusion Probabilistic Model (LighTDiff) to address the inadequate lighting issue in endoscopy usage in surgeries. LighTDiff outperformed existing models while being less computationally demanding, as tested on two public datasets and a real-world dataset. The novelty in the model structure of LighTDiff is threefold: it adopts an inconstant-resolution diffusion structure and optimizes the model by strategically pruning certain downsampling components; it introduces the Temporal Light Unit (TLU) for stability, which injects the time step into the image features of the current denoising stage; and it introduces the Chroma Balancer (CB) to adjust the image channel bias.
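
    To make the phrase "injects the time step into the image features" concrete, the following is a minimal sketch of one common way diffusion models condition features on the step index: a sinusoidal embedding of the step, mapped to a per-channel scale and shift. This is an assumption for illustration only and is not taken from the paper; the function names, the weight layout (`scale_w`, `shift_w`), and the scale-and-shift form are all hypothetical, and the actual TLU may work differently.

```python
import math


def timestep_embedding(t, dim):
    """Sinusoidal embedding of a diffusion step index t (dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def inject_timestep(features, t, scale_w, shift_w):
    """Modulate each feature channel by a scale and shift derived from t.

    features: list of channels, each a flat list of floats.
    scale_w, shift_w: hypothetical per-channel weight vectors, one vector of
    length `dim` per channel (stand-ins for learned parameters).
    """
    emb = timestep_embedding(t, len(scale_w[0]))
    out = []
    for channel, sw, bw in zip(features, scale_w, shift_w):
        scale = 1.0 + sum(w * e for w, e in zip(sw, emb))
        shift = sum(w * e for w, e in zip(bw, emb))
        out.append([scale * x + shift for x in channel])
    return out


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    zeros = [[0.0] * 4, [0.0] * 4]
    # With all-zero weights the features pass through unchanged at any step.
    print(inject_timestep(feats, 5, zeros, zeros))
```

    With zero weights the module is an identity map, so in this sketch the step index only changes the features through the (hypothetical) learned weights.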

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    While the existing Denoising Diffusion Probabilistic Model (DDPM) shows promise for low-light image enhancement, it is computationally demanding and slow, and might not be practical to use in medical applications. This paper introduces a new model that has fewer parameters (and so lower computational cost) without sacrificing performance. It evaluates the model performance on both public datasets (EndoVis17 and EndoVis18) and real-world low-light data collected from surgeries on pigs. It also provides a URL to the source code, which enables reproducibility.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No big weakness noticed. Two minor things:

    1. The dataset does not include real clinical data with low light, though the result might not change when tested on real clinical data.
    2. As the conclusion mentions, future plans include real-time augmentation. While I won’t ask for extra analysis, do the authors have a result or sense of the time delay introduced by generating predictions with LighTDiff? Do the authors have an estimate of this number on consumer-grade hardware? This number will affect the feasibility of using this model in practical scenarios.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Page 5 “Overall Structure”: it cites Figure 2e, which is not the overview of the framework.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work introduces a novel model for addressing a medical application issue. It evaluates the model by comparing it with other benchmark models, tested on both public and real-world datasets, showing good performance and efficiency.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers (R) for the recognition of our work.

Clinical Application (R1): Our method improves surgical imaging in low-light conditions, offering clearer visuals crucial for precise procedures. Its lightweight nature ensures that it can be integrated with existing medical imaging systems without requiring significant hardware modifications, making it a practical solution for real-time applications in busy clinical environments.

Robotic da Vinci surgery dataset (R1): We would like to clarify that although we used synthetic data from the da Vinci robotic systems, we do not specifically target the da Vinci robotic systems. Instead, our research addresses a general scenario applicable across various domains, including robotics. While existing fluorescence-guided techniques indeed provide a controllable approach to managing low-light conditions in surgical settings, they do not completely solve the illumination challenges faced in robotic surgery. Our aim is to explore broader computational methods that can complement existing physical solutions to enhance visual accuracy under diverse operational conditions. We appreciate the feedback and will be sure to highlight the general applicability and relevance of our findings beyond the specific context of surgical robotics in our revised manuscript.

ESD surgery (real-world) dataset (R1 R3 R4): For our ESD real-world dataset, we manually selected 61 low-light images from 20 real robotic ESD surgery videos and had them verified by a doctor. We used porcine models for endoscopic experiments primarily because the anatomical structures and organ sizes of pigs closely mimic those of humans. Prior to human experimentation, live pigs are the best-suited research subjects for digestive tract surgery and are now widely used in research. This similarity allows for more realistic training and testing of medical devices and surgical techniques that are intended for human use. This choice is also consistent with EndoVis17 and EndoVis18, which were likewise collected from pig endoscopic surgeries. Our dataset was collected in the same way as the EndoVis17 and EndoVis18 datasets.

Leap between Super-resolution and Low-light Images (R1): It is important to note that diffusion models have proven to excel in a vast range of generation tasks other than denoising or super-resolution. To clarify, diffusion models excel at learning robust feature representations of images across different imaging conditions. In surgical applications, especially under low-light conditions, the robust feature representation learned by DDPMs can be crucial. While the transition from enhancing low-resolution images to restoring low-light images may seem like a leap, both processes fundamentally rely on reconstructing and enhancing underlying image details that are obscured, whether by resolution loss or inadequate lighting. Thus, our approach with DDPMs is not only about handling noise but also about enhancing the quality of surgical images under varied conditions, making it a promising tool for improving the clarity and utility of endoscopic video data in real-time surgical environments. The capability of diffusion models in enhancing low-light images has also been confirmed by other LLIE methods compared in our study.

Comparison with other light-weight LLIE methods (R3): We did compare our method with other lightweight LLIE methods such as MIRNet v2 and PyDiff on both performance and efficiency (FPS) in Table 1.

Real-time augmentation (R4): While our approach made a significant leap forward, there remains a gap to achieving true real-time performance (24+ FPS). In our ongoing and future work, we are committed to conducting extensive testing to better quantify this time delay and further optimize our model.
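
As a rough illustration of the real-time target mentioned above, the 24 FPS threshold can be read as a per-frame latency budget. The sketch below only performs that arithmetic; the function names and the example latencies are hypothetical and are not measurements of LighTDiff.

```python
def frame_budget_ms(fps):
    """Maximum per-frame processing time (ms) that sustains the given frame rate."""
    return 1000.0 / fps


def latency_to_fps(latency_ms):
    """Frame rate achieved when each frame takes latency_ms to process."""
    return 1000.0 / latency_ms


def meets_real_time(latency_ms, target_fps=24.0):
    """True if the per-frame latency fits within the budget for target_fps."""
    return latency_ms <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    print(round(frame_budget_ms(24.0), 2))  # budget in ms at 24 FPS
    for latency in (40.0, 50.0):            # hypothetical latencies, in ms
        print(latency, latency_to_fps(latency), meets_real_time(latency))
```

Under this reading, any enhancement step that takes longer than roughly 41.7 ms per frame falls short of 24 FPS, before accounting for capture and display overhead.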

We will also fix the minor problems for better clarity. All the suggestions will surely be considered and added to the final manuscript.




Meta-Review

Meta-review not available, early accepted paper.


