Abstract

The single greatest obstacle in developing effective algorithms for removing surgical smoke in laparoscopic surgery is the lack of a paired dataset featuring real smoky and smoke-free surgical scenes. Consequently, existing de-smoking algorithms are developed and evaluated based on atmospheric scattering models, synthetic data, and non-reference image enhancement metrics, which do not adequately capture the complexity and essence of in vivo surgical scenes with smoke. To bridge this gap, we propose creating a paired dataset by identifying video sequences with relatively stationary scenes from existing laparoscopic surgical recordings where smoke emerges. In addition, we developed an approach to facilitate robust motion tracking through smoke to compensate for patients’ involuntary movements. As a result, we obtained 21 video sequences from 63 laparoscopic prostatectomy procedure recordings, comprising 961 pairs of smoky images and their corresponding smoke-free ground truth. Using this unique dataset, we compared a representative set of current de-smoking methods, confirming their efficacy and revealing their limitations, thereby offering insights for future directions. The dataset is available at https://github.com/wxia43/DesmokeData.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1213_paper.pdf

SharedIt Link: https://rdcu.be/dVY6s

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72378-0_1

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

https://github.com/wxia43/DesmokeData

BibTex

@InProceedings{Xia_ANew_MICCAI2024,
        author = { Xia, Wenyao and Fan, Victoria and Peters, Terry and Chen, Elvis C. S.},
        title = { { A New Benchmark In Vivo Paired Dataset for Laparoscopic Image De-smoking } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {3 -- 13}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper describes the curation of a new dataset of in vivo laparoscopic image pairs, with and without surgical smoke, for the intended task of benchmarking de-smoking algorithms. Along with the dataset itself, the authors describe the method for creating such a dataset using manual selection of smoking events in laparoscopy videos, motion correction to pair to prior non-smoking frames and manual verification. Finally, the authors present results of existing algorithms on the new benchmark dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this paper is the curation of a novel dataset consisting of pairs of smoke and smoke-free images from in vivo laparoscopic videos. The authors motivate that smoke artefacts are uniquely difficult to annotate manually in laparoscopic videos and that existing synthetic, ex vivo, non-laparoscopic, and unpaired in vivo datasets are not optimal for training or benchmarking de-smoking algorithms. In this work, the main novelty is the first benchmark dataset of in vivo laparoscopic image pairs. The idea of how to create such a dataset is also very clever, i.e., searching for the onset of surgical smoke events in real laparoscopic videos and choosing the before and after frames as image pairs. The authors manually review to find which events contain minimal motion (less than 5 pixels). If some motion does exist, the authors use an existing motion correction algorithm cleverly applied to the red channel of the images, which is both dominant in laparoscopy and least affected by smoke. This simulates the idea that the scenes of two frames are identical with the exception of surgical smoke. Thus de-smoking algorithms can be robustly benchmarked using reference metrics.
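The pairing idea summarised above can be illustrated with a minimal sketch. It is not the authors' code: the function name `pair_frames`, the per-frame smoke labels, and the per-frame motion estimates are all hypothetical inputs, and rejecting a whole event when any frame reaches the 5-pixel threshold is an assumption about how the manual review was applied.

```python
from typing import List, Tuple

MAX_MOTION_PX = 5.0  # threshold quoted in the review; how motion is measured is an assumption


def pair_frames(smoke_flags: List[bool], motion_px: List[float]) -> List[Tuple[int, int]]:
    """Return (ground_truth_index, smoky_index) pairs for one clip.

    smoke_flags[i] is True when frame i has been labelled smoky.
    motion_px[i] is a hypothetical estimate of the displacement, in pixels,
    between frame i and the last smoke-free frame before the smoke onset.
    """
    if True not in smoke_flags:
        return []  # no smoke event in this clip
    onset = smoke_flags.index(True)
    if onset == 0:
        return []  # no smoke-free frame precedes the event
    gt = onset - 1  # last smoke-free frame serves as ground truth

    # Collect the contiguous run of smoky frames that follows the onset.
    smoky = []
    for i in range(onset, len(smoke_flags)):
        if not smoke_flags[i]:
            break
        smoky.append(i)

    # Assumption: discard the whole event if any frame moves too much.
    if any(motion_px[i] >= MAX_MOTION_PX for i in smoky):
        return []
    return [(gt, i) for i in smoky]


flags = [False, False, True, True, True]
print(pair_frames(flags, [0.0, 0.0, 1.2, 2.0, 3.5]))  # [(1, 2), (1, 3), (1, 4)]
```

In this sketch a single smoke-free frame is reused as the reference for every smoky frame of the event; the residual motion within accepted events would still need to be corrected before the pairs are used.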

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This paper has a few weaknesses:

    1. The first weakness is methodological. Most of the steps of curating the novel dataset rely on manual inspection. The only algorithm in the paper is the application of a classical motion correction algorithm from 2012 to correct minimal motion between identified image pairs with less than 5 pixels of motion. The overall methodology for creating the dataset does not meet the criteria for novelty of a MIC submission.
    2. The second weakness is the evaluation of the benchmark dataset. The authors evaluate a number of existing de-smoking algorithms on the new dataset. While benchmarking these is important to the community, and can now be done using reference metrics instead of non-reference metrics, the authors do not attempt to evaluate how the new dataset itself might compare to previously used datasets, e.g., synthetic, ex vivo, non-laparoscopic or unpaired in vivo datasets. The authors attempt to motivate that methods trained on synthetic or unpaired datasets do not generalize well to in vivo datasets. This is loosely motivated from papers that do not involve smoke artefacts or laparoscopy [24, 27]. But these statements are inconclusive from the benchmarking experiments. While the existence of a paired in vivo dataset is inherently useful to the community, the paper falls short of providing experimentation to prove the motivating claims for superiority of such a dataset over synthetic or unpaired data for training.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    While a novel smoke/smoke-free paired laparoscopic dataset for benchmarking de-smoking algorithms should be of interest to the MICCAI community, without methodological novelty, this paper would benefit from experimentation showing the superiority of the in vivo paired dataset over the state of the art synthetic, ex vivo, or unpaired datasets. The authors attempt to motivate weaknesses of synthetic and unpaired data in overfitting and generalization but do not prove these weaknesses in the benchmarking of methods trained on these datasets. A convincing set of experiments could include, for example, the additional comparison of existing methods trained/validated also on the new paired dataset and a similar benchmarking experiment on existing datasets to demonstrate superiority of the proposed dataset.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors that led to the overall recommendation were a lack of methodological novelty (MIC) and a lack of experimentation to show the usefulness and superiority of the proposed dataset over existing datasets (CAI).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors make clear in their rebuttal that the purpose of this dataset is benchmarking on in-vivo data, and that it is the first of its kind. It is clear now that while training can be performed on any other synthetic, ex-vivo, or other data (or even the proposed dataset), the evaluation on paired in-vivo data is important to the community. With this information I revise my original recommendation to fully accept this well done work.



Review #2

  • Please describe the contribution of the paper

    In the submitted work, the authors introduce a new in vivo benchmark dataset for the task of laparoscopic de-smoking. According to the authors, this dataset is the first of its kind, as previously only ex-vivo datasets, datasets without 1:1 correspondences between smoked and non-smoked scenes, or synthetic datasets, existed.

    In particular, the dataset is paired, i.e. contains pairs of the same scene and camera view, once with and once without smoke. This is made possible by also introducing an approach to facilitate motion tracking through smoke, allowing for the warping of the given ground-truth image to the smoked image view. Overall, it contains almost 1,000 such image pairs, rendering a good size for benchmarking of existing approaches. Along these lines, the authors also introduce the first study of existing methods, evaluated on these images, highlighting the promises and weaknesses of existing methods for the first time.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is overall well written, outlines the contributions clearly and presents them in a convincing way. The current state of the art is summarized, and the relevance of the research work is highlighted. The main contribution is three-fold: i) the creation of the dataset and the selection of suitable frames from existing work. ii) The development of an approach for motion compensation in deforming scenes with potentially moving cameras. iii) Applying a set of existing methods to the dataset and generating the first benchmark on the generated dataset.

    The presented work demonstrates the first dataset of paired in-vivo images in smoked and de-smoked scenes. This by itself is a great contribution to the community, as it allows for benchmarking of existing assistive technologies, and the development of new methods.

    Moreover, the approach for motion compensation of dynamic scenes is motivated well, and seems to be unavoidable for the in-vivo samples used in this dataset. While the approach is likely not perfect, the fact that human supervisors checked all results afterwards ensures the quality of the dataset and the suitability towards GT-sample benchmarking.

    The summary of existing works together with the benchmarking of existing works in Table 1 is a great showcase of the presented dataset.

    Moreover, one of the great advantages of this work, which is one main focus of MICCAI contributions, is the potential to showcase limitations of existing approaches and insights for future research directions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the work is overall convincing, there are a few parts for further improvement.

    Major Technical Comments

    1.1 The dataset contains 961 images. While this seems to be a reasonable dataset size for benchmarking (which is also the main purpose of the proposed work), it seems to be very small to be used as a training set. This fact could be discussed in the paper, by showcasing how many images were used to train the works benchmarked in this paper. Are there several orders of magnitude of data missing? How could this gap be overcome for future work, given the considerable amount of manual work?

    1.2 It is not clear whether the warping in Fig.2 is the same as in Equation (4) after the de-enhancing?

    Minor Technical Comments

    2.1 As introduced in the paper, the dataset mainly contains video sequences with “relatively stationary scenes”. While this is necessary for the given data processing, this can also be seen as a limitation, as it biases the dataset to a certain subset of surgical situations.

    2.2 The paper discusses works trained on synthetic or un-paired data. Is there any research happening on artificially smoked images? I.e. taking an un-smoked image, and adding smoke through (3D) computer graphic algorithms or NNs? This seems to be an obvious way to approach the problem, but is not discussed explicitly.

    2.3 Using in-vivo seems to be very important for video data / models. However, for an image dataset like the proposed one, one could argue that ex-vivo is sufficient. This could be discussed a bit more clearly.

    2.4 How are different levels of smoke considered in the dataset? Are multiple smoke amounts included for each smoke-free GT? This is not clear in the current version of the paper.

    2.5 Fig. 1 (the central paper figure) is not very appealing and not vectorized. Also not super easy to read.

    2.6 The time index t0, t1, … is not super clear to the reviewer. Is one GT used and warped again and again for multiple corresponding smoked images?

    2.7 The whole approach contains several manual steps. It would be great to outline how this could be further improved for future work.

    Structural Comments

    3.1 “A preferred alternative involves digitally removing …” Is there any reference showcasing that this is actually the preferred way?

    3.2 The three key areas introduced in the introduction (1. Development …, 2. Advancement …, 3. Evaluation …) seem to be a bit artificial, as they appear to be the same problem three times, just looked at from different perspectives (image models similar to AI-based methods, if not sufficient for training, also not sufficient for benchmarking).

    Grammar / Spelling Comments

    4.1 “De-smoking ground truth cannot be created manually” –> “cannot be created through manual hand-labeling”?

    4.2 “In vivo animal trials” –> why exactly animals?

    4.3 Fig. 1 –> “Enhanced red channel”, should it not be “de-enhanced red channel”?

    4.4 “Dissected tissues. .” –> . .

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is clear that the dataset will be made publicly available.

    It would be great, if certain plans for a public benchmark would be provided, but this is of course fully optional.

    Moreover, more samples from the dataset in either the supplementary material or the video would be helpful, to better judge the diversity and quality of the dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The size of the dataset seems to be fairly small for some use cases that potential readers of the publication would be interested in. A better discussion on how large the datasets used in the related works are would be helpful.

    • In the method, it should be made clearer how the baseline in Fig.2 is computed. Is it the same approach as in Eq. 4, just without the red channel preprocessing?

    • The limitations of the work, e.g. on potential biases coming from the dataset selection and preprocessing, should be highlighted more clearly.

    • The exact need for in vivo data for an image-only dataset could be highlighted more clearly.

    • More dataset samples should be provided in the supplementary material work, or in the accompanying video.

    • Fig. 1 does not quite meet the expectations for a top notch publication presented at MICCAI. It would be great to beautify the figure and make it overall more clear with less text.

    • The method, in particular the frames at times t0, t1, …, could be simplified and made clearer.

    • For future work, it would be great to discuss how some of the extensive manual work required for the dataset generation could be further simplified or removed.

    • It is not clear which “previous study of robotic-assisted laparoscopic radical prostatectomy” the data comes from. A reference is needed here.

    • Table 1 is a bit hard to read, in particular as M1, M2, … do not tell anything about the method. It would be helpful to refer to the methods using either the author et al., or the method’s acronym.

    • In Table 1, the values for the not desmoked image should be presented, to see the baseline values. Of course it is expected that any of the presented methods performs better than the fully smoky image. However, this would give some further trust in the pixel accuracy of the warped pseudo GT.

    • The best result in the table should be made bold.

    • The dehaze reference is missing.
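The baseline requested in the Table 1 comment above (scoring the untouched smoky image against the warped ground truth) can be sketched with a full-reference metric. PSNR is used here only as an illustrative choice; which metrics the paper reports is not stated on this page, and the toy images and function names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]  # single-channel image, intensities in [0, 255]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of identical size."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count


def psnr(a: Image, b: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


# Toy 2x2 example (made-up values, not taken from the dataset).
ground_truth = [[100, 120], [140, 160]]
smoky = [[150, 165], [180, 195]]      # unprocessed input: the baseline row
desmoked = [[105, 118], [143, 158]]   # output of some de-smoking method

baseline_db = psnr(ground_truth, smoky)
method_db = psnr(ground_truth, desmoked)
print(method_db > baseline_db)  # True for this toy example
```

Reporting the baseline alongside each method makes the improvement attributable to de-smoking visible, and an implausibly low baseline would hint at misalignment in the warped ground truth.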

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper outlines a strong contribution to the community and fosters both future research as well as the benchmarking of existing research. Moreover, the strategy for generating the dataset, while limited to simple situations, is convincing and suitable for the proposed scope of the publication.

    There are still a few weaknesses and parts that need clarification. However, the required adjustments needed to improve the quality of the paper seem reasonable.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The reviewer agrees with most parts of the authors’ response to the two main concerns of reviewer 1. Also, the reviewer appreciates the authors’ response to the two major technical comments made.

    The reviewer believes that the authors will be able to incorporate the comments from all 3 reviewers to further enhance the paper and camera-ready version.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a new dataset of smoky/non-smoky image pairs of laparoscopic procedures. Besides much manual effort, the steps to achieve this dataset include the detection of potentially relevant clips, the removal of clips with large movement and the movement elimination for small movements. Several SOTA approaches for desmoking are compared, yielding a benchmark for future research.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength is providing the dataset of paired smoky and non-smoky images, which is indeed urgently needed to develop specifically enhanced methods for laparoscopic desmoking. This is important for CAI in real-world environments. The images in the paper as well as the supplemental video show a high quality of provided image pairs.

    Secondary strength is the proposal of a method to pre-process smoked images to allow the application of optical flow to correct for small movement.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Main weakness is the description of the method to de-enhance the red channel of smoky images using structural images. The method itself is cited, but not all variables of the formula are described and the main idea of the method remains unclear.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    While the dataset will be released, which is the main strength of the paper, the code for motion removal seems not to become open access.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper very clearly introduces the problem itself and the motivation of the impact of the dataset which is now created and released. Unfortunately, the description of the method to reconstruct the red channel prior map using structural images is hard to follow. In Eqn. 1 and 2, the variable S_I and S_G are used. How do we derive them? Although paper [26] is referenced, the main idea of the approach should be included here. This is necessary in particular also because the cited method is used for de-enhancement, which seems to be a very special application of the approach.

    Please consider also to make the code open access to allow the community to enlarge the dataset afterwards more easily.

    Minor: overfull text box in abstract

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The dataset provided is very useful and strengthens the CAI research for real-world applicability.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I keep my decision of Accept, because in the CAI domain, real-world data sets are rare and this one could be really helpful. The methods used are not novel or overwhelmingly sophisticated, but the text does not pretend to provide such methods. Therefore, for me still an Accept.




Author Feedback

We would like to thank the reviewers and the area chairs for this constructive and critical review of our paper.

As a CAI-focused submission, our main contribution lies in developing a novel methodology to create a paired in-vivo dataset that greatly benefits laparoscopic image desmoking. As indicated by the reviewers, the strength of our paper lies in the cleverness, effectiveness, and significance of our work. Reviewers also raised constructive criticisms regarding 1) the lack of methodological novelty and 2) the lack of comparison with other synthetic/ex-vivo datasets. In the following section, we would like to clarify these concerns.

1) Although each individual sub-step may not seem novel, the main novelty of our work lies in the overall workflow to systematically create such a dataset, allowing us to construct the first-of-its-kind in-vivo paired dataset for laparoscopic image desmoking. To the best of our knowledge, this is the first time such a methodology has been employed to address the well-known challenge in clinical translation for laparoscopic image desmoking, meeting the contribution standard for a CAI paper. Another novelty lies in the proposed pre-processing in the red-channel image, which is the main technique that can address the problem of motion correction through smoke. This discovery and methodology are novel and significant for the MICCAI community.

2) The ultimate goal of any clinical translational method is to demonstrate its effectiveness in a real in-vivo setting. As suggested in our title and abstract, our dataset was created to satisfy this need, allowing us to evaluate and compare existing methods, reveal limitations, and offer insights for future research, which is also one of the main focuses of MICCAI. Although synthetic/ex-vivo databases may facilitate training, they may not truly reflect algorithm performance in real in-vivo environments. Thus, we believe that comparing with synthetic/ex-vivo data for training purposes would be an interesting research direction for expanded future studies but is beyond the scope and focus of this paper.

Aside from these two major criticisms, we would also like to address other constructive comments.

1) For derivations of structural maps S_I and S_G, these are obtained by performing texture removal on the max-intensity color channel image (bright channel image) as a rough estimate of the image illumination. Since the red channel is almost always the max-intensity channel for laparoscopic images, S_I and S_G are computed as texture-free images of the red-channel images.

2) Example MATLAB code for motion correction will be provided with the dataset.

3) Another constructive question is about the potential difficulties in expanding the dataset for training. Existing de-smoking methods typically use 2000–20000 pairs of images for training (mostly synthetic). While our current dataset may not be sufficient to train a network from scratch, it is suitable for fine-tuning existing models or joint training with synthetic/non-laparoscopic data. To bridge the data size gap, a video content recognition-based method could be developed in the future to automatically identify candidate clips, significantly reducing manual work. With the in-house video data we have, we aim to expand the dataset to 2000 pairs, and this number can be further expanded by using online databases (e.g., the Hamlyn database) or by establishing collaborations with other institutions.

4) For R4’s concern 1.2, sorry for this oversight. Fig.2(c) is the result of equation 4 without red-channel pre-processing. The result of equation 4 with red-channel pre-processing isn’t shown as it looks too similar to Fig.2(b) in a side-by-side comparison due to very small motion. A very similar case (with worse smoke) is showcased in a motion picture in the supplementary material (second row).

5) We will also address reviewers’ minor comments as best as we can in the revision to improve the quality of this paper.
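The derivation of the structural maps S_I and S_G described in point 1) can be illustrated with a rough sketch. The bright-channel step (per-pixel maximum over the color channels) follows the description above; the texture-removal step is replaced here by a plain box filter, which is only a stand-in and not the method of reference [26]. All function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)


def bright_channel(img: List[List[Pixel]]) -> List[List[float]]:
    """Per-pixel maximum over the color channels (the bright channel image)."""
    return [[max(p) for p in row] for row in img]


def box_smooth(ch: List[List[float]], radius: int = 1) -> List[List[float]]:
    """Mean filter with a (2*radius+1)^2 window, clipped at the image borders.

    Stand-in for texture removal (an assumption, not the cited technique).
    """
    h, w = len(ch), len(ch[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [ch[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def structural_map(img: List[List[Pixel]], radius: int = 1) -> List[List[float]]:
    """Rough illumination estimate: smoothed bright channel."""
    return box_smooth(bright_channel(img), radius)


# Toy red-dominant 2x2 image (made-up values).
image = [[(200, 80, 60), (210, 90, 70)],
         [(190, 85, 65), (205, 95, 75)]]
print(bright_channel(image))  # [[200, 210], [190, 205]]
```

Because red dominates every pixel of the toy image, the bright channel coincides with the red channel, which mirrors the observation in point 1) about laparoscopic images.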




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


