Abstract

Surgical smoke in laparoscopic surgery can degrade visibility and pose hazards to surgeons. Although medical devices for mechanical smoke evacuation work well, they prolong operative duration and thus limit efficiency. This work aims to simultaneously remove surgical smoke and restore true-to-life image colors with a deep learning strategy, improving surgical efficiency and safety. However, deep network-based smoke removal remains challenging because 1) spectral bias hinders the learning of higher-frequency modes, and 2) the distribution of surgical smoke is non-homogeneous. We propose a multi-frequency and smoke attention-aware learning-based diffusion model for removing surgical smoke. The frequency compensation strategy combines multi-level frequency learning and contrast enhancement to integrate comprehensive features for learning the mid-to-high frequency details that the smoke has obscured. The smoke attention learning employs a pixel-wise measurement and provides the diffusion model with complementary features about where smoke is present, which helps restore the smokeless regions during the inverse diffusion process. The multi-task learning strategy incorporates an L1 loss, a smoke perception loss, a dark channel prior loss, and a contrast enhancement loss to aid model optimization. Additionally, a paired smokeless/smoky dataset is simulated with a 3D smoke rendering engine. Experimental results show that the proposed method outperforms other state-of-the-art methods on both synthetic and real laparoscopic surgical images, with the potential to be embedded in laparoscopic devices for smoke removal.
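
To make the multi-task objective in the abstract more concrete, the following is a minimal sketch of how an L1 term and a dark channel prior term might be computed and combined with the other two losses. It is not the authors' implementation. The function names, the 3x3 patch size, the default weights `lam1`, `lam2`, `lam3`, and the choice of "mean dark channel of the prediction" as the dark channel prior loss are all assumptions made for illustration. The smoke perception and contrast enhancement terms are passed in as plain numbers because their definitions are not given here. Pure Python lists are used to keep the sketch dependency-free.

```python
def dark_channel(image, patch=3):
    """Dark channel of an H x W image given as rows of (r, g, b) tuples in [0, 1].

    For each pixel, take the minimum over the colour channels and then the
    minimum over a square patch centred on that pixel (clipped at borders).
    """
    h, w = len(image), len(image[0])
    r = patch // 2
    # Per-pixel minimum over the three colour channels.
    min_rgb = [[min(px) for px in row] for row in image]
    dark = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dark[y][x] = min(
                min_rgb[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            )
    return dark


def dcp_loss(pred, patch=3):
    """Assumed form of a dark channel prior loss: mean dark channel value.

    Smoke-free images tend to have a dark channel close to zero, so a large
    value suggests residual smoke in the prediction.
    """
    flat = [v for row in dark_channel(pred, patch) for v in row]
    return sum(flat) / len(flat)


def l1_loss(pred, target):
    """Mean absolute difference over all pixels and channels."""
    diffs = [
        abs(p - t)
        for pred_row, target_row in zip(pred, target)
        for pred_px, target_px in zip(pred_row, target_row)
        for p, t in zip(pred_px, target_px)
    ]
    return sum(diffs) / len(diffs)


def total_loss(l1, smoke_perception, dcp, contrast,
               lam1=0.1, lam2=0.1, lam3=0.1):
    """Weighted sum extending a baseline L1 loss (weights are hypothetical)."""
    return l1 + lam1 * smoke_perception + lam2 * dcp + lam3 * contrast


if __name__ == "__main__":
    clear = [[(0.0, 0.5, 0.9)] * 4 for _ in range(4)]   # saturated colours
    smoky = [[(0.8, 0.8, 0.8)] * 4 for _ in range(4)]   # greyish veil
    print("DCP loss, clear:", dcp_loss(clear))
    print("DCP loss, smoky:", dcp_loss(smoky))
    print("L1 between them:", l1_loss(smoky, clear))
```

In this toy example the uniformly grey "smoky" image yields a much larger dark channel value than the saturated "clear" image, which is the intuition behind using the dark channel prior as a training signal for desmoking.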

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2720_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2720_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Li_MultiFrequency_MICCAI2024,
        author = { Li, Hao and Zhai, Xiangyu and Xue, Jie and Gu, Changming and Tian, Baolong and Hong, Tingxuan and Jin, Bin and Li, Dengwang and Huang, Pu},
        title = { { Multi-Frequency and Smoke Attention-aware Learning based Diffusion Model for Removing Surgical Smoke } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a diffusion model based on multi-frequency and smoke attention-aware learning to remove surgical smoke. In this work, the multi-frequency block is used to obtain the mid-to-high frequency information of the image. Smoke attention learning provides supplementary features for the diffusion model regarding where smoke is present. The multi-task learning strategy combines L1 loss, smoke perception loss, dark channel prior loss, and contrast enhancement loss to assist in model optimization.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) This paper proposes a frequency compensation strategy that can adaptively learn the high-frequency details masked by surgical smoke. 2) This paper proposes a smoke attention module that provides additional information about the location of smoke for the diffusion model, which helps to restore smoke-free areas during the reverse diffusion process. 3) This paper proposes a multi-task learning strategy that integrates L1 loss, DCP loss, smoke perception loss, and contrast enhancement loss.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Why does formula (1) use β_t as input instead of using t directly? (2) What does the t in I_t^sm in Figure 2 mean? Does it also need to be obtained through diffusion, like I_t^sl? (3) What is the experimental basis for using four loss functions in multi-task learning? The effectiveness of each loss function has not been demonstrated through ablation experiments. (4) Similar to (3), the effectiveness of the MFB module has not been demonstrated through quantitative ablation experiments. (5) What is the relationship between [“Cholec80 dataset”, “real world dataset”] and [“Synthetic smoke dataset”, “Real smoke dataset”] mentioned in the article? (6) Will the dataset be made public?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) Please cite necessary papers such as DDPM and CBAM. (2) This paper should provide a detailed introduction to the diffusion process and how to introduce conditional images for inverse diffusion. (3) A column of GT images should be placed as a reference in Figure 3. (4) More quantitative ablation experiments should be added.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is interesting; however, the quantitative ablation experiments are insufficient.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I am inclined to accept the paper as the authors dealt with the comments well.



Review #2

  • Please describe the contribution of the paper

    This study introduces a new model for removing surgical smoke using advanced learning techniques. The key features of the model include:

    1. Frequency Compensation: This approach enhances the image by focusing on detailed frequency elements that are usually hidden by smoke, integrating these with contrast improvements to bring out more image details.
    2. Smoke Attention Learning: This method identifies where the smoke is in an image. By understanding these areas, the model can better restore parts of the image that are obscured by smoke.
    3. Multi-task Learning Strategy: This strategy uses different types of losses, such as L1 loss and smoke perception loss, to improve the overall performance of the model. This helps in refining the image quality and enhancing contrasts effectively.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-structured, with clear figures and text that make it easy for readers to follow.
    2. The proposed method demonstrates the ability to synthesize scenes with high fidelity, free from smoke.
    3. The experiments are comprehensive and effectively demonstrate the superior performance of the method.
    4. The authors propose a novel smoke removal network that integrates a DCP (Dark Channel Prior), a segmentation module, a diffusion model, and a multilevel frequency module to fully utilize information from the input image, achieving state-of-the-art performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Overall, it is a good technical paper.

    Some crucial ablation studies are missing. The authors mention the use of a segmentation network to provide the diffusion model with additional information. Therefore, the ablation study should include a comparison of the performance with and without this segmentation network.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See weakness

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The performance and the clinical significance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The author answered my questions.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a diffusion-model-based framework aimed at reducing smoke in surgical scenes. Specifically, the authors design Multilevel Frequency Learning to capture the mid-to-high frequencies, which tend to be suppressed in smoky images. They also design Smoke Attention Learning to identify smoky areas.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well structured and the model design is good; each component in the proposed model is targeted and reasonable.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Each important component of the proposed method is reasonable in terms of model design, but there is no ablation study to demonstrate their effectiveness.
    2. The figures do not appear to be in a vector format such as SVG, so they are a little blurry when zoomed in.
    3. Some of the explanation in the method section seems redundant; it would be better to focus on the novelty the paper proposes. There are also no implementation details for the experimental settings.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Each important component of the proposed method is reasonable in terms of model design, but there is no ablation study to demonstrate their effectiveness.
    2. The figures do not appear to be in a vector format such as SVG, so they are a little blurry when zoomed in.
    3. Some of the explanation in the method section seems redundant; it would be better to focus on the novelty the paper proposes. There are also no implementation details for the experimental settings.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Regardless of the weaknesses mentioned above, this paper proposes a clear framework aimed at solving the smoke problem in surgical scenes. Each component is well designed and reasonable from both the modeling and the application perspective. Therefore, I think this paper is worth accepting.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

The authors thank the reviewers for their constructive comments on the paper.

(R1, R4, R5) We acknowledge the reviewers’ concerns regarding the effectiveness of the loss functions, the multilevel frequency learning block (MFB), and the smoke segmentation network (SSN). Due to length constraints in the main paper, detailed evaluation results for these components were not included. However, supporting evidence of their effectiveness is presented in the paper. Specifically, the MFB module enhances the detection of mid-to-high frequency information in smoky images, as evidenced by the frequency-domain analysis in Figure 1, which shows that the MFB improves the learning of mid-to-high frequency information and thereby helps smoke removal and the restoration of image details. For the loss functions, Appendix 4 compares the impact of varying the loss weights (λ1, λ2, λ3) used in multi-task learning. These weights, which extend the baseline L_1 loss, were adjusted to optimize the model’s performance across tasks, indicating that multi-task learning effectively enhances overall performance. In addition, the SSN focuses on smoke attention and is associated with the smoke perception loss. Its contribution (PSNR/SSIM increases by 1.5/0.1 as λ1 is adjusted from 0.01 to 0.8) is shown in Appendix 4, which validates the effectiveness of the SSN in optimizing model performance.

(R1) Regarding the dataset, Cholec80 is a public dataset, and the “real world dataset” is a dataset from our local laboratory. Both contain laparoscopic surgery images but come from different sources, and they are used to evaluate the robustness of the model. The synthetic smoke dataset is rendered on smokeless laparoscopic images from both Cholec80 and the local dataset, and the real smoke dataset consists of the smoky laparoscopic images from both Cholec80 and the local dataset.
We also plan to present our dataset during the conference sessions, facilitating in-depth discussion, feedback, and potential collaborations. (R1) Regarding the notation, β_t denotes a predefined variance schedule parameter that controls the intensity of the noise added at each step. Using t directly could give less control and potentially unstable noise addition, whereas the exponential decay nature of β_t ensures a smoother and more stable transition during the diffusion process. I_t^sm is the smoky image and is not obtained through diffusion; t is used as an index and corresponds to the paired I_t^sm and I_t^sl.
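
The role of β_t as a variance schedule, as opposed to the step index t itself, can be illustrated with the standard DDPM forward (noising) process, in which a sample at step t is drawn as sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ᾱ_t the running product of (1 − β_s). The sketch below is an illustration under stated assumptions, not the authors' code: it uses the linear schedule and default endpoints (1e-4 to 0.02 over 1000 steps) from the original DDPM formulation, whereas the schedule actually used in the paper may differ, and the function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a flat list of pixel values x0.

    t is a 0-based index into the schedule; rng is a random.Random instance.
    """
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    betas = linear_beta_schedule(1000)
    alpha_bars = cumulative_alpha(betas)
    rng = random.Random(0)
    x0 = [0.2, 0.5, 0.8]  # toy "image" with three pixel values
    for t in (0, 499, 999):
        print(t, round(alpha_bars[t], 6), q_sample(x0, t, alpha_bars, rng))
```

Here t only selects an entry of the schedule; the amount of signal kept and noise added at that step is determined by the β values through ᾱ_t, which decreases smoothly from nearly 1 toward 0 as t grows.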




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Rebuttal addressed most of the concerns. Paper can be accepted.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Rebuttal addressed most of the concerns. Paper can be accepted.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All three reviewers agreed to accept this paper after the rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    All three reviewers agreed to accept this paper after the rebuttal.


