Abstract

Automated semantic segmentation in colonoscopy is crucial for detecting colon polyps and preventing the development of colorectal cancer. However, the scarcity of annotated data presents a challenge to the segmentation task. Recent studies address this data scarcity issue with data augmentation techniques such as perturbing data with adversarial noise or using a generative model to sample unseen images from a learned data distribution. The perturbation approach controls the level of data ambiguity to expand discriminative regions, but the augmented noisy images exhibit a lack of diversity. On the other hand, generative models yield diverse realistic images but cannot directly control the data ambiguity. Therefore, we propose Diffusion-based Adversarial attack for Semantic segmentation considering Pixel-level uncertainty (DASP), which incorporates both the controllability of ambiguity in adversarial attack and the data diversity of generative models. Using a hierarchical mask-to-image generation scheme, our method generates both expansive labels and their corresponding images that exhibit diversity and realism. Also, our method controls the magnitude of adversarial attack per pixel considering its uncertainty such that a network prioritizes learning on challenging pixels. The effectiveness of our method is extensively validated on two public polyp segmentation benchmarks with four backbone networks, demonstrating its superiority over eleven baselines.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2811_paper.pdf

SharedIt Link: https://rdcu.be/dV518

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72114-4_62

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2811_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Jeo_Uncertaintyaware_MICCAI2024,
        author = { Jeong, Minjae and Cho, Hyuna and Jung, Sungyoon and Kim, Won Hwa},
        title = { { Uncertainty-aware Diffusion-based Adversarial Attack for Realistic Colonoscopy Image Synthesis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {647--658}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose to combine adversarial perturbations with diffusion, to generate synthetic images while controlling both data diversity and ambiguity. The proposed method is validated on two polyp detection benchmarks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Very well written
    2. Strong validation in terms of comparison with SOTA across 2 benchmarks.
    3. The qualitative results look nice.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My main concern is missing citations. A quick google search revealed several articles from MICCAI 2023 on conditional diffusion models that were missing. The authors should consider citing the relevant ones among these:

    1. https://conferences.miccai.org/2023/papers/134-Paper3305.html
    2. https://conferences.miccai.org/2023/papers/299-Paper0172.html
    3. https://conferences.miccai.org/2023/papers/637-Paper1741.html

    My second major concern is, if the main method novelty comes from adding adversarial attack on diffusion models, the result doesn’t show significant benefit from it. Can the authors elaborate?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. In introduction, the authors claimed that “the generative models cannot sensitively control the ambiguity of the sampled data.” I didn’t see any experiments supporting this argument.

    2. If I understand the ablation results in table 2 correctly, the attack only improves the performance by about 1%. Since the baseline is already 92%, I understand this might be significant. But please provide some qualitative understanding of which lesion type/ parts contributed to this improvement.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Missing citations, as well as unclear benefit from the technical contribution.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors have addressed an important research gap. They propose a diffusion-based adversarial attack for semantic segmentation considering pixel-level uncertainty.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written, well organized, and argues its case convincingly.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper has some technical weaknesses that can be addressed.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper has addressed an important problem, however, the approach has a few limitations that the authors need to address:

    1- The proposed approach seems to rely on adding imperceptible perturbations to input images to fool the segmentation model. However, the effectiveness of these perturbations can be sensitive to the magnitude of the added noise. Small changes may not be sufficient to fool the model, while large changes may be easily detected by human observers.

    2- Adversarial perturbations generated by the method may exhibit limited transferability across different segmentation models or architectures. The perturbations may be highly tailored to the specific model architecture and training data used during the attack, making them less effective against unseen models.

    3- Diffusion-based attacks may be susceptible to preprocessing or postprocessing steps commonly applied in semantic segmentation pipelines, such as data augmentation, normalization, or filtering. These steps can mitigate the effectiveness of the adversarial perturbations, reducing their impact on segmentation performance. The authors may refer to the papers below while addressing this:

    “A Lightweight Neural Network with Multiscale Feature Enhancement for Liver CT Segmentation,” Scientific Reports, vol. 12, no. 14153, pp. 1-12, 2022.
    “Re-routing Drugs to Blood Brain Barrier: A Comprehensive Analysis of Machine Learning Approaches with Fingerprint Amalgamation and Data Balancing,” IEEE Access, vol. 11, pp. 9890-9906, 2023.
    “Dense-PSP-UNet: A Neural Network for Fast Inference Liver Ultrasound Segmentation,” Computers in Biology and Medicine, vol. 153, pp. 106478, 2023.

    4- Generating adversarial examples by the method seems to be computationally expensive, especially for high-resolution images or complex segmentation models. The iterative nature of the attack process may require a large number of iterations to find effective perturbations, resulting in long computation times.

    5- While diffusion-based adversarial attacks can successfully perturb input images to produce misclassifications, the resulting perturbations may not be robust to real-world variations such as changes in lighting conditions, camera angles, or occlusions. Adversarial examples generated in controlled settings may not generalize well to real-world scenarios.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written, well organized, and argues its case convincingly.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a method for colonoscopy image synthesis that aims to address the scarcity of training data for the segmentation task. It innovatively introduces adversarial attack into the workflow and employs a weight computation method to emphasize edge areas.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The application of adversarial attack to data synthesis effectively increases the diversity of the generated data compared to methods that rely only on generative models.
    2. A weight map that casts more weight on edge regions during the attack benefits the network's ability to segment edge areas.
    3. Both qualitative and quantitative evaluations are presented, and various segmentation networks are tested to support the capability of the proposed method in generating reasonable data that improves segmentation performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It would be better to include more clinically oriented analysis of the qualitative results in Fig. 3, for example, a reference segmentation map manually annotated by doctors. The first example of ‘Generated data w/o DASP’ has a smooth area of a different color right next to the masked polyp region, which can be a sign of a missing polyp mask in the first-stage synthesis. The quality of data generation before DASP should be validated.
    2. In Section 4.1 (Baseline and Evaluation), a clear statement is expected on whether the comparison methods like CutMix are applied to Kvasir and ETIS directly, or also to the generated data. To guarantee a fair comparison, all methods should be evaluated on generated data.
    3. There are other colonoscopy datasets, including CVC-300, CVC-ColonDB, and CVC-ClinicDB; they could be useful to evaluate whether the better performance is the result of overfitting or of the improved capability of networks trained with more representative data.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper introduces a novel approach to synthesize colonoscopy images, addressing the limited availability of training data for segmentation tasks. It creatively incorporates adversarial attack into the workflow and utilizes a weight computation method to highlight edge areas. However, there are things that can be further improved:

    1. Better verification of the reliability of the generated data before applying the adversarial attack.
    2. Make sure the other compared methods are applied to generated data.
    3. Try to evaluate on other datasets to rule out the possibility of overfitting.

    Overall, these problems do not greatly diminish the novelty and contribution of this work.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper does address the problem of data scarcity, according to the quantitative and qualitative results, with the innovation of an uncertainty-weighted adversarial attack method. However, in the analysis of the results, there should be a clearer demonstration of the comparisons. The proposed method could also be verified more comprehensively on more datasets beyond the two mentioned in the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their unanimous acceptance recommendations and constructive reviews.

Rev #1: Q) Missing references. A) We appreciate your suggestions. We will add them to the paper and discuss their methods.

Rev #1: Q) Why can generative models not sensitively control ambiguity? A) As generative models perform random sampling from a learned data distribution, they do not ‘directly’ control the ambiguity of the sampled data in the feature space. In contrast, adversarial attack-based methods directly control data ambiguity by manipulating attack parameters such as the number of attacks and the attack magnitude.

Rev #1: Q) The benefit from the attack is not significant. Which lesion part contributed to this improvement? A) First of all, comparing the 2nd and 4th rows in Table 2, the Diff-PGD attack itself showed a relatively minor improvement (~0.5%). However, with the uncertainty map (5th row), the pixel-level strength of the adversarial attack is controlled such that the final performance is further improved. Specifically, this improvement is mainly derived from the correct segmentation of lesion edges, as the uncertainty map emphasizes the significance of ambiguous edge regions for model training. As shown in Fig. 3, a model has difficulty predicting edge regions (7th column). This inconsistency with the ground truth (4th column) on these edges is improved with a supervised loss during training.
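The pixel-level control described in this answer can be sketched in a few lines. The sketch below is a hypothetical illustration, not the paper's exact formulation: the entropy-based uncertainty map, the max-normalization, and the `alpha` step size are our assumptions. The idea is simply that a signed-gradient attack step is scaled per pixel by a normalized uncertainty map, so ambiguous pixels (e.g., lesion edges) receive stronger perturbations.

```python
import numpy as np

def entropy_uncertainty(probs, eps=1e-8):
    """Per-pixel predictive entropy of softmax probabilities, shape (C, H, W).
    High entropy = the model is uncertain about that pixel."""
    return -np.sum(probs * np.log(probs + eps), axis=0)

def weighted_attack_step(image, grad, uncertainty, alpha=0.01):
    """One signed-gradient attack step whose magnitude is scaled per pixel
    by the uncertainty map normalized to [0, 1]: the most uncertain pixel
    gets the full step alpha, confident pixels are barely perturbed."""
    w = uncertainty / (uncertainty.max() + 1e-8)
    return image + alpha * w * np.sign(grad)
```

Under this sketch, an edge pixel with near-uniform class probabilities receives a step close to `alpha`, while a confidently classified background pixel is left almost untouched.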

Rev #3: Q) Sensitivity to noise magnitude. A) Our method exhibits low sensitivity to noise magnitude. As shown in Fig. 1, our method projects perturbed data back onto the original data manifold via a diffusion model, even when large noise is added. Also, Fig. 1 in the supplementary material shows that our method generates seamless and realistic data even when a strong attack is applied.

Rev #3: Q) Diffusion-based attacks may be susceptible to pre/post-processing. A) On Kvasir-SEG with U-Net, we conducted an ablation study with normalization, which resulted in an mIoU of 93.05. This marginal decrease from our original result (93.1) without normalization indicates that our method is robust to normalization.

Rev #3: Q) Long computation time for attacks. A) We agree that the iterative nature of PGD leads to long computation times. To reduce this cost, non-iterative attack methods such as FGSM (Goodfellow et al., ICLR 2015) can be used as alternatives. On Kvasir-SEG with U-Net, the time required for PGD with K=10 was 67.8s, while that for FGSM (i.e., K=1) was 6.26s. However, as shown in Table 2 of the supplementary material, we observed a trade-off between performance and time: the mIoU of the FGSM setting was 92.6, while that of PGD was 93.1.

Rev #3 and #4: Q) Additional experiments. (e.g., model transferability, real-world settings, more datasets, and evaluation of generated labels) A) We thank the reviewers for these suggestions and acknowledge their importance. However, due to limited space, it is challenging to add all these analyses to the paper. We value the reviewers’ feedback and will include these experiments in detail in the future journal version of our paper.

Rev #4: Q) Were baseline methods applied to generated data? A) Data augmentation methods were applied to the given data. As noted in the Introduction, data augmentation methods can control data ambiguity but naturally lack diversity compared to generative methods, as they are designed to alter the given data only slightly. Applying them to generated data would go beyond the scope of their original approach and could introduce unintended biases. Moreover, if generated data were used, it is unclear whether only generated images or both generated images and labels should be used, as a generative DNN such as ArSDM generates only images. For a fair evaluation, the augmentation methods were therefore applied to the given data. Note that our novelty comes from adopting the benefits of both approaches, providing both data diversity and ambiguity controllability.




Meta-Review

Meta-review not available, early accepted paper.


