Abstract

The high performance of denoising diffusion models for image generation has also paved the way for their application in unsupervised medical anomaly detection. As diffusion-based methods require large amounts of GPU memory and have long sampling times, we present a novel and fast unsupervised anomaly detection approach based on latent Bernoulli diffusion models. We first apply an autoencoder to compress the input images into a binary latent representation. Next, a diffusion model that follows a Bernoulli noise schedule is applied to this latent space and trained to restore binary latent representations from perturbed ones. The binary nature of this diffusion model allows us to identify entries in the latent space that have a high probability of flipping their binary code during the denoising process, which indicates out-of-distribution data. We propose a masking algorithm based on these probabilities, which improves the anomaly detection scores. We achieve state-of-the-art performance compared to other diffusion-based unsupervised anomaly detection algorithms while significantly reducing sampling time and memory consumption. The code is available at https://github.com/JuliaWolleb/Anomaly_berdiff.
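To make "perturbing a binary latent with Bernoulli noise" concrete, here is a minimal sketch of the forward process. It assumes the usual Bernoulli kernel from binary latent diffusion, q(z_t | z_{t-1}) = B(z_{t-1}(1 - β_t) + 0.5 β_t), whose t-step marginal is B(k_t z_0 + 0.5(1 - k_t)) with k_t = ∏(1 - β_i). The function name and the linear β schedule are illustrative placeholders, not the settings used in the paper or its repository.

```python
import numpy as np


def bernoulli_forward(z0: np.ndarray, t: int, betas: np.ndarray,
                      rng: np.random.Generator) -> np.ndarray:
    """Sample z_t ~ q(z_t | z_0) for a binary latent code z0 (values in {0, 1}).

    Assumes the per-step kernel B(z_{t-1} * (1 - beta_t) + 0.5 * beta_t),
    so the t-step marginal is B(k_t * z0 + 0.5 * (1 - k_t)),
    with k_t = prod_{i <= t} (1 - beta_i). t = 0 returns z0 unchanged.
    """
    k_t = np.prod(1.0 - betas[:t])            # empty product (t = 0) is 1.0
    p_one = k_t * z0 + 0.5 * (1.0 - k_t)      # P(entry of z_t == 1)
    return (rng.random(z0.shape) < p_one).astype(np.uint8)


# Hypothetical schedule and latent shape, for illustration only.
rng = np.random.default_rng(0)
betas = np.linspace(1e-3, 0.5, 1000)
z0 = rng.integers(0, 2, size=(16, 32, 32), dtype=np.uint8)
z_noisy = bernoulli_forward(z0, 200, betas, rng)
```

Each entry flips with probability 0.5(1 - k_t), so as t grows the code approaches uniform random bits; the denoiser's task is then to predict which bits to flip back.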

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0424_paper.pdf

SharedIt Link: https://rdcu.be/dV578

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72120-5_13

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0424_supp.pdf

Link to the Code Repository

https://github.com/JuliaWolleb/Anomaly_berdiff

Link to the Dataset(s)

https://www.med.upenn.edu/cbica/brats2020/data.html

https://www.kaggle.com/datasets/paultimothymooney/kermany2018

BibTex

@InProceedings{Wol_Binary_MICCAI2024,
        author = { Wolleb, Julia and Bieder, Florentin and Friedrich, Paul and Zhang, Peter and Durrer, Alicia and Cattin, Philippe C.},
        title = { { Binary Noise for Binary Tasks: Masked Bernoulli Diffusion for Unsupervised Anomaly Detection } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {135 -- 145}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper targets unsupervised anomaly detection in medical images with a diffusion-based method. Inspired by latent Bernoulli diffusion models, the authors aim to reduce the high GPU memory consumption and long sampling times of standard diffusion-based methods. Experiments are conducted on the BRATS2020 dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-organized and well-written.
    2. The motivation is clear.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The novelty of this paper is limited. Medical image anomaly detection based on diffusion models has been extensively studied, and this paper's improvements mainly concern two aspects: first, using existing latent Bernoulli diffusion models to speed up inference and save GPU memory; second, proposing a masking strategy to improve the inference process. However, (a) according to the results in Table 1, despite the faster inference compared to AnoDDPM, the performance declines significantly. For example, the AUPRC drops from 0.727 to 0.656; (b) masking strategies in the inference process have been studied before, e.g., [a], which proposes a masking refinement strategy for OCT data. The authors should discuss the differences between their proposed method and the approach in [a] and its follow-up studies.
    2. This paper only presents numerical results for the BRATS2020 dataset, and these results do not improve over the baselines. The authors mention that they also conducted experiments on the OCT17 dataset, but they do not report numerical results.
    3. The meaning of the subplot on the right side of Figure 3 is unclear.
    4. Figure 4 does not include comparisons with existing state-of-the-art methods, so it cannot demonstrate the effectiveness of the proposed method.
    [a] Self-supervised masking for unsupervised anomaly detection and localization. TMM 2022.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See weaknesses.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Thank you for your response. I have carefully read the rebuttal and the other reviewers' reviews. The main concerns are the novelty of the paper and the significant performance loss compared with baseline methods. Also, the RESC dataset [a] does contain ground-truth segmentation masks, which makes it suitable for a quantitative evaluation on OCT data, if possible.
    [a] Junjie Hu, Yuanyuan Chen, and Zhang Yi. Automated segmentation of macular edema in OCT using deep neural networks. Medical Image Analysis, 55:216–227, 2019.



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors present an unsupervised anomaly detection (UAD) technique that combines a binary autoencoder with a latent Bernoulli diffusion model. The diffusion model enables the identification of anomalous latent codes and the creation of alternative, in-distribution samples, which are then decoded into healthy counterfactual reconstructions in image space. Experiments are performed on BraTS 2020 multi-sequence data (quantitatively) and on OCT images (qualitatively), comparing the proposed approach to a series of diffusion-based competitors.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written and easy to follow. Figures 1 and 2 are particularly helpful in understanding the overall approach.
    2. The methodology is interesting and suitable for the task. The reduction in inference time and memory footprint is particularly noteworthy and potentially useful for future developments.
    3. The shown reconstructions are quite good, which suggests the technique is sensible for the task.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Limited novelty: the technique is based on the Binary Latent Diffusion models by Wang et al., which feature the same architecture (i.e., a binary autoencoder followed by a Bernoulli diffusion model). The masking and stitching of the latent space of diffusion models follows the same rationale as other recently published works (many, but not all, of which have been referenced, e.g., AutoDDPM). In practice, the main novelty seems to lie in applying the masking/stitching approach to binary latent diffusion models, which is a sensible decision but carries limited novelty. Can the authors further justify or clarify their contribution?
    2. Not yet state-of-the-art results: judging from AUPRC (one of the more reliable metrics in UAD studies), the proposed approach is substantially behind AnoDDPM in detection performance. On the other hand, it requires 10x less time to run, which is quite impressive.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors trained their brain MR model on slices extracted from only one dataset (BraTS) after removing the bottom and top slices. This experimental approach has been used in the past but is, in my view, somewhat questionable, since it might introduce substantial biases. For future studies, I would suggest using separate datasets for training and testing as well as more complete volumes, since this is a more realistic scenario for UAD applications. The experiments of the UPD study (Lagogiannis et al., https://arxiv.org/abs/2303.00609) could, for instance, be used as a benchmark.
    2. It is unclear to me how hyper-parameter tuning was performed: grid search is mentioned, but was cross-validation used? If so, with how many folds? If not, what was the validation set and how was it created? This is particularly important for the setting of the P (threshold) and L (max noise level) parameters: how could they be tuned using only an anomaly-free training set?
    3. The reported metrics are okay, but it would also be interesting to report the maximum Dice score (typically indicated as [DSC]) as well as average precision (AP), which are becoming quite standard in the UAD literature (see the aforementioned UPD study as well as Baur et al., https://arxiv.org/abs/1804.04488).
    4. Related to the above: the authors use PSNR between original image and restored version as an index of reconstruction quality. But anomalous tissue is supposed to be altered in the restoration process, which would in turn lower PSNR. Am I missing something?
    5. As already stated, AnoDDPM still outperforms the proposed technique by a substantial margin in AUPRC (which is typically considered more reliable than DSC for anomaly detection). I believe that this should be explicitly stated and discussed in the paper.
    6. The OCT dataset seems to have been used only for qualitative comparisons. Are ground-truth anomaly maps unavailable for producing quantitative results? If so, I would 1) state this, and 2) produce a reference anomaly map at least for the qualitative comparisons, so that it is possible to judge the performance of the compared models.
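    For readers unfamiliar with the metrics suggested in comment 3, here is a minimal sketch of [DSC] (the best Dice score over a sweep of thresholds on the anomaly map) and step-wise AP over pixels. It assumes a continuous pixel-wise anomaly map and a binary ground-truth mask; the function names are hypothetical and not taken from the paper's code.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient between two binary masks (1.0 if both are empty)."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else float(2.0 * inter / denom)


def max_dice(anomaly_map: np.ndarray, gt: np.ndarray, thresholds) -> tuple:
    """[DSC]: the best Dice over a threshold sweep, returned as (dice, threshold)."""
    return max((dice(anomaly_map > th, gt), th) for th in thresholds)


def average_precision(scores: np.ndarray, labels: np.ndarray) -> float:
    """Step-wise AP = sum_n (R_n - R_{n-1}) * P_n over pixels ranked by score.

    Recall only increases at positive pixels (by 1 / #positives), so AP equals
    the mean precision at the ranks of the positives. Tied scores are not merged.
    """
    order = np.argsort(-scores.ravel(), kind="stable")
    y = labels.ravel()[order].astype(bool)
    precision = np.cumsum(y) / np.arange(1, y.size + 1)
    return float(precision[y].sum() / y.sum())
```

    In practice, `sklearn.metrics.average_precision_score` computes the same step-wise AP and additionally handles tied scores; the sketch only spells out the definition.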
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty is limited since the pipeline is essentially already available in the Binary Latent Diffusion paper by Wang et al., and the obtained results suggest that the technique still lags behind state-of-the-art approaches in terms of detection accuracy. That said, the application of binary latent diffusion models to unsupervised anomaly detection is interesting, and the proposed approach seems much faster than the competition. Overall, I believe this paper will be of interest to a good portion of MICCAI's audience.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Reading the rebuttal, I am convinced that this paper can be of interest to the MICCAI audience and will be beneficial to researchers working in unsupervised anomaly detection.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel method to enhance unsupervised anomaly detection by replacing Gaussian noise with binary Bernoulli noise in the diffusion model. Additionally, a masking scheme is incorporated into the diffusion process to preserve anatomical information. Compared to traditional Gaussian-noise-based diffusion models, this approach yields improved results and a faster sampling process.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well-written, with clear and easily understandable contributions.

    The elegant approach of simplifying Gaussian noise to binary noise in the diffusion model for unsupervised anomaly detection offers faster inference compared to state-of-the-art methods. The utilization of binary noise, which is simpler than Gaussian noise, along with the masking scheme, significantly improves the efficiency of noise estimation by the neural network. This improvement is supported by the experiments in Figure 3.

    Although the improvement in performance metrics is modest compared to existing methods, the substantial gains in time and memory efficiency make this method highly effective in balancing performance and resource constraints.

    The authors conducted experiments on both simple tasks with large tumors and challenging tasks with OCT data, further strengthening the argument in favor of this method. This comprehensive experimentation underscores the robustness and applicability of the proposed approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The masking scheme is still unclear to me. From what I understand, it’s utilized to focus on regions with a high probability of flipping, thus intuitively capturing abnormal regions in the latent space. However, how exactly does it retain anatomical information? Additionally, I find the notation $\tilde{z}_0$ a bit confusing, as it seems to still depend on $t$.

    Secondly, does the UNet estimate calibrated probabilities for the Bernoulli noise parameters, or does it simply output raw scores from the network?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As a suggestion for future work, I propose that the authors explore the calibration of probabilities derived from their estimation network.
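    One common way to check the calibration suggested here is the expected calibration error (ECE) for binary probabilities: bin the predicted probabilities, compare each bin's mean probability with the observed frequency of positives, and weight by bin size. The sketch below is a generic illustration, not something the paper reports; applied to this model, the targets would be hypothetical indicators of whether each latent entry actually flipped.

```python
import numpy as np


def expected_calibration_error(prob: np.ndarray, target: np.ndarray,
                               n_bins: int = 10) -> float:
    """Binary ECE: size-weighted mean |mean predicted prob - frequency of positives|
    over equal-width probability bins on [0, 1]."""
    prob = prob.ravel().astype(np.float64)
    target = target.ravel().astype(np.float64)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each probability to a bin index in 0 .. n_bins - 1.
    idx = np.digitize(prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            ece += sel.mean() * abs(prob[sel].mean() - target[sel].mean())
    return float(ece)
```

    A perfectly calibrated predictor has ECE 0; temperature scaling of the logits before the final sigmoid is one standard way to reduce it.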

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is clear and well-written. The authors aim for efficiency and have elegantly proposed a method to maintain performance while significantly reducing the high complexity of standard diffusion models for anomaly detection. I believe this paper should be accepted, as it is of interest to researchers working in anomaly detection and, more generally, to the MICCAI community.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    I thank the authors for their responses, and it seems they have addressed the questions raised by other reviewers as well.

    I still believe that the focus on efficiency is crucial and often overlooked in most research. This paper provides a valuable solution that balances performance and resource efficiency, which is commendable. For these reasons, I will maintain my initial grade.




Author Feedback

We thank all reviewers for their valuable feedback. Below we address the main concerns and clarify some issues.

(R3, R4) Limited novelty: As R3 correctly states, we were inspired by [28], who used binary latent diffusion models for image synthesis. The novelty of our approach is to adapt their method to solve an unsupervised anomaly detection (UAD) task. To this end, we introduce a masked denoising process to recover the healthy reconstructions. Although other UAD approaches have also used masking schemes, our proposed pipeline is novel and specifically designed to exploit the binary nature of the Bernoulli diffusion model, allowing anomaly values to be obtained directly from the model output, i.e., the predicted flipping probabilities. We will include work [a] suggested by R4 in the camera-ready submission and briefly discuss the contribution of our masking scheme, which is not learned by the model but taken directly from the model output, highlighting the advantage of our binary architecture for this binary task.

(R3, R4) Evaluation metrics and comparison to SOTA: As R3 and R4 correctly point out, our method lags behind AnoDDPM in terms of AUPRC. As suggested by R3, we will highlight this in the camera-ready version. However, the quantitative and qualitative comparisons in Table 1 and Figure 4 show that our method is comparable to a variety of SOTA diffusion-based approaches, while significantly reducing memory requirements and sampling time, which will be crucial for a successful extension to 3D problems in future research. Unfortunately, no ground-truth anomaly maps are available for the OCT dataset, so quantitative results cannot be computed for it. We will clarify this in the camera-ready version. We agree with R3 that reporting [DSC] and AP would be beneficial. Following AutoDDPM [4], we additionally report the PSNR between the input and output images to indicate how closely the reconstruction matches the input. The aim is to show that the anatomical information is preserved in healthy tissue.
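For reference, PSNR between an input x and its reconstruction y is 10 log10(MAX² / MSE). The sketch below assumes intensities normalized to [0, 1]. The optional `mask` argument is our own illustrative addition, not something the rebuttal describes: restricting the MSE to healthy pixels is one way to address R3's point that intended changes in anomalous tissue also lower a whole-image PSNR.

```python
from typing import Optional

import numpy as np


def psnr(x: np.ndarray, y: np.ndarray, data_range: float = 1.0,
         mask: Optional[np.ndarray] = None) -> float:
    """PSNR in dB between input x and reconstruction y.

    If a boolean mask is given, the MSE is computed only over pixels where the
    mask is True (e.g. a hypothetical healthy-tissue mask).
    """
    sq_err = (x.astype(np.float64) - y.astype(np.float64)) ** 2
    if mask is not None:
        sq_err = sq_err[mask]
    mse = sq_err.mean()
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))
```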

(R3) Experimental design: We thank R3 for the constructive comments on the training/validation/test set. We argue that training and testing on separate datasets introduces a domain shift, which may affect model performance. Omitting the lower and upper slices of the BRATS2020 dataset removes the bias of tumors being more prevalent in the middle of the brain. Although this 2D slice-wise approach can indeed be improved in its design, it serves as a good first step in model development before benchmarking as suggested by R3. As we trained only on healthy samples, no cross-validation was performed. The grid search presented in Figure 3 (left) was performed on the test set described in Section 3.

(R4) Meaning of Figure 3 (right): Taking advantage of the binary nature of Bernoulli diffusion models, we can extract image-level anomaly scores directly from the model output. In Figure 3 (right), we show that the flipping probabilities predicted by the diffusion model are indicative of out-of-distribution data. This can be seen from the clear difference between the healthy and diseased cohorts for different masking thresholds P and noise levels L.
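A minimal sketch of how an image-level score could be read off the flipping probabilities, assuming the network emits one logit per latent entry followed by a sigmoid (as stated in the answer to R5). Aggregating by the fraction of entries whose flip probability exceeds P is our assumption for illustration; the paper may aggregate differently, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(logits: np.ndarray) -> np.ndarray:
    # Numerically stable via the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * logits))


def image_anomaly_scores(logits: np.ndarray, P: float) -> np.ndarray:
    """logits: (B, C, H, W), one logit per binary latent entry.

    Returns one score per image: the fraction of latent entries whose predicted
    flipping probability exceeds the masking threshold P.
    """
    flip_prob = sigmoid(logits)
    flat = flip_prob.reshape(flip_prob.shape[0], -1)
    return (flat > P).mean(axis=1)
```

Comparing the distribution of such scores between healthy and diseased images, for several values of P and L, mirrors the kind of cohort separation described for Figure 3 (right).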

(R5) Details on the masking scheme: We thank R5 for the positive feedback. The proposed masking scheme extracts entries in the latent space with a high flipping probability (indicating out-of-distribution data). While these entries can be changed during the denoising process, all other entries of the original input image are preserved. This step is implemented in Equation 7. This allows the anatomical information of the input image to be preserved, resulting in a higher PSNR value compared to the unmasked baseline (Table 1). The final layer of the diffusion model is a sigmoid activation function that provides probabilities in the range [0,1]. We will consider the suggestion to explore the calibration of the probabilities to further improve the estimation network in future research.
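The masking idea described above (entries with a high flipping probability may change, all others keep the input's code) can be sketched as follows. This is not the paper's Equation 7: we assume, for illustration, that unmasked entries are reset to the input code after every reverse step, and `denoise_step` is a hypothetical stand-in for one reverse step of the Bernoulli diffusion model.

```python
import numpy as np


def masked_denoising(z_input, flip_prob, P, z_L, denoise_step, L, rng):
    """Hypothetical masked reverse process on a binary latent code.

    z_input      : binary latent code of the input image
    flip_prob    : predicted flipping probability per latent entry, in [0, 1]
    P            : masking threshold; only entries with flip_prob > P may change
    z_L          : code noised to level L, where the reverse process starts
    denoise_step : hypothetical callable (z, t, rng) -> binary code at level t - 1
    """
    mask = flip_prob > P
    z = z_L
    for t in range(L, 0, -1):
        z = denoise_step(z, t, rng)
        # Assumption: entries outside the mask are restored to the input code,
        # so anatomy in low-flip-probability regions is preserved.
        z = np.where(mask, z, z_input)
    return z
```

The resulting code would then be passed through the autoencoder's decoder to obtain the healthy reconstruction.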




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents a method for unsupervised anomaly detection. The method is a combination of an encoder-decoder neural net with a denoising diffusion model applied in the latent space. The main objection, which serves as grounds for rejecting this paper, is that the performance is far inferior to existing standard supervised methods. The method is of little practical interest for the MICCAI audience.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


