Abstract

Unsupervised Anomaly Detection (UAD) methods aim to identify anomalies in test samples by comparing them with a normative distribution learned from a dataset known to be anomaly-free. Approaches based on generative models offer interpretability by generating anomaly-free versions of test images, but are typically unable to identify subtle anomalies. Alternatively, approaches using feature modelling or self-supervised methods, such as the ones relying on synthetically generated anomalies, do not provide out-of-the-box interpretability. In this work, we present a novel method that combines the strengths of both strategies: a generative cold-diffusion pipeline (i.e., a diffusion-like pipeline which uses corruptions not based on noise) that is trained with the objective of turning synthetically-corrupted images back to their normal, original appearance. To support our pipeline we introduce a novel synthetic anomaly generation procedure, called DAG, and a novel anomaly score which ensembles restorations conditioned with different degrees of abnormality. Our method surpasses the prior state of the art for unsupervised anomaly detection in three different Brain MRI datasets.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1928_paper.pdf

SharedIt Link: https://rdcu.be/dV58l

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72120-5_23

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1928_supp.pdf

Link to the Code Repository

https://github.com/snavalm/disyre

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Nav_Ensembled_MICCAI2024,
        author = { Naval Marimont, Sergio and Siomos, Vasilis and Baugh, Matthew and Tzelepis, Christos and Kainz, Bernhard and Tarroni, Giacomo},
        title = { { Ensembled Cold-Diffusion Restorations for Unsupervised Anomaly Detection } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {243--253}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents an Unsupervised Anomaly Detection (UAD) method to identify anomalies in test samples. The approach is based on a generative cold-diffusion pipeline (i.e., a diffusion-like pipeline which uses corruptions not based on noise) that is trained with the objective of turning synthetically-corrupted images back to their normal, original appearance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The presentation is relatively clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The proposed approach seems to be quite similar to [16]; the novelty needs to be explained and discussed in more detail. 2) The authors claim that the approach can generate subtle abnormalities and disentangle different attributes of the anomaly. These claims are not convincingly justified; more visual samples and experiments need to be designed to support them. 3) The experiments are conducted on Brain MRI images only; as a general approach, different medical images, such as Chest X-ray images, should be involved. Is the model overfitting to brain lesions? 4) While AP is reported to evaluate the accuracy of anomaly localization, what about the quality of reconstruction?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    n/a

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) The novelty needs to be further clarified. 2) More extensive results should be visualized and reported to support the claims.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Experiments and novelty

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    All reviewers raised concerns regarding the novelty compared with the available literature [16] and [20]; though the rebuttal clarified this a bit, the modification and refinement of the work is quite minor and incremental. All reviewers also challenged the experiments, which are not convincing in the current version. According to the guidelines, a significant addition of results could change the contents, which would contradict the guidance. So I don’t think the current version, even after minor revision, could meet the standard of MICCAI.



Review #2

  • Please describe the contribution of the paper

    The proposed method focuses on enhancing Unsupervised Anomaly Detection (UAD) in brain MRI images by employing a generative approach that does not depend on annotated data. The core of the method is a cold-diffusion pipeline, which is trained to restore synthetically corrupted images to their original, uncorrupted state. This technique has been rigorously tested across three brain MRI datasets, where it shows superior performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is written clearly and in a coherent manner.
    • Related work section provides a comprehensive and insightful review of prior research.
    • The proposed method has been evaluated extensively on three datasets and sets a new benchmark for UAD task on Brain MRI datasets
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The proposed method exhibits limited novelty since it refines an existing approach, DISYRE [1], by altering two specific components. Firstly, instead of employing random cropping to simulate anomalies, the method disentangles the anomaly generation process into three parts: shape, texture and intensity bias, to better mimic real lesions. Secondly, they propose a minor adjustment in the calculation of the Anomaly Score, which is to sum the “single-step” restoration at each time step rather than summing only the incremental versions.

    • Also, the performance of DISYRE reported in this paper doesn’t seem to match that reported in the DISYRE paper.

    • A qualitative comparison between the standard Anomaly Score and the Ensemble Anomaly Score (as introduced in the proposed method) would be beneficial. This could provide deeper insights into the enhancements or limitations introduced by the new method.

    [1] Naval Marimont, S., Baugh, M., Siomos, V., Tzelepis, C., Kainz, B., & Tarroni, G. (2024, February). DISYRE: Diffusion-Inspired SYnthetic REstoration for Unsupervised Anomaly Detection. In Proceedings/IEEE International Symposium on Biomedical Imaging: from nano to macro. IEEE International Symposium on Biomedical Imaging. IEEE.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    A qualitative comparison between the standard Anomaly Score and the Ensemble Anomaly Score (as introduced in the proposed method) would be beneficial. This could provide deeper insights into the enhancements or limitations introduced by the new method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper does a good job in terms of clarity, presentation and evaluation. Even though the proposed method shows limited novelty as mentioned above, the results seem to show superior performance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have clarified most of my concerns in the rebuttal. Assuming they will implement all the changes they have promised, I think the paper could be accepted.



Review #3

  • Please describe the contribution of the paper

    A “disentangled anomaly generation” process is proposed to simulate anomalies of varying shapes, textures and overall intensities. This process is an extension of the one in [20], [16]. Compared to [16], one additional novelty in this paper is the use of intensity shifting in the simulated anomalies.

    Now, an image restoration model is trained to replace the simulated anomalies with the original image regions. To this end, a “cold diffusion” model [1] is trained, which generalizes diffusion models to arbitrary corruption processes. Notably, this model is trained with a single-step restoration objective: it takes as input the image with the simulated anomaly and the degree of severity of the simulated anomaly, and outputs the original image in one step.

    Given a test image, the image restoration model is evaluated with a number of values for the degree of severity. The pixel-wise average of the resulting residual maps is the anomaly map predicted by the model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Unsupervised anomaly detection is an unsolved problem, particularly in medical imaging. The proposed self-supervised approach of simulating anomalies and training a model to revert the simulated anomalies is an interesting line of work, following [16], [19], [20], etc.

    2. The method provides impressive results, especially in detecting subtle anomalies such as the one on the top left in figure 3. I can imagine that methods based on image restoration will have a lot of trouble in detecting such subtle abnormalities.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Some details regarding the method are missing: a. Please provide details regarding the patch-wise training (random patches from random images are sampled in each batch?), and the sliding window inference (e.g. what stride is used?). b. Also include a sentence or so to justify the use of patch based vs image based training and testing.

    2. Please clearly state the differences in the anomaly simulation process in the proposed method vs those in [16] and [20].

    3. Missing comparisons: a. with diffusion model based image restoration methods (e.g. Wolleb et al MICCAI 2022). b. with [20] in table 1.

    4. Please show qualitative comparison with [16] and with at least one image restoration method based on diffusion models.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No additional comments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In addition to the comments in the “weakness” section:

    1. At the end of section 3, there is a discussion regarding the step-size hyperparameter. But it was previously stated that a single-step restoration is employed. Please clarify that the multi-step restoration and the step-size discussion are relevant for the comparisons in Table 2.

    2. In equation 1, x0 should be x?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a simple idea that leads to good results, as evaluated on multiple challenging datasets. Although comparisons with diffusion based image restoration models are missing, I recommend acceptance at this stage, but suggest that the authors should add these comparisons if a journal extension of the paper is planned.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I would like to retain my rating of the paper, and would encourage authors to add details to the paper (if accepted), as promised in the rebuttal.




Author Feedback

Dear Reviewers,

We appreciate the time and effort you invested in reviewing our paper and providing valuable feedback. Below, we address the concerns raised, particularly clarifying the novelty of our approach.

The innovation of ECDR hinges on two novel components, namely a disentangled anomaly generation process and an ensembled inference procedure, explained in more detail below. Our experiments demonstrate that these two innovations are key to overcoming the limitations of previous methods, which fail to detect subtle anomalies.

Disentangled Anomaly Generation (DAG): [20] introduced Foreign Patch Interpolation (FPI) to generate synthetic anomalies using only square shapes. [16] uses an enhanced FPI approach, with random shapes and smoothed interpolation edges to avoid sharp edges in the synthetic anomalies. Unlike [16], we first introduce a normalization step so that foreign patches have the same intensity ranges as the training images. Furthermore, we propose to generate anomalies by separately manipulating shape, texture, and a novel intensity bias component. The intensity bias component randomly shifts the intensity of specific tissue types at the synthetic anomaly location. DAG allows for more realistic and varied synthetic anomalies compared to existing methods [16, 20], achieving a better coverage of the anomalous distribution, which in turn improves the robustness and generalizability of our method.
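To make the three disentangled factors concrete, the following is a minimal sketch of how such a generator could look. The function name, the soft-edged disc mask, and all numeric ranges are illustrative assumptions for a 2D NumPy setting, not the authors’ implementation:

```python
import numpy as np

def dag_synthetic_anomaly(image, foreign_patch, rng, severity=1.0):
    """Sketch of a disentangled anomaly generator: shape, texture and
    intensity bias are sampled independently and then combined."""
    h, w = image.shape

    # Shape: a smooth random blob mask with soft edges (avoiding the
    # sharp square boundaries of plain Foreign Patch Interpolation).
    yy, xx = np.mgrid[0:h, 0:w]
    cy = rng.integers(h // 4, 3 * h // 4)
    cx = rng.integers(w // 4, 3 * w // 4)
    radius = rng.uniform(h / 12, h / 6)
    dist = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    mask = np.clip(1.0 - dist / radius, 0.0, 1.0)  # soft-edged disc

    # Texture: normalize the foreign patch to the target image's
    # intensity range before interpolating it in.
    fp = (foreign_patch - foreign_patch.min()) / (np.ptp(foreign_patch) + 1e-8)
    fp = fp * (image.max() - image.min()) + image.min()

    # Intensity bias: a random additive shift applied at the anomaly
    # location, mimicking tissue-intensity changes of real lesions.
    bias = rng.uniform(-0.3, 0.3)

    # Interpolate towards the corrupted appearance by `severity`.
    corrupted = image * (1 - severity * mask) + (fp + bias) * severity * mask
    return corrupted, mask
```

Because the mask is soft and the severity scales the interpolation, the same machinery yields anything from barely visible to strongly contrasted synthetic lesions.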

Anomaly Localization with Ensembled Restorations: Unlike previous works that used a multi-step restoration for inference, our method proposes an ensemble of single-step restorations, each based on a different assumption about the severity of the anomalies present in test images. By considering different assumptions of severity, our method is more robust than [16,20] and generalizes better to real brain MRI anomalies, specifically those that appear as subtle contrast changes from the expected normal anatomy.

The relevance of these novel contributions is strongly supported by a consistent performance improvement upon [16], by 5% and 13.8% in the BraTS-T2 and ATLAS datasets respectively, and most importantly, by 41% in BraTS-T1. Regarding R3’s comment, “the authors claimed that the approach can generate subtle abnormalities”, we want to clarify that we do not claim our method can generate more subtle anomalies. Our claim, supported by the quantitative results on BraTS-T1, is that our method is more robust to subtle, real medical anomalies at test time, thanks to the interactions of DAG and the Ensembled Restorations inference procedure.

We appreciate the Reviewers’ suggestion to evaluate our method on additional modalities, like Chest X-Rays. We now have preliminary results showing strong performance in Chest X-Ray that will be included in future work. Consistently, [10] shows that methods performing strongly across the three Brain MRI datasets were also often competitive in CCR and CheXpert datasets.

We are also evaluating Wolleb et al.2022 in our future work, although this approach is weakly supervised, requiring both healthy and anomalous training samples and image-level annotations, making comparisons not like-for-like.

R4 noted that the results reported in [16] do not match the results in Table 1. We clarify that [16]’s arXiv submission was updated on 05/03/2024, and we included this latest version.

We also thank R3 for suggesting to quantitatively evaluate restoration image quality. Although not critical for Unsupervised Anomaly Detection, counterfactual restorations address important interpretability aspects, and R3’s suggested experiment is an interesting idea to expand this work.

As suggested by R4, we will include the multi-step anomaly score in the qualitative comparisons. We will also include details regarding the patch-wise pipeline and its justification, as suggested by R1. We use patch-based training to make the pipeline more flexible and applicable seamlessly to other image modalities with higher resolution.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have addressed most concerns raised by R1 and R4, who subsequently gave “Accept” decisions after the rebuttal. They also responded to R3’s concerns about novelty and introduced acceptable new results to support their conclusion. Therefore, I suggest an “Accept.”

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors have addressed most concerns raised by R1 and R4, who subsequently gave “Accept” decisions after the rebuttal. They also responded to R3’s concerns about novelty and introduced acceptable new results to support their conclusion. Therefore, I suggest an “Accept.”



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This is an interesting paper that addresses the important area of unsupervised anomaly detection. Despite some issues regarding limited novelty, I still think this paper is of high interest to the MICCAI community and recommend accepting it.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    This is an interesting paper that addresses the important area of unsupervised anomaly detection. Despite some issues regarding limited novelty, I still think this paper is of high interest to the MICCAI community and recommend accepting it.


