Abstract

Unsupervised Anomaly Detection (UAD) methods rely on healthy data distributions to identify anomalies as outliers. In brain MRI, a common approach is reconstruction-based UAD, where generative models reconstruct healthy brain MRIs, and anomalies are detected as deviations between input and reconstruction. However, this method is sensitive to imperfect reconstructions, leading to false positives that impede the segmentation. To address this limitation, we construct multiple reconstructions with probabilistic diffusion models. We then analyze the resulting distribution of these reconstructions using the Mahalanobis distance (MHD) to identify anomalies as outliers. By leveraging information about normal variations and covariance of individual pixels within this distribution, we effectively refine anomaly scoring, leading to improved segmentation. Our experimental results demonstrate substantial performance improvements across various data sets. Specifically, compared to relying solely on single reconstructions, our approach achieves relative improvements of 15.9%, 35.4%, 48.0%, and 4.7% in terms of AUPRC for the BRATS21, ATLAS, MSLUB and WMH data sets, respectively.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1502_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1502_supp.pdf

Link to the Code Repository

https://github.com/FinnBehrendt/Mahalanobis-Unsupervised-Anomaly-Detection

Link to the Dataset(s)

IXI: https://brain-development.org/ixi-dataset/
MSLUB: https://lit.fe.uni-lj.si/en/raziskave/viri/3D-MR-MS/
ATLAS: https://fcon_1000.projects.nitrc.org/indi/retro/atlas.html
WMH: https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/AECRSD
BRATS: http://braintumorsegmentation.org

BibTex

@InProceedings{Beh_Leveraging_MICCAI2024,
        author = { Behrendt, Finn and Bhattacharya, Debayan and Mieling, Robin and Maack, Lennart and Krüger, Julia and Opfer, Roland and Schlaefer, Alexander},
        title = { { Leveraging the Mahalanobis Distance to enhance Unsupervised Brain MRI Anomaly Detection } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a framework to detect anomalies in Brain MRI in an unsupervised manner. The proposed method is based on the reconstruction of healthy brain regions, with probabilistic diffusion models, such that the anomalies are the differences between the input images and the reconstructed ones. The deviations between the input and the outputs are quantified using the Mahalanobis distance (MHD). Four different brain MRI datasets were tested using both AUPRC and Dice scores.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Unsupervised approaches are advantageous as they reduce the dependency on manual annotations. 2) The proposed method was trained on IXI data but tested on four other datasets – highlighting its generalization capabilities. 3) The method was compared to 11 other methods – presenting superior results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The novelty here is very moderate. The key idea – reconstruction for anomaly detection (actually brain MRI segmentation) – has been proposed before and utilized. The proposed contribution (multiple reconstructions, MHD) is a refinement of this key idea. 2) Segmentation is binary – anomaly vs. healthy brain tissue – while BRATS, for example, contains labels allowing partitioning of the tumor. 3) Although the comparison is extensive, it is unclear whether a classical method, e.g., Expectation Maximization assuming a Gaussian mixture model of the scans’ intensities, would not perform better.
    4) The ablation study is limited. Other distances/metrics (other than MHD) and different numbers of diffusion models (how did you choose N=10?) are not assessed.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Implementation details are provided within the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to my comments above. Explain your methodological choices. What happens when the number of diffusion models gradually increases? What are the method’s limitations? Minor: Acronyms, e.g., DDPM and GM, should be explicitly written out.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is nice that the authors aimed to demonstrate the generality of the method; however, the Dice scores are very low. It seems that a standard classical method (not machine learning) could do better. The novelty is limited. Ablation studies are limited.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Authors’ feedback only partially addressed my concerns. Being well familiar with classical methods such as EM (which do not require training), I am not sure whether the proposed method can do better. I also think that binary (only) segmentation is a significant limitation.



Review #2

  • Please describe the contribution of the paper

    The authors propose to use the Mahalanobis distance (MHD) for anomaly detection. Specifically, the work selects DDPM as the generative model and utilizes the MHD to measure the anomaly score. The authors also propose to generate multiple reconstructions and use their mean as the reconstruction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The findings that sampling multiple reconstructions and using the MHD can be a more effective strategy are interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The generalizability of the method is not discussed. Although the authors show the effectiveness of using the MHD on cDDPMs, does this trend extend to other DDPM-based methods or GAN-based methods?
    2. The authors only sample 10 reconstructions per image. Does sampling more or fewer reconstructions affect the performance?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the weakness section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed idea is easy to understand and is demonstrated to be effective across 4 datasets. If the authors can show the generalizability of the idea across different reconstruction-based AD methods, I think it will be a valuable contribution.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors claim the benefit of integrating the MHD with different DDPM-based methods, which raises the value of the paper. I would suggest that the authors highlight this advantage in the final version of the paper.



Review #3

  • Please describe the contribution of the paper

    The paper introduces an Unsupervised Anomaly Detection method based on computing the Mahalanobis distance between the input image and various image reconstructions produced by a generative model. Unlike conventional approaches that rely solely on a single reconstruction error, this method prioritizes inter-pixel dependencies to capture variations across reconstructions, leading to enhanced anomaly detection capabilities.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method proposed is well-structured and innovative. The authors discuss existing methods and the typical approach to studying anomaly reconstruction problems. Therefore, it is deemed appropriate to employ a different metric for examining the dissimilarities between generated images. The results of the proposed approach are promising, surpassing existing methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors mentioned that the method can reduce false positives in both the results section and at the end of the introduction, but simply analyzing one case is insufficient to demonstrate this. To truly show this, other strategies, such as introducing a metric, must be considered. Moreover, in equations 2 and 3, it is unclear whether the distance is calculated for each pixel with respect to the multivariate distribution of all pixels within the N reconstructions. Since the distance is a scalar, it is not clear how the final anomaly map is obtained. Finally, the analysis of symmetric patterns from correlations appears disconnected from the overall idea of the paper. It is unclear what the contribution of this analysis is. It would be appropriate to discuss the relevance of the chosen distance with respect to other existing distances that serve the same purpose.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Furthermore, there are a few additional suggestions to add to the previous comments. The appropriate name for the area under the precision-recall curve is AUCPR, not AUPRC. In particular, from this score, the quality of well-predicted pixels can be assessed and discussed. Some notations, such as “DDPMs,” were mentioned for the first time without prior introduction. There is some redundancy in repeating the contribution in various parts of the paper, such as at the end of the introduction, at the end of recent works, and at the beginning of the method section. Given the significance of the work, the description of how the final anomaly map is obtained should be more clearly elucidated mathematically.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The way this distance is applied in this context is innovative and appropriate. Some corrections are needed in the text and in the discussion of the relevance of the achieved results.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Dear Reviewers, We appreciate your time and effort in reviewing our paper. We would like to ask you to consider that the space limits of this conference paper also limit the number of methods and metrics we can include in our analysis, and we therefore had to focus. However, we feel that we still provide a comprehensive comparison. In the following, we address the major concerns.

Novelty and Limitations (R1): The reviewer correctly notes that reconstruction-based UAD is not new. However, the approach fundamentally depends on the anomaly scoring. Our work introduces a novel anomaly scoring method, which addresses limitations of reconstruction-based UAD and can substantially improve their performance, as demonstrated in our experiments. Therefore, we believe that our work is an important contribution to the field of UAD. Furthermore, the reviewer is correct that UAD is typically limited to binary segmentation. We will include this general limitation of UAD in the discussion.

Comparison with Classical Methods (R1): The reviewer has raised an interesting point regarding the potential performance of deep learning-based methods compared to “classical” methods. Our results revealed that deep learning approaches outperformed the compared “classical” covariance model (CM). Consequently, our comparative analysis primarily focuses on state-of-the-art deep learning-based methods. However, we agree that further work could include further methods.

Discussion of the Chosen Distance (R1 and R4): The MHD is designed to measure the distance of a point from a reference distribution. Particularly for multivariate outlier detection, the MHD is a common choice due to its ability to consider covariances, a feature not present, e.g., in the Euclidean or Manhattan distances. We chose MHD as it aligns well with our problem, i.e., to test whether a pixel value is an outlier / abnormal.
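
As a hedged, toy illustration of this point (not from the paper): for correlated data, two points at the same Euclidean distance from the mean can have very different Mahalanobis distances, so only the MHD flags the point that violates the correlation structure. All values below are invented for the sketch.

```python
import numpy as np

# Toy 2D distribution with strongly correlated components, mimicking
# correlated pixel values across reconstructions (covariance is assumed).
mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
cov_inv = np.linalg.inv(cov)

def euclidean(p):
    return np.linalg.norm(p - mu)

def mahalanobis(p):
    d = p - mu
    return np.sqrt(d @ cov_inv @ d)

on_trend = np.array([1.0, 1.0])    # follows the correlation
off_trend = np.array([1.0, -1.0])  # violates the correlation

# Both points have Euclidean distance sqrt(2) from the mean, but the
# off-trend point has a much larger Mahalanobis distance: only the MHD
# identifies it as the outlier.
```

This is the essence of why a covariance-aware distance suits multivariate outlier detection: the Euclidean and Manhattan distances treat every direction equally, while the MHD penalizes deviations that are unlikely under the estimated covariance.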

Methodological choices and Ablations (R1, R3 and R4): We agree that further ablation studies can be performed, and we do have further results that we could not present due to page limitations. When choosing the number of reconstructions, we observed a moderate improvement in performance up to N=10, after which performance plateaued (tested up to N=30). Hence, we used N=10. We also considered MHD for different DDPM-based models (DDPM, pDDPM and cDDPM). The results show that all benefit, but due to their overall superior performance, we chose to present the results only for cDDPMs. We agree that it would be interesting to evaluate using MHD beyond DDPM-based generative models in the future.

False Positives (R4): We agree that the claim of reduced false positives is not supported by a metric in the paper. We did observe a decrease in the false positive rate (FPR), but we focused on discussing the DICE and AUCPR metrics. As space does not permit including the quantitative FPR results, we suggest revising the introduction and results sections by removing the remarks regarding false positives.

Description of the MHD calculation (R4): The Mahalanobis Distance (MHD) is calculated for each pixel, considering the covariance of all pixels across the N reconstructions. After reshaping the MHD map to the input image shape, each pixel in the MHD map is a scalar representing the MHD of the corresponding pixel in the input image. The final anomaly map is obtained by a per-pixel multiplication of the MHD map with the initial anomaly map. We will clarify the mathematical description in our revised manuscript.
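
The described computation can be sketched as follows. This is a minimal NumPy illustration with toy shapes, a hypothetical regularization term on the covariance, and one possible per-pixel decomposition of the squared MHD (the terms of the quadratic form, which sum to the scalar distance); the paper's actual implementation may differ, and the initial anomaly map is assumed here to be the absolute residual.

```python
import numpy as np

rng = np.random.default_rng(0)

N, H, W = 10, 8, 8   # N reconstructions of a toy 8x8 image
D = H * W

# Hypothetical data: N healthy reconstructions and one input image.
recons = rng.normal(0.5, 0.05, size=(N, D))
x = recons.mean(axis=0).copy()
x[27] += 0.8         # inject an "anomalous" pixel at flat index 27

mu = recons.mean(axis=0)
# Regularized covariance: with N < D the sample covariance is singular,
# so a small ridge term (assumed here) makes it invertible.
cov = np.cov(recons, rowvar=False) + 1e-3 * np.eye(D)
cov_inv = np.linalg.inv(cov)

diff = x - mu
# Squared Mahalanobis distance of the whole image (a scalar).
d2 = diff @ cov_inv @ diff
# Per-pixel decomposition of the quadratic form: these terms sum to d2
# and can be reshaped into an MHD map over the image grid.
mhd_map_full = (diff * (cov_inv @ diff)).reshape(H, W)
# Diagonal variant ("MHD_diag"): ignores inter-pixel covariance and
# reduces to per-pixel squared z-scores.
mhd_map_diag = (diff ** 2 / np.diag(cov)).reshape(H, W)

# Final anomaly map: per-pixel product of the MHD map and the initial
# anomaly map (here assumed to be the absolute residual |x - mu|).
final_map = np.abs(diff).reshape(H, W) * mhd_map_diag

anomaly_idx = np.unravel_index(np.argmax(mhd_map_diag), (H, W))
```

Under these assumptions, the injected pixel (flat index 27, i.e., position (3, 3)) dominates the diagonal MHD map, and the full-covariance map additionally redistributes the score according to inter-pixel correlations.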

Relevance of the symmetry analysis (R4): This analysis aims to reveal correlations, leading to non-zero covariances among image pixels. These covariances are considered by “MHD_full” but overlooked by “MHD_diag”. Hence, this analysis provides insights into the information that can be leveraged by MHD_full, thereby providing a potential explanation for its superior performance.

We hope that our responses address your concerns. Kind regards, Anonymous Authors




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This is an interesting paper tackling an important topic in the MICCAI community. There were some issues in the original submission that the authors have addressed in the rebuttal. I recommend the authors to revise the final paper accordingly.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces the Mahalanobis distance and uses DDPM for anomaly detection in brain MRI. The authors generally did a good job in clarifying the misunderstandings in the rebuttal.

    R1 raised a concern on the limitation of binary segmentation. It should be noted that anomaly detection is not the same as semantic segmentation, where multi-class labels can be used for training. In the multi-class setting, one might need to rely on unsupervised (e.g. clustering-based) methods, but this might be out of the scope of anomaly detection. The EM method was not compared, although one classical method was included in the comparison.

    I vote for accept considering the extensive comparison and its novelty in the context of anomaly detection.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



