Abstract

Medical anomaly detection aims to identify abnormal findings using only normal training data, playing a crucial role in health screening and recognizing rare diseases. Reconstruction-based methods, particularly those utilizing autoencoders (AEs), are dominant in this field. They work under the assumption that AEs trained on only normal data cannot reconstruct unseen abnormal regions well, thereby enabling anomaly detection based on reconstruction errors. However, this assumption does not always hold due to the mismatch between the reconstruction training objective and the anomaly detection task objective, rendering these methods theoretically unsound. This study focuses on providing a theoretical foundation for AE-based reconstruction methods in anomaly detection. By leveraging information theory, we elucidate the principles of these methods and reveal that the key to improving AE in anomaly detection lies in minimizing the information entropy of latent vectors. Experiments on four datasets with two image modalities validate the effectiveness of our theory. To the best of our knowledge, this is the first effort to theoretically clarify the principles and design philosophy of AE for anomaly detection. The code is available at https://github.com/caiyu6666/AE4AD.
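
The reconstruction-error principle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the code repository for that). The "autoencoder" here is a hand-built linear bottleneck that simply keeps the first d coordinates, and the data, the dimensions D and K, and all function names are hypothetical choices made for illustration.

```python
import random

D = 8  # input dimension (hypothetical)
K = 3  # normal samples only vary in their first K coordinates (assumption)


def encode(x, d):
    """Toy encoder: keep the first d coordinates as the latent vector."""
    return x[:d]


def decode(z):
    """Toy decoder: zero-pad the latent vector back to D dimensions."""
    return z + [0.0] * (D - len(z))


def anomaly_score(x, d):
    """Squared reconstruction error, used as the anomaly score."""
    x_hat = decode(encode(x, d))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


rng = random.Random(0)
normal = [rng.gauss(0, 1) for _ in range(K)] + [0.0] * (D - K)
abnormal = list(normal)
abnormal[-1] += 2.0  # a toy "lesion": energy outside the normal subspace

for d in (K, D):
    # Prints the latent size, then the scores of the normal and abnormal sample.
    print(d, anomaly_score(normal, d), anomaly_score(abnormal, d))
```

With d = K the abnormal sample scores 4.0 while the normal sample scores 0.0, so the reconstruction error separates them. With d = D both score 0.0, because the bottleneck can copy its input unchanged; this is a toy version of the identical shortcut that the paper analyses.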

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0616_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0616_supp.pdf

Link to the Code Repository

https://github.com/caiyu6666/AE4AD

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Cai_Rethinking_MICCAI2024,
        author = { Cai, Yu and Chen, Hao and Cheng, Kwang-Ting},
        title = { { Rethinking Autoencoders for Medical Anomaly Detection from A Theoretical Perspective } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper analyses the potential of autoencoders to detect medical anomalies based on normal training data from a theoretical perspective.

    The main contributions are the analysis of the influence of the latent space dimension on the identical shortcut problem

    A theoretical analysis to provide explanations on how AE operates for the AD problem

    Evaluation of 4 different datasets, including different image modalities

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well structured and provides insights into the theoretical background of AD with AE.

    It provides an easy solution to avoid the identical shortcut problem, which can easily be included in different implementations. High relevance for different use cases.

    Good visualization of relevant steps in Fig. 1 and Fig. 2, nice overview of results.

    Interesting future work on an easier-to-apply version of the algorithm.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Weaknesses of the theoretical analysis:

    Proposition 1’s proof is not a mathematical proof but rather an argument. If Z_0 = Z_0 W_1 W_2 + (b_1 W_2 + b_2), this does not necessarily mean that W_1 W_2 (the paper writes W_2 W_1, which I assume is a typo) must be the identity matrix and b_1 = b_2 = 0; other combinations are possible, and this is just one solution.

    They state “To ensure that Eq. 4 has at least one solution, the number of independent scalar equations should not exceed the number of learnable parameters”. On what common mathematical proposition is this based? No citation is given.

    They conclude that d >= D/2 if at least one solution exists; it follows that for d < D/2 there is no solution. It is unclear whether this is true, as the original mathematical theorem is not mentioned. It is unclear whether this is an implication or an equivalence, as no known proposition is mentioned.

    In the proof of Proposition 2 they mention: “As shown in Fig. 2(a), this reveals that the information content of abnormal data H(Xa) comprises the information content of normal data H(Xn) and the information content specific to lesions H(Xa|Xn).” While this might be true for smaller abnormal parts like lesions, it is not necessarily true for bigger abnormalities like pneumothorax or pneumonia.

    In the proof of Proposition 2 they conclude I(X_n; Z) = H(X_n) + H(Z) - H(X_n, Z) and therefore I(X_n; Z) <= H(X_n). It is not clear why this holds for H(Z) > 0. The notation and definitions of I and H are not sufficiently introduced.

    The supplementary states that all image data is resized to 64x64. This is a very small image size with a very low resolution. It is possible that abnormalities that do not affect many pixels, as they appear in the datasets, are no longer visible in the image data.

    The implemented baselines are relatively old (papers from 2019 and earlier). Newer state-of-the-art baselines would benefit the evaluation.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors do not provide code, but within the supplementary they provide sufficient training and architecture information. They use public datasets; however, it is unclear how they picked the samples relevant for their training (random, the first/last samples, etc.), and they do not discuss data preprocessing, so it is only partially possible to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Include a short title, the current one is too long

    As discussed in the main weaknesses section, please discuss in more detail the theorems on which the proofs are based; references to these theorems are especially important.

    In a journal paper where more pages are available, it would be helpful to list the formulas mentioned in the text as equations, as this improves the overview and makes it easier to follow.

    For the datasets in which abnormalities occupy only a few pixels, I would redo the experiments with a larger image size, e.g. 224x224.

    Introducing another state-of-the-art baseline based on a recent machine learning paper would improve the evaluation.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main focus of the paper is the theoretical insights, but as discussed in the main weaknesses there are some doubts about the proofs of the two propositions.

    The image size of 64x64 raises doubts about the information content of abnormalities with a small pixel size.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper attempts to theoretically clarify the principles and design philosophy of AE for anomaly detection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Provide some theoretical support

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. For Eq. (4), why is W^(2)W^(1)=I_(DxD)? My understanding is that W is the weight parameter of the fully connected layer, and there are no constraints on it. Also, why is b=0? It is claimed below that b is a learnable parameter. If it is 0, why does it need to be learned?
    2. “To ensure that Eq. 4 has at least one solution, the number of independent scalar equations should not exceed the number of learnable parameters”. What does this mean? Is there any explanation?
    3. For Fig. 2, why is the relationship between the abnormal image and the normal image shown as X_a contained in X_n, rather than X_n contained in X_a?
    4. For Table 1, you mentioned d>D/2=512, but I didn’t see the d=512 case. Also, I don’t see any experimental results in Table 1 that support the theoretical derivation, because according to Table 1 the optimal result is not directly related to the dimension d.
    5. What is the dimension of D in all experiments?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    no

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    please refer to main weaknesses

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Experimental results do not appear to support theoretical claims

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper demonstrates how theoretical insights can be practically applied to yield significant improvements in the use of autoencoders. The authors dive into theory to prove the non-existence of identical shortcuts and that an AE with optimal latent size can be used to improve anomaly detection in medical images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper provides a theoretical foundation to explain the workings of reconstruction-based methods and uncovers theoretically optimal solutions. The theoretical insights are validated with experiments, which brings a new perspective on the application of AE with d_optimal. By applying their methodology to different types of data, the paper effectively demonstrates the versatility and robustness of the approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Upon thorough evaluation of the paper, I find that the authors have addressed the key aspects of their research question comprehensively. The methodology is sound, the data analysis is robust, and the conclusions are well-supported by the findings. No weaknesses found.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Sharing the dataset used would be helpful

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Authors could work on sharing the dataset used in the paper

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As stated above the authors have addressed the key aspects of their research question comprehensively. The methodology is sound, the data analysis is robust, and the conclusions are well-supported by the findings.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

[Proof of Proposition 1]

Q1 (R4): Why are there constraints on W and b in Eq. 4-5?
A1: Here we assume an identity mapping in the AE’s bottleneck and then derive the conditions that must be satisfied in this case. Eq. 4-5 are therefore the conditions required for an identity mapping.

Q2 (R3 & R4): Why is the solution of Z0 = Z0 W1 W2 + (b1 W2 + b2) given by W1 W2 = I, b1 = b2 = 0?
A2: We thank R3 for pointing out the typo. Under an identity mapping, Z0 = Z0 W1 W2 + (b1 W2 + b2) must hold for any Z0, rather than only for a specific Z0. To make this easier to see, we move all terms to one side of the equation and get Z0 (W1 W2 - I) + (b1 W2 + b2) = 0 for any Z0. Clearly, we must have W1 W2 = I, and then b1 = b2 = 0, for this equation to hold for any Z0. This deduction is also verified by a previous paper ([1], Sec. 3.1).

Q3 (R3 & R4): Explain the statement “To ensure that Eq. 4 has at least one solution, the number of independent scalar equations should not exceed the number of learnable parameters”.
A3: By the definition of the matrix product [2], W1 W2 = I_(DxD) is equivalent to D^2 scalar equations. The number of unknowns in these equations is the total number of elements in W1 and W2, i.e., Dd + dD = 2Dd. Based on the consistency of systems of linear equations [3], we need D^2 <= 2Dd for a solution to exist, i.e., d >= D/2.

Q4 (R3): What is the relationship between “at least one solution exists -> d >= D/2” and “d < D/2 -> no solution”?
A4: They are contrapositives of each other [4], and thus equivalent.

[Proof of Proposition 2]

Q5 (R3 & R4): Does H(Xa) always contain H(Xn), and if so, why?
A5: We argue that for regional anomalies, H(Xa) always contains H(Xn). The reason is that information entropy measures the variation of random variables [5]. Thus, H(Xn) represents the variation of normal regions, while H(Xa) represents the variation of both normal and abnormal regions, since all normal patterns can appear in abnormal images. Note that H(Xa) is a measure over the distribution of abnormal images, so the specific samples with bigger anomalies mentioned by R3 don’t affect the conclusion.
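
As a side note on Proposition 1, the counting argument in A3 can be sketched in a few lines of Python. The sketch only evaluates the inequality D^2 <= 2Dd stated in the rebuttal; it does not show that a solution of W1 W2 = I actually exists. The value D = 1024 is inferred from the rebuttal's "D/2 = 512", and the function names are hypothetical.

```python
def min_latent_dim(D):
    """Smallest integer d with D*D <= 2*D*d, i.e. d >= D/2 rounded up."""
    return (D + 1) // 2


def counting_condition_holds(D, d):
    """True when the D^2 scalar equations do not outnumber the 2*D*d weights."""
    return D * D <= 2 * D * d


D = 1024  # inferred from "D/2 = 512" in the rebuttal (assumption)
print(min_latent_dim(D))  # 512
print(counting_condition_holds(D, 512))  # True
print(counting_condition_holds(D, 511))  # False
```

Under this counting argument, d = 512 is the smallest latent size at which the number of weights matches the number of scalar equations for D = 1024.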

Q6 (R3): How is “I(Xn;Z) <= H(Xn)” obtained from “I(Xn;Z) = H(Xn) + H(Z) - H(Xn,Z)”?
A6: By the property of joint entropy [6], H(Xn,Z) >= max[H(Xn), H(Z)] >= H(Z). Thus H(Z) - H(Xn,Z) <= 0, which gives I(Xn;Z) = H(Xn) + H(Z) - H(Xn,Z) <= H(Xn).

[Experiment]

Q7 (R4): There are no results at d=512, and Tab. 1 doesn’t support the theoretical claim.
A7: There are some misunderstandings here. We kindly ask the reviewer to read Sec. 4.2, and we believe that a careful reading will address the concerns. As described in the 1st paragraph of Sec. 4.2, we present reconstruction errors w.r.t. d (from 1 to 1024) in Fig. 3, not in Tab. 1. Fig. 3 shows that when d is small, an increase in d results in a decrease of reconstruction errors. When d > D/2 = 512, an increase in d doesn’t lead to smaller errors. This supports Proposition 1: an AE with small d doesn’t encounter the identical shortcut, and d > D/2 makes the bottleneck saturated. As shown in the 2nd and 3rd paragraphs of Sec. 4.2, Tab. 1 presents results that support Proposition 2. First, reducing d from 128 to 1 initially improves the performance and then leads to deterioration. This aligns with Proposition 2, which states that H(Z) should be minimized to H(Xn). Second, the optimal d for 2D scans is smaller than for 3D scans (MRI). The reason is that MRIs offer richer information than 2D scans, resulting in a larger H(Xn) and hence a larger optimal d.

(R1 & R3: more SOTA baselines, input size, open source) We aim to build a theoretical foundation for AE-based AD. Thus, introducing extra designs to improve metrics is out of the scope of this study. The size 64 follows previous works. A larger size doesn’t bring improvement. Code and data will be made public.

[1] You et al. A unified model for multi-class anomaly detection. NeurIPS 2022.
[2] https://en.wikipedia.org/wiki/Matrix_multiplication#Matrix_times_matrix
[3] https://en.wikipedia.org/wiki/System_of_linear_equations#Consistency
[4] https://en.wikipedia.org/wiki/Contraposition
[5] Shannon C E. A mathematical theory of communication. 1948.
[6] https://en.wikipedia.org/wiki/Joint_entropy#Greater_than_individual_entropies
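
The inequality chain used in A6 can be checked numerically on a small discrete example. This is only an illustrative sketch: the 2 x 2 joint distribution below is hypothetical and stands in for the pair (Xn, Z), and the check covers discrete Shannon entropy only.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


# Hypothetical joint distribution p(x, z); rows index x, columns index z.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

p_x = [sum(row) for row in joint]          # marginal of x
p_z = [sum(col) for col in zip(*joint)]    # marginal of z
h_x = entropy(p_x)
h_z = entropy(p_z)
h_xz = entropy([p for row in joint for p in row])  # joint entropy H(X, Z)
mutual_info = h_x + h_z - h_xz             # I(X; Z) = H(X) + H(Z) - H(X, Z)

print(h_x, h_z, h_xz, mutual_info)
```

For this example the joint entropy is at least as large as each marginal entropy, so the mutual information computed from the identity in A6 does not exceed H(X).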




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    None of the three reviewers made a final decision after the rebuttal. This paper received one “Accept” and two “Weak Reject” decisions. After reviewing the authors’ rebuttal and manuscript, I believe they have sufficiently addressed the reviewers’ concerns. I suggest an “Accept.” The authors should improve clarity and incorporate the rebuttal materials into the final version.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I think that the rebuttal has clarified some questions and addressed some concerns. Overall this is an interesting work which provides some insights on the existing architectures and can be interesting to the MICCAI audience. Clearer explanations are suggested to be included in the final submission.



