Abstract

Medical anomaly detection has emerged as a promising solution to challenges in data availability and labeling constraints. Traditional methods extract features from different layers of pre-trained networks in Euclidean space; however, Euclidean representations fail to effectively capture the hierarchical relationships within these features, leading to suboptimal anomaly detection performance. We propose a novel yet simple approach that projects feature representations into hyperbolic space, aggregates them based on confidence levels, and classifies samples as healthy or anomalous. Our experiments demonstrate that hyperbolic space consistently outperforms Euclidean-based frameworks, achieving higher AUROC scores at both image and pixel levels across multiple medical benchmark datasets. Additionally, we show that hyperbolic space exhibits resilience to parameter variations and excels in few-shot scenarios, where healthy images are scarce. These findings underscore the potential of hyperbolic space as a powerful alternative for medical anomaly detection. The project website can be found at https://hyperbolic-anomalies.github.io
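The projection step described in the abstract can be illustrated with a minimal sketch. The exponential map at the origin of the Poincaré ball is the standard way to lift Euclidean features into hyperbolic space; the function names, feature sizes, and unit-norm preprocessing below are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def exp_map_origin(v, c=1.0):
    # Exponential map at the origin of the Poincare ball with curvature -c:
    # lifts a Euclidean (tangent-space) feature vector into hyperbolic space.
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

# Stand-ins for per-layer features from a pre-trained backbone
# (e.g. two ResNet stages, as in the paper); values are random here.
rng = np.random.default_rng(0)
layer_feats = [rng.normal(size=64) for _ in range(2)]
hyp_feats = [exp_map_origin(f / np.linalg.norm(f)) for f in layer_feats]
for h in hyp_feats:
    assert np.linalg.norm(h) < 1.0  # every point lies strictly inside the unit ball
```

In the full method, these per-layer hyperbolic embeddings are then aggregated by confidence and separated with a hyperplane fitted in hyperbolic space.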

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2690_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://hyperbolic-anomalies.github.io

Link to the Dataset(s)

https://github.com/DorisBao/BMAD

BibTex

@InProceedings{GonAlv_Is_MICCAI2025,
        author = { Gonzalez-Jimenez, Alvaro and Lionetti, Simone and Amruthalingam, Ludovic and Gottfrois, Philippe and Gröger, Fabian and Pouly, Marc and Navarini, Alexander A.},
        title = {{Is Hyperbolic Space All You Need for Medical Anomaly Detection?}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {316--326}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes using hyperbolic space for medical anomaly detection instead of conventional Euclidean space. The authors hypothesize that hyperbolic geometry can better represent hierarchical features extracted from pre-trained networks for anomaly detection. Their framework generates synthetic medical anomalies, extracts multi-layer features from pre-trained networks, projects these features into hyperbolic space, aggregates them, and classifies samples using a hyperplane in hyperbolic space. The approach is evaluated on multiple medical imaging datasets across different modalities and compared against Euclidean-based methods. The results indicate that hyperbolic representations have the potential to outperform Euclidean approaches for image-level detection, particularly in few-shot scenarios.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors attempt to merge two interesting research directions, combining ideas from hyperbolic geometry with medical anomaly detection. The motivation for using hyperbolic space is reasonable, although it could benefit from clearer explanations.

    • The model is compared against a robust set of baselines using a well-curated benchmark dataset collection.

    • The few-shot performance of the model is noteworthy and provides an interesting angle given the common challenge of data scarcity in medical imaging.

    • Ablation studies effectively illustrate the effect of main hyperparameters.

    • The inclusion of statistical significance testing strengthens the experimental findings.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Overall, the method’s performance in the unsupervised setting is lackluster. While it shows improvements at image-level detection, it is difficult to conclusively state its superiority based solely on the presented results.

    • It is unclear how projecting into hyperbolic space explicitly models the hierarchy of features. Since each ResNet-layer feature embedding is projected independently and then aggregated into a single patch-wise embedding, more explanation is needed on why averaging in hyperbolic space yields better representations than in Euclidean space. Is there an empirical or theoretical way to verify this claim?

    • The paper states that normalizing the projected features by their Euclidean norms is “connected to model confidence” (with references). Explaining this connection to the reader in more detail would strengthen the paper’s motivation.

    • The experiments extract features from only two layers, raising the question: if the key insight is improved hierarchical representation, why not incorporate more levels?

    • AUROC, while commonly used in benchmarks, tends to over-emphasize performance on the majority class. For pixel-level segmentation, where the background often dominates, additional metrics such as PRO (used in the BMAD benchmark) would provide a more comprehensive evaluation.

    • Although the inclusion of statistical significance is appreciated, it is unclear how the authors adapted the Mann-Whitney U test for their setting, given that the test is designed for two-sample comparisons. More clarity on this aspect could be helpful.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • The methodology section needs further refinement. It should be made explicit that the notion of “hierarchy” stems from the different ResNet layers. Section 2.2 is somewhat confusing in its presentation: it is not clear how the feature embedding for an entire image is derived, as opposed to for a patch. For example, it appears that a spatial average pooling is performed for each patch from each layer, yet the subscript $i$ in the equations appears to refer to the entire image. Additionally, when specifying constants like $f_{i,l} \in \mathbb{R}^C$, please clarify what C represents.

    • In Section 2.3, consider highlighting salient details for the reader, especially for those unfamiliar with metric spaces or hyperbolic spaces.

    • In Section 2.4, it would be helpful to explicitly state that $w$ represents the learned parameters of the hyperbolic linear classifier. Furthermore, if Equations (5) and (6) apply to patch-wise classification, clarify how these outputs are aggregated for an image-level decision.

    Minor Comments

    • Specify the dimensionality of the original feature embeddings and of $w$. Section 4.1 suggests that $w$ is 512-dimensional, but this is not explicitly stated.

    • Figure 2 needs labels for its lines to make the results interpretable. Additionally, the curvature (blue line) does not seem to follow the statement that “better results are observed at lower curvature values.”

    • It would be interesting for the reader to see the final value of the learned curvature $c$ in Figure 2.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Barely accept. I believe the authors offer an interesting analysis; however, the presentation lacks clarity in several key areas. The performance of the model is modest, but that is sometimes inherent in exploratory research. I would have liked to see a more concrete justification of why hyperbolic spaces should be used beyond referencing prior citations.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors present a framework for anomaly detection that projects feature representations into hyperbolic space. To this end, they 1) generate synthetic anomalies, 2) extract and aggregate features from a pre-trained classifier to project them into hyperbolic space, and 3) construct a hyperplane in this space to perform anomaly detection.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Overall, the authors convey the key message very clearly, and the paper is well-structured. The authors clearly define the research question. The provided figure helps the reader intuitively understand the proposed approach.

    2. The experimental design follows established benchmarks and evaluates the proposed approach on multiple public datasets. The authors provide statistical significance analysis as well as ablation studies, making the results convincing and easy to interpret. Moreover, they promise to release their source code, which will enhance the reproducibility of their approach.

    3. The authors provide a nice discussion about the benefits and future direction of their approach.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The authors focus on approaches that extract features from pre-trained networks. How does this compare to reconstruction-based (unsupervised) anomaly detection methods? A discussion of the limitations and benefits of each approach would really strengthen the paper.

    2. How does the authors’ design choice to create synthetic anomalies affect the proposed framework? Does their approach offer advantages in detecting non-localized anomalies compared to Euclidean methods? Could generative diffusion models help with this task?

    3. It would be helpful to intuitively explain in the text what kind of data and labels are required to train the proposed framework before diving into the methodological details. Such an explanation would help general readers and practitioners better understand and implement the proposed approach.

    4. A brief discussion of the limitations of the proposed framework would enhance the transparency of the paper.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Please see major weakness.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall motivation of this work is interesting and solid. However, I can only provide a limited assessment of the presentation and overall performance, as I am not very familiar with this specific area of anomaly detection.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents a novel approach to anomaly detection that projects features from pre-trained networks into hyperbolic space, showcasing results that outperform a number of baseline anomaly detection methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The application of feature projections from pre-trained networks into hyperbolic space is a novel contribution of this work that shows advantages over competing anomaly detection baselines that use pre-trained networks.

    • The paper is well written, with a coherent structure where it is easy to follow the methodology and evaluation of the work.

    • The model is well evaluated over 3 different datasets showcasing consistent strong performances across all tasks over the baselines.

    • The ablation study over model parameters is additionally a well-documented addition, given the novelty of this approach for this line of work.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Although I understand the use and merits of a public benchmark dataset and associated baselines, it is clear that this comes from the computer vision side of research given the model choices and evaluation metrics. AUROC specifically is often geared towards anomaly detection in 2D computer vision methods; however, it is not always the best metric in medical imaging, and further metrics such as Dice, AUPRC, and sensitivity can be much more telling and useful.

    • Further discussion of the results of section 4.2 Few-Shot Anomaly Detection and Localization would be helpful as some of these results are interesting and perhaps not what would be expected.

    • One note highly relevant to medical imaging is whether this 2D approach is easily transferable to 3D, or whether there are limitations given the compute required. As this is not discussed in the paper (though it is noted that the Camelyon16 dataset was excluded due to memory issues), it would be helpful to know whether this would be a concern when extending the approach to 3D, or whether it would only be possible in 2D.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    In section 2.3 you say: “We project the hyperbolic features to a lower-dimensional hyperbolic space with a hyperbolic linear layer [4], and adapt the features to the target domain”. It is not clear what is meant by adapting the features to the target domain, nor how this is done.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The strengths of this paper outweigh any weaknesses. Although better evaluation metrics could have been used, the authors have followed the metrics used in the benchmark dataset and provided its baseline values, and should not be heavily penalised for this. Otherwise, I believe the application and novelty of this approach show promise over existing work, and the thorough evaluation of the model and its hyperparameters is suitable and meets the necessary threshold for acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We would like to sincerely thank the reviewers for their thoughtful feedback and constructive suggestions. We have carefully considered all comments and have revised our manuscript accordingly to address the concerns raised.

– Motivation The motivation for employing hyperbolic space stems from the inherently hierarchical nature of both deep neural network feature representations and the structure of visual datasets. It is well established that earlier layers in convolutional networks tend to capture low-level features (e.g., edges, textures), while deeper layers encode increasingly abstract semantic concepts (e.g., object parts, class-specific cues). This progression naturally gives rise to a hierarchical structure across the network’s depth [1, 2]. We argue that projecting these features into hyperbolic space better preserves their relational structure than Euclidean embeddings, ultimately benefiting downstream anomaly detection tasks.

– Model Confidence In hyperbolic space, points near the boundary typically represent more specific, fine-grained concepts, while those closer to the origin correspond to more general or ambiguous representations. For example, a model trained on distinct dog breeds (e.g., Siberian Husky, Labrador Retriever, Dalmatian) would embed these confidently recognized features closer to the boundary. Conversely, an unfamiliar image, such as a wolf, which visually resembles a Husky, would likely be embedded nearer to the origin, reflecting the model’s uncertainty and the more abstract nature of the representation. This property, discussed in previous works [3,4], offers an intuitive way to capture varying degrees of confidence.

– Evaluation Metrics We agree that including additional evaluation metrics would strengthen the analysis. We chose AUROC primarily because: (a) it is a principal metric used in the BMAD benchmark, and (b) it remains widely adopted in the anomaly detection literature. Due to space constraints in the main paper and restrictions in the supplementary material, we were unable to include results for other metrics. Regarding baseline comparisons, our focus on feature- and memory-based methods reflects their close alignment with our architectural design. Nevertheless, we recognize the value of incorporating reconstruction-based approaches for broader context and plan to include such baselines in future work.
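The confidence intuition sketched above (boundary = specific and confident, origin = general and ambiguous) can be made concrete via the geodesic distance from the origin of the Poincaré ball; the toy vectors below are illustrative stand-ins of ours, not the paper's actual embeddings.

```python
import numpy as np

def dist_to_origin(x, c=1.0):
    # Geodesic distance from the Poincare-ball origin (curvature -c).
    # Larger values = closer to the boundary = higher model confidence.
    sqrt_c = np.sqrt(c)
    r = np.clip(sqrt_c * np.linalg.norm(x), 0.0, 1.0 - 1e-7)
    return (2.0 / sqrt_c) * np.arctanh(r)

# Toy embeddings mirroring the dog-breed example: a confidently recognized
# Husky sits near the boundary, an unfamiliar wolf near the origin.
husky = np.array([0.90, 0.30])
wolf = np.array([0.05, 0.02])
assert dist_to_origin(husky) > dist_to_origin(wolf)
```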

– Limitations We have clarified the limitations of our approach in the revised conclusion section. Specifically, we currently extract features from only two layers, which aligns with common practice in hybrid (Euclidean + Hyperbolic) architectures. We agree that incorporating features from more layers could potentially improve performance and better reflect hierarchical representations. At present, while the use of fully hyperbolic networks shows better performance than hybrid and Euclidean architectures [5,6], it remains an active research area and lacks widespread stability and reproducibility. Despite these limitations, our results demonstrate that leveraging hyperbolic space can lead to tangible improvements in anomaly detection performance and highlight a promising avenue for further exploration.

We genuinely appreciate your feedback, and we strongly believe that these revisions have significantly enhanced the clarity and quality of our paper.

[1] Y. Bengio. Representation learning: A review and new perspectives. TPAMI, 2013.
[2] T. Nguyen. Do wide and deep networks learn the same things? ICLR, 2021.
[3] V. Khrulkov. Hyperbolic image embeddings. CVPR, 2020.
[4] M. Ghadimi Atigh. Hyperbolic image segmentation. CVPR, 2022.
[5] W. Chen. Fully hyperbolic neural networks. ACL, 2022.
[6] A. Bdeir. Fully hyperbolic convolutional neural networks for computer vision. ICLR, 2023.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


