Abstract

Pathology is essential for cancer diagnosis, with multiple instance learning (MIL) widely used for whole slide image (WSI) analysis. WSIs exhibit a natural hierarchy (patches, regions, and slides) with distinct semantic associations. While some methods attempt to leverage this hierarchy for improved representation, they predominantly rely on Euclidean embeddings, which struggle to fully capture semantic hierarchies. To address this limitation, we propose HyperPath, a novel method that integrates knowledge from textual descriptions to guide the modeling of semantic hierarchies of WSIs in hyperbolic space, thereby enhancing WSI classification. Our approach adapts both visual and textual features extracted by pathology vision-language foundation models to the hyperbolic space. We design an Angular Modality Alignment Loss to ensure robust cross-modal alignment, while a Semantic Hierarchy Consistency Loss further refines feature hierarchies through entailment and contradiction relationships, thereby enhancing semantic coherence. Classification is performed with geodesic distance, which measures the similarity between entities in the hyperbolic semantic hierarchy. This eliminates the need for linear classifiers and enables a geometry-aware approach to WSI analysis. Extensive experiments show that our method achieves superior performance across tasks compared to existing methods, highlighting the potential of hyperbolic embeddings for WSI analysis.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0670_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/HKU-MedAI/HyperPath

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HuaPei_HyperPath_MICCAI2025,
        author = { Huang, Peixiang and Huang, Yanyan and Zhao, Weiqin and He, Junjun and Yu, Lequan},
        title = { { HyperPath: Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        page = {261 -- 271}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    (1) The paper proposes the HyperPath framework, which integrates hyperbolic space with vision-language models to model the multi-level semantic structure of whole slide images (WSIs). (2) The paper introduces two key loss functions: Angular Modality Alignment Loss and Semantic Hierarchy Consistency Loss, designed for cross-modal alignment and semantic hierarchy modeling, respectively. (3) The paper implements a classifier-free classification mechanism, using geodesic distance for slide-level prediction, which avoids traditional linear classifiers and enhances geometric consistency and generalization.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The method is novel and well-structured. It is the first to combine hyperbolic geometry with hierarchical semantics in pathological image analysis, with a clear theoretical foundation. (2) It integrates cross-modal alignment with hierarchical modeling, simultaneously considering the alignment between visual and textual features and the consistency of semantic hierarchies. (3) The experiments are comprehensive and demonstrate strong generalization. The method outperforms existing approaches across multiple TCGA tasks, with stable performance in both IND and OOD settings.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) How is the textual information obtained? Is it the class label, or does it include other content? (2) What is AdaptorT? Is it a trainable module? (3) Why do visual embeddings lie farther from the origin as the hierarchy goes deeper (i.e., from slide → region → patch)? (4) Which hyperparameters is the model sensitive to? No sensitivity analysis appears to be provided. (5) Are the different loss terms of the same order of magnitude? How are the loss weights determined?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the proposed framework is conceptually sound, the paper lacks sufficient detail on critical components. These components are central to the method’s effectiveness, yet their structure and training status are not clearly explained. This significantly affects the reproducibility and clarity of the proposed approach.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes HyperPath, a novel framework for whole slide image (WSI) analysis that leverages hyperbolic geometry to model the semantic hierarchy inherent in pathology images. The key idea is to integrate knowledge from textual descriptions into the representation learning process to enhance classification performance. Two novel loss functions are introduced: Angular Modality Alignment Loss (LAMA) to align visual and textual modalities, and Semantic Hierarchy Consistency Loss (LSHC) to enforce logical relationships (entailment and contradiction) across levels. The model achieves state-of-the-art results across multiple cancer classification tasks in both in-domain and out-of-domain settings.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper introduces a creative approach by applying hyperbolic geometry to WSI analysis, which is especially well-suited for capturing hierarchical semantics.

    • Effectively leverages both image and text features from vision-language models, an emerging and powerful direction in medical imaging.

    • Comprehensive evaluation across four TCGA tasks with both in-domain and out-of-domain splits.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The proposed approach introduces significant complexity with multiple components and transformations.

    • Some recent hierarchical models (including graph-based ones) or foundation-model-based baselines might be missing. It’s unclear whether all of the most recent state-of-the-art techniques are included in the comparison.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See above

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper
    1. First adaptation of visual and textual features from pathology vision-language models (e.g., CONCH) into hyperbolic space, explicitly modeling the hierarchical structure of WSIs (patch-region-slide) and leveraging hyperbolic geometry’s exponential expansion to naturally encode semantic hierarchies.
    2. The Angular Modality Alignment Loss (AMAL) addresses cross-hierarchical alignment discrepancies between visual and textual modalities in hyperbolic space using angular distance, mitigating geometric mismatches caused by hierarchical granularity differences in traditional contrastive learning.
    3. The Semantic Hierarchy Consistency Loss (SHCL) employs entailment cones in hyperbolic space to model intra- and inter-modal entailment and contradiction relationships, enhancing semantic coherence across modalities and hierarchical levels.
    4. Replaces linear classifiers with geodesic distance-based similarity computation between slide-level features and textual class semantics, enabling hierarchy-aware classification without additional trainable parameters.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. First integration of hyperbolic geometry and cross-modal alignment in WSI analysis, addressing the inherent limitation of Euclidean space in hierarchical modeling. The AMAL and SHCL are original and absent in prior pathology or hyperbolic learning literature.
    2. Evaluated on four TCGA tasks (BRCA/NSCLC subtyping, HER2/EGFR prediction), HyperPath achieves significant improvements (1.9%-9.2% AUC/F1 gains) over baselines (ABMIL, TransMIL, HIT) under both OOD and IND settings.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Hyperbolic operations (exponential maps, geodesic distances) may increase computational overhead, but training/inference time comparisons with baselines are missing, hindering practicality assessment.
    2. Experiments focus on classification; segmentation or prognosis tasks are unexplored. (This is a minor point; the paper's flaws do not overshadow its merits.)
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (6) Strong Accept — must be accepted due to excellence

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work is innovative (hyperbolic space + cross-modal alignment) with rigorous evaluation (multi-task, OOD validation) and clear writing.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

General Response to All Reviewers

Firstly, we would like to express our gratitude to all the reviewers for their hard work in reviewing and providing valuable feedback!

Due to space limits, we could not include all details and results, but our representative experiments clearly show the effectiveness of our method. Besides, we noticed that two concerns were raised:

1. Reproducibility: We will release the code upon acceptance to ensure reproducibility.

2. Computational Complexity: The main cost is incurred during training, for hyperbolic representation learning. During inference, aggregation is done in the tangent space (a Euclidean space associated with the hyperbolic geometry). Hyperbolic transformations are applied only to the semantic class features (which can be precomputed, since they act like labels) and to the aggregated slide feature. The geodesic distance is computed once per WSI. The complexity of these operations is comparable to that of a fully connected layer, so the test-time complexity is acceptable.
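
The inference path described in this response can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a Poincaré ball of curvature -c (the paper may use another hyperbolic model, such as the Lorentz model), replaces the paper's aggregator with plain mean pooling, and uses hypothetical 2-D toy features. The function names `expmap0`, `poincare_distance`, and `classify_slide` are made up for this sketch.

```python
import math

def expmap0(v, c=1.0):
    """Map a tangent vector at the origin onto the Poincare ball (curvature -c)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    return math.acosh(arg) / math.sqrt(c)

def classify_slide(tangent_patch_feats, class_points, c=1.0):
    """Pool in tangent space, map to the ball once, return the nearest class index."""
    n = len(tangent_patch_feats)
    dim = len(tangent_patch_feats[0])
    # Assumption: mean pooling stands in for the paper's aggregation module.
    pooled = [sum(f[i] for f in tangent_patch_feats) / n for i in range(dim)]
    slide_point = expmap0(pooled, c)
    dists = [poincare_distance(slide_point, p, c) for p in class_points]
    return min(range(len(dists)), key=dists.__getitem__)

# Hypothetical toy data: two class embeddings, computed once ahead of time,
# and two patch features that already live in the tangent space.
class_points = [expmap0([0.5, 0.0]), expmap0([0.0, 0.5])]
patches = [[1.0, 0.1], [0.8, -0.1]]
predicted = classify_slide(patches, class_points)
print("predicted class index:", predicted)
```

Per slide, the hyperbolic part amounts to one exponential map and one distance per class, which is consistent with the claim that the test-time cost is on the order of a small dense layer.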

Response to Reviewer 1

We greatly appreciate your positive and encouraging feedback. We are truly glad that you recognize the value of our work.

Based on the discussion above, we believe that the computational cost during testing is acceptable, which suggests the practical applicability of our method. We fully agree that extending our approach to more downstream tasks is an important and promising direction, and we intend to explore this in our future research.

Response to Reviewer 2

Thank you for your thoughtful comments. We have carefully considered your feedback and made corresponding revisions in our camera-ready version. Due to page limitations, we have condensed many detailed descriptions in the manuscript. We sincerely apologize for any inconvenience this may cause. We will explain each of them in detail below.

Q1: Textual Information – We use synonymous prompts for each class (e.g., “invasive ductal carcinoma”, “breast ductal tumor”) as textual inputs. Future work may incorporate detailed reports for large-scale pretraining.

Q2: Adaptor – It is a trainable MLP that maps foundation model features into the tangent space of the hyperbolic manifold.

Q3: Visual Embedding Placement – In hyperbolic space, embeddings closer to the origin have larger entailment cones, so that they can entail broader concepts. Text embeddings are placed near the origin, while visual embeddings are placed progressively farther from the origin as the hierarchy deepens, forming a semantic hierarchy.
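
The relationship between distance from the origin and cone width can be checked numerically. The sketch below assumes the Poincaré-ball entailment cone of Ganea et al., whose half-aperture is arcsin(K(1 - ‖x‖²)/‖x‖); the paper's exact cone definition may differ, and the constant `k`, the norms, and the level labels are hypothetical values chosen only for illustration.

```python
import math

def half_aperture(norm, k=0.1):
    """Half-aperture (radians) of the entailment cone at a point with the given norm."""
    if not 0.0 < norm < 1.0:
        raise ValueError("norm must lie strictly between 0 and 1")
    # Clamp so that points very close to the origin get the widest cone (pi / 2).
    return math.asin(min(1.0, k * (1.0 - norm * norm) / norm))

# Hypothetical norms: text nearest the origin, then slide, region, patch.
for label, norm in [("text", 0.2), ("slide", 0.5), ("region", 0.7), ("patch", 0.9)]:
    print(f"{label:>6}: norm={norm:.1f}  half-aperture={half_aperture(norm):.3f} rad")
```

Under this definition the aperture shrinks monotonically as the norm grows, so embeddings near the origin can entail broad concepts while embeddings near the boundary can only entail narrow ones.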

Q4: Hyperparameters – We used consistent settings (as shown in Sec 3.1) across datasets, largely adopted from baseline methods, and obtained robust performance without extensive tuning. We found that the method is not sensitive to these hyperparameters and therefore did not include an explicit sensitivity analysis.

Q5: Loss Scale – We set λ_a = 1 and λ_s = 10, as shown in Sec 3.1. We set λ_s an order of magnitude larger than λ_a to prioritize the construction of the desired semantic hierarchy, which ensures that the hierarchical structure is well preserved. Ablation results show that removing the semantic hierarchy loss greatly harms performance, highlighting its importance for preserving semantic relationships and for effective representation learning. For the individual loss terms at different levels, we simply apply the same scaling factor.
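
One plausible reading of this weighting, written out for concreteness, is a weighted sum of the two auxiliary losses added to a classification term. The additive form and the symbol \mathcal{L}_{\mathrm{cls}} are assumptions made here; the exact composition of the objective is given in the paper, not in this response.

```latex
% Assumed form of the overall objective; only \lambda_a and \lambda_s are stated in the rebuttal.
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{cls}}
  + \lambda_a \, \mathcal{L}_{\mathrm{AMA}}
  + \lambda_s \, \mathcal{L}_{\mathrm{SHC}},
\qquad \lambda_a = 1, \quad \lambda_s = 10 .
```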

Response to Reviewer 3

Thank you for your feedback and for recognizing the merit of our work.

As explained, inference complexity is acceptable. Due to space constraints, we selected strong, representative baselines (both MIL-like and hierarchical-like methods), including recent ones such as ACMIL (2024). While graph-based approaches exist, they bring additional computational overhead, and our method still outperforms them.

All baselines use CONCH for feature extraction to ensure fairness, so in our experiments they are also foundation-model-based. We are aware of similar methods such as CATE, which we tried, and our method still achieves competitive performance.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


