Abstract

Histopathology images capture tissue morphology, while spatial transcriptomics (ST) provides spatially resolved gene expression, offering complementary molecular insights. However, acquiring ST data is costly and time-consuming, limiting its practical use. To address this, we propose HAGE (Hierarchical Alignment Gene-Enhanced), a framework that enhances pathology representation learning by predicting gene expression directly from histological images and integrating molecular context into the pathology model. HAGE leverages gene-type embeddings, which encode relationships among genes, guiding the model in learning biologically meaningful expression patterns. To further improve alignment between histology and gene expression, we introduce a hierarchical clustering strategy that groups image patches based on molecular and visual similarity, capturing both local and global dependencies. HAGE consistently outperforms existing methods across six datasets. In particular, on the HER2+ breast cancer cohort, it significantly improves the Pearson correlation coefficient by 8.0% and achieves substantial reductions in mean squared error and mean absolute error by 18.1% and 38.0%, respectively. Beyond gene expression prediction, HAGE improves downstream tasks, such as patch-level cancer classification and whole-slide image diagnostics, demonstrating its broader applicability. To the best of our knowledge, HAGE is the first framework to integrate gene co-expression as prior knowledge into a pathology image encoder via a cross-attention mechanism, enabling more biologically informed and accurate pathology representations. https://github.com/uta-smile/gene_expression
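The abstract reports gains in Pearson correlation coefficient (PCC), mean squared error (MSE), and mean absolute error (MAE). The snippet below is a minimal sketch of how these metrics are often computed for spot-level expression prediction. It makes two assumptions that the page does not confirm. First, PCC is computed per gene across spots and then averaged. Second, "improves by X%" means a relative change. The function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def evaluate(pred, true):
    """pred, true: lists of spots, each a list of per-gene expression values.

    Returns (mean per-gene PCC, MSE, MAE); MSE and MAE average over all spot-gene entries.
    """
    n_genes = len(true[0])
    pccs = [pearson([row[g] for row in pred], [row[g] for row in true])
            for g in range(n_genes)]
    diffs = [p - t for prow, trow in zip(pred, true) for p, t in zip(prow, trow)]
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return sum(pccs) / n_genes, mse, mae


def relative_change(old, new):
    """Signed relative change, e.g. about -0.011 for a 1.1% drop."""
    return (new - old) / old


# Toy data: gene 0 is predicted perfectly, gene 1 is predicted in reverse order.
pred = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
true = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(evaluate(pred, true))
print(round(relative_change(0.4458, 0.4410), 4))
```

Under the relative-change reading, `relative_change(0.4458, 0.4410)` gives roughly -1.1%. That matches the PCC_HEG drop quoted in the rebuttal.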

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2215_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/uta-smile/gene_expression

Link to the Dataset(s)

HER2+ dataset: https://doi.org/10.1038/s41467-021-26271-2
cSCC dataset: https://doi.org/10.1016/j.cell.2020.05.039
PCAM dataset: https://github.com/basveeling/pcam
Skin cancer dataset: https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/7QCR8S
TCGA dataset: https://www.cancerimagingarchive.net/collection/tcga-brca/
SLN dataset: https://www.cancerimagingarchive.net/collection/sln-breast/

BibTex

@InProceedings{DanTha_HAGE_MICCAI2025,
        author = { Dang, Thao M. and Li, Haiqing and Guo, Yuzhi and Ma, Hehuan and Jiang, Feng and Miao, Yuwei and Zhou, Qifeng and Gao, Jean and Huang, Junzhou},
        title = { { HAGE: Hierarchical Alignment Gene-Enhanced Pathology Representation Learning with Spatial Transcriptomics } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {230 -- 240}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes HAGE (Hierarchical Alignment Gene-Enhanced), a framework that integrates gene-type embeddings and hierarchical clustering to predict spatial transcriptomic signals from histopathology images.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper introduces an intriguing task—predicting gene expression from pathology images to address the shortcomings of existing methods in utilizing spatial information. By combining gene embeddings and hierarchical clustering through the HAGE framework, the method effectively enhances the accuracy of gene expression prediction.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Although the paper proposes the HAGE framework that integrates gene embeddings and hierarchical clustering, its core technical components are relatively conventional. The use of cross-attention is standard and lacks novelty. Moreover, the so-called Hierarchical Alignment essentially relies on K-means clustering. According to the ablation study, the performance gain between single-layer and hierarchical cluster alignment is marginal. More importantly, in the “Impact of cluster” experiment, the authors note that varying the cluster parameters (k1/k2) has minimal effect on performance. This raises concerns about the actual contribution of the hierarchical clustering module to the overall model effectiveness. It suggests that the added complexity may be unnecessary, pointing to a potentially redundant design.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In short, I recommend a weak reject, primarily due to the limited methodological novelty. The use of cross-attention and hierarchical clustering in HAGE is fairly standard and lacks structural innovation. Moreover, the experimental results show that the Hierarchical Alignment contributes only marginal performance gains, suggesting it may be a redundant design. While the task itself is meaningful and the results are reasonably strong, the overall technical contribution remains somewhat conservative.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Agreed with the rebuttal



Review #2

  • Please describe the contribution of the paper

    They propose HAGE, which leverages gene-type embeddings to guide the model toward learning more powerful features for pathology image patches. They claim to be the first to achieve this and evaluate their method on six datasets to show its effectiveness.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of leveraging gene information to learn more powerful pathological image feature embedding is very interesting. It is the major strength of this paper.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Further clarification and specification are needed regarding methodological details, as these ambiguities may lead to misunderstandings of the paper and reduce its overall quality. The specific issues are as follows:
    1) The overall loss function (Equation 3) consists of four components. What are their respective roles? For example, what are the differences between the CyCLIP and CLIP loss functions, and what are their specific functions in this task? Why are both global and local loss functions designed? What distinguishes the PCC loss from the other three losses, and what is its role?
    2) Why is hierarchical alignment necessary? What problem is the clustering step intended to solve?
    3) The proposed method is designed for image patches. This point needs to be clearly stated; otherwise, it may be misunderstood as being applied to WSI-level features.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major contribution of this paper is the use of genetic information to help the pathological image feature encoder learn better representations, and this is also my main factor for evaluating the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The rebuttal has addressed my concerns.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a framework that combines gene-type embeddings from spatial transcriptomics data with image embeddings from WSI data using a cross-attention mechanism in order to bring molecular insights into digital pathology applications. The framework is shown to perform better than existing methods on prediction of gene expression as well as patch-wise cancer classification and WSI diagnostics.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The key strengths of the paper are:
    1. Integration of spatial gene expression data with morphology data, improving biological interpretability by capturing “hidden signals” pertaining to tissue biology and disease progression.
    2. A two-step hierarchical clustering approach: gene expression profiles are grouped first, and the corresponding image features are then grouped within each gene cluster. This clustering aims to capture local and global molecular and morphological relationships.
    3. By combining patch-level contrastive learning on embeddings with cluster-level contrastive learning, the technique outperforms existing methods in gene expression prediction, cancer classification, and diagnostic tasks in digital pathology applications.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The framework builds on existing pre-trained embedding methods (UNI for images and gene2vec for genes), combined through a cross-attention mechanism; therefore, the novelty of the method is limited. The computational aspects, especially for large, high-dimensional datasets, are not addressed in the paper. The clinical interpretability aspect is also not explicitly discussed.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written, and the proposed methods are clearly presented, including the ablation studies. However, based on the ablation study reported in Table 2, the impact of the hierarchical alignment component does not seem to be very significant.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Thank you for the author feedback. I found the discussions/clarifications to be satisfactory.




Author Feedback

We thank the reviewers for their thoughtful feedback and are pleased that our work is acknowledged as very interesting (R1), biologically interpretable (R2), and meaningful (R3). We address the main shared concerns first, followed by individual points.

[R2, R3] Clarification of Novelty: Our major contribution lies in proposing a biologically grounded use of gene co-expression patterns to guide visual representation learning. Unlike spatial expression, which reflects where genes are expressed, co-expression captures regulatory relationships that inform which genes tend to be active together. We leverage representations that encode this structure to guide the image encoder, enabling visual regions to be attended to with awareness of gene-gene coordination. While the implementation uses standard mechanisms, the design enables a focused assessment of co-expression as prior knowledge. Our results show that incorporating this modality improves downstream performance, even with lightweight modeling choices. To our knowledge, this is the first work to formalize and apply gene co-expression as prior knowledge in expression prediction tasks.
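The rebuttal says image regions are attended to with awareness of gene-gene coordination, and the abstract says this happens through cross-attention. Below is a minimal single-head sketch. It assumes that image patch tokens supply the queries, while gene-type embeddings supply the keys and values. Review #3 suggests those embeddings may be pretrained gene2vec vectors. The projection matrices, the optional residual connection, and all names are illustrative assumptions, not HAGE's actual layer.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, gene_embeddings, Wq, Wk, Wv, residual=True):
    """Single-head scaled dot-product cross-attention.

    Queries come from image patch tokens; keys and values come from gene-type
    embeddings, so each patch token receives a weighted mix of gene information.
    """
    keys = [matvec(Wk, g) for g in gene_embeddings]
    values = [matvec(Wv, g) for g in gene_embeddings]
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for tok in image_tokens:
        q = matvec(Wq, tok)
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        attended = [sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))]
        # Hypothetical residual connection that keeps the original visual features.
        outputs.append([t + a for t, a in zip(tok, attended)] if residual else attended)
    return outputs


I2 = [[1.0, 0.0], [0.0, 1.0]]  # identity projections for a toy example
tokens = [[10.0, 0.0], [0.0, 10.0]]
genes = [[1.0, 0.0], [0.0, 1.0]]
print(cross_attention(tokens, genes, I2, I2, I2, residual=False))
```

In the toy example, each patch token attends almost entirely to the gene embedding it is most similar to. This shows how gene context is routed to individual patches.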

[R1, R2, R3] Necessity of Hierarchical Alignment (HA): Tumor tissues display distinct compartments and nested substructures [1, 2], which flat clustering often fails to capture. Our HA is inspired by this biological organization and structures representation learning around coherent groupings to improve alignment and generalization under limited data (Sec. 2.3). The implementation uses standard components, but the contribution is conceptual, focusing on biologically meaningful insight. Cluster-based alignment creates pseudo-bags that allow each slide to contribute more image-gene pairs to the global loss, alleviating the lack of ST data. While clustering reduces MSE and MAE for both HA and its variant (Tab. 2), only HA consistently improves all metrics. On HER2+, removing HA lowers PCC_HEG by 1.1% (0.4458 to 0.4410) and reduces PCC_all from 0.2489 to 0.2474. A 1% PCC drop is considered meaningful in this field, as noted in the Hist2ST paper (one of our baselines). These consistent gains support the necessity of HA.
[1] Smith et al.: The spatial and genomic hierarchy of tumor ecosystems revealed by single-cell technologies
[2] Walker et al.: NeST: nested hierarchical structure identification in spatial transcriptomic data
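A short sketch can illustrate the pseudo-bag idea from the HA response. It assumes each spot already carries a cluster label, for example a (gene cluster, image sub-cluster) tuple. It also assumes each cluster is summarized by mean pooling. Under those assumptions, every cluster yields one extra image-gene pair that a global contrastive loss could consume. The pooling choice and the function name are assumptions; the rebuttal does not confirm them.

```python
from collections import defaultdict


def pseudo_pairs(image_emb, gene_expr, labels):
    """Mean-pool image embeddings and gene expression within each cluster.

    Returns {label: (pooled_image_vector, pooled_gene_vector)}; each entry acts
    as one additional image-gene pair for a cluster-level (global) loss.
    """
    groups = defaultdict(list)
    for img, gene, lab in zip(image_emb, gene_expr, labels):
        groups[lab].append((img, gene))
    pairs = {}
    for lab, members in groups.items():
        n = len(members)
        img_mean = [sum(col) / n for col in zip(*(m[0] for m in members))]
        gene_mean = [sum(col) / n for col in zip(*(m[1] for m in members))]
        pairs[lab] = (img_mean, gene_mean)
    return pairs


image_emb = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
gene_expr = [[1.0], [3.0], [5.0]]
labels = [(0, 0), (0, 0), (1, 0)]  # hypothetical (gene cluster, image sub-cluster) labels
print(pseudo_pairs(image_emb, gene_expr, labels))
```

Three spots in two clusters produce two pooled pairs in addition to the three patch-level pairs. This is how clustering can enlarge the set of pairs available to the global loss.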

[R1] Loss & Design: Our model integrates four losses with complementary roles. The CyCLIP and CLIP losses promote cross-modality consistency and robust alignment, respectively, to support accurate prediction (Sec. 2.1). The global loss introduces pseudo-pairs generated from clustering to further enhance alignment (Sec. 2.3), forming the basis of the HA module (Fig. 1). The PCC loss supervises the expression predictor f_Pred to predict expression (Eq. 3). Our method is patch-based; we will revise the text to clarify this.
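Here is a compact sketch of the four loss roles, under several stated assumptions:
- Image and gene-expression embeddings are already projected to a shared dimension.
- The CLIP term is the standard symmetric InfoNCE loss.
- The CyCLIP term adds cross-modal and in-modal consistency regularizers in the style of the CyCLIP paper.
- The global term reuses the CLIP loss on cluster-level pseudo-pairs.
- The PCC term is 1 minus the mean per-gene correlation.

The temperature, the unit weights, and every function name are hypothetical. The paper's own equations define the actual formulation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vectors):
    return [[x / math.sqrt(dot(v, v)) for x in v] for v in vectors]


def clip_loss(img, gene, temperature=0.07):
    """Symmetric InfoNCE: (img[i], gene[i]) are positives, all other pairs negatives."""
    img, gene = l2_normalize(img), l2_normalize(gene)
    n = len(img)
    logits = [[dot(a, b) / temperature for b in gene] for a in img]

    def ce(rows):  # mean cross-entropy with the diagonal entry as the target class
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            total += m + math.log(sum(math.exp(x - m) for x in row)) - row[i]
        return total / n

    columns = [list(col) for col in zip(*logits)]
    return (ce(logits) + ce(columns)) / 2


def cyclip_terms(img, gene):
    """CyCLIP-style consistency regularizers.

    cross-modal: <I_j, G_k> should match <I_k, G_j>;
    in-modal:    <I_j, I_k> should match <G_j, G_k>.
    """
    img, gene = l2_normalize(img), l2_normalize(gene)
    n = len(img)
    idx = [(j, k) for j in range(n) for k in range(n)]
    cross = sum((dot(img[j], gene[k]) - dot(img[k], gene[j])) ** 2 for j, k in idx) / n
    inmodal = sum((dot(img[j], img[k]) - dot(gene[j], gene[k])) ** 2 for j, k in idx) / n
    return cross, inmodal


def pcc_loss(pred, true):
    """1 - mean per-gene Pearson correlation across spots."""
    def pearson(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    pccs = [pearson([r[g] for r in pred], [r[g] for r in true]) for g in range(len(true[0]))]
    return 1.0 - sum(pccs) / len(pccs)


def total_loss(img, gene, pooled_img, pooled_gene, pred, true, weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of the four terms named in the rebuttal."""
    w_clip, w_cyc, w_global, w_pcc = weights
    cross, inmodal = cyclip_terms(img, gene)
    return (w_clip * clip_loss(img, gene)                      # local patch-level alignment
            + w_cyc * (cross + inmodal)                        # cross-modality consistency
            + w_global * clip_loss(pooled_img, pooled_gene)    # cluster pseudo-pairs
            + w_pcc * pcc_loss(pred, true))                    # expression supervision
```

The PCC term is the only one that compares predicted expression with measured expression. The other three terms only shape how the two embedding spaces align.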

[R2] Scalability & Interpretability: We agree that scalability is important. ST is a relatively young field, with broader data availability emerging since 2021 [3]. As large-scale datasets remain limited, we design our method to be effective under these constraints, including strategies like HA to enhance generalization from modest sample sizes. In Sec. 3.2 and Fig. 2, we use ITGB6 as a representative case, where high expression implies tumor presence. The model localizes such regions, showing potential for clinical interpretation.
[3] Du et al.: Advances in spatial transcriptomics and related data analysis strategies

[R3] Cluster Experiment: Thanks for pointing out the ambiguity. In Tab. 3, we aim to show that HA maintains stable performance when varying k1/k2 within a practical range (e.g., k1 = 10 to 40, k2 = 2 to 3), indicating that the method is not overly sensitive to moderate changes in these hyperparameters. This level of robustness is desirable, as extreme values may introduce noise or biologically implausible groupings. We will revise the wording to better reflect this intent.
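Here is a minimal sketch of the two-level scheme discussed in the reviews. Reviews #1 and #3 suggest that both levels use K-means, and the sketch assumes the same. It reads k1 as the number of gene-expression clusters and k2 as the number of image sub-clusters inside each gene cluster. That reading of k1/k2 is an assumption, as are the plain Lloyd iterations and the function names.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))  # cannot have more clusters than points
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hierarchical_clusters(gene_expr, image_feats, k1, k2, seed=0):
    """Level 1: cluster spots by gene expression into k1 groups.
    Level 2: within each gene cluster, cluster the image features into k2 sub-groups.
    Returns a (gene_cluster, image_subcluster) tuple per spot.
    """
    top = kmeans(gene_expr, k1, seed=seed)
    result = [None] * len(gene_expr)
    for c in sorted(set(top)):
        idx = [i for i, lab in enumerate(top) if lab == c]
        sub = kmeans([image_feats[i] for i in idx], k2, seed=seed)
        for i, s in zip(idx, sub):
            result[i] = (c, s)
    return result


genes = [[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]]
images = [[0.0], [5.0], [0.0], [5.0]]
print(hierarchical_clusters(genes, images, k1=2, k2=2))
```

In practice, k1 would be swept over a range such as the 10 to 40 mentioned above. Checking that downstream metrics stay stable across that range is the kind of robustness Tab. 3 is described as showing.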




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


