Abstract

Spatial transcriptomics enables localized gene expression profiling within histological regions. Current supervised methods struggle to infer patterns for novel gene types beyond their training scope. Existing zero-shot frameworks partially address this by incorporating gene semantics, but their "independent learning" paradigm hampers zero-shot gene expression prediction. Specifically, they learn tissue morphology and gene semantics (inter-modality) independently, and treat gene functions (intra-modality) as independent entities. In this paper, we present a deep association multimodal framework that bridges pathological images with gene functionality semantics for zero-shot expression prediction. Concretely, our framework achieves generalized expression prediction by integrating nuclei-aware spatial modeling that preserves tissue microarchitecture, cross-modal alignment of pathological features with gene functionality semantics via iterative vision-language prompt learning, and gene interaction modeling that dynamically captures relationships across gene descriptions. On standard benchmark datasets, we demonstrate competitive zero-shot performance relative to existing methods (e.g., a 16.3% improvement in mean Pearson Correlation Coefficient on the cSCC dataset), and we show the clinical interpretability of our method. Code is publicly available at https://github.com/DeepMed-Lab-ECNU/ALIGN-ST.
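
The abstract reports results as a mean Pearson Correlation Coefficient. As a minimal sketch of how such a metric is commonly computed, the snippet below correlates predicted and measured expression for each gene across spots and then averages over genes. The function name `mean_pcc`, the `(n_spots, n_genes)` array layout, and the choice to average per gene are assumptions made for illustration; the paper's exact evaluation protocol may differ.

```python
import numpy as np


def mean_pcc(pred, target, eps=1e-8):
    """Mean per-gene Pearson correlation (illustrative helper, not from the paper).

    pred, target: arrays of shape (n_spots, n_genes); this layout is an assumption.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # Center each gene (column) across spots.
    pred_c = pred - pred.mean(axis=0)
    target_c = target - target.mean(axis=0)

    # Pearson r per gene: covariance divided by the product of standard deviations.
    num = (pred_c * target_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0))

    # A gene with constant values has zero variance; eps makes its r equal to 0.
    per_gene = num / np.maximum(den, eps)
    return float(per_gene.mean())
```

In a zero-shot setting, the gene columns passed to such a function would be those held out from training, so the score reflects generalization to unseen gene types.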

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2218_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/DeepMed-Lab-ECNU/ALIGN-ST

Link to the Dataset(s)

cSCC dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE144240

Her2st dataset: https://zenodo.org/records/3957257#.Y4LB-rLMIfg

BibTex

@InProceedings{ZhoYij_Deep_MICCAI2025,
        author = { Zhou, Yijing and Lu, Yadong and Li, Qingli and Li, Xinxing and Wang, Yan},
        title = { { Deep Association Multimodal Learning for Zero-shot Spatial Transcriptomics Prediction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {131 -- 140}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a deep association multimodal framework that bridges pathological images with gene functionality semantics for zero-shot gene expression prediction. The framework addresses the limitations of current supervised and zero-shot methods. The method achieves generalized expression prediction by integrating nuclei-aware spatial modeling, cross-modal alignment of pathological features with gene semantics, and dynamic gene interaction modeling. Experiments on two datasets (HER2+ and cSCC) demonstrate performance improvement.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper presents a clear and effective approach to integrating multiple data modalities with existing methods to bridge pathological images and gene functionality semantics for zero-shot gene expression prediction. It demonstrates superior performance compared to state-of-the-art methods on two benchmark datasets under zero-shot settings. The overall architecture and workflow are clearly illustrated, which helps the reader understand the proposed framework. The paper includes an extensive ablation study that explores various combinations of the proposed components.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The data used in this paper consists of three types: patch, mask, and description. Among these, the mask and description are generated, but there is no discussion on their stability or reliability. This paper uses a mask in the image feature extractor, which introduces a difference compared to the SGN architecture. However, there is no discussion on whether its use truly benefits the task. SGN is the only comparative method evaluated in a zero-shot setting. I noticed that the dataset used in this paper differs from the one used in SGN, and SGN exhibited particularly poor performance on HER2+ in this context. This result raises some concerns regarding the consistency of the comparison.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The strengths of this work outweigh its weaknesses.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a multimodal framework for zero-shot spatial transcriptomics prediction by introducing dynamic mapping of tissue morphology to gene-gene interactions. Through a cross-modal, interaction-driven visual prompt learning mechanism, image features and textual gene-type descriptions are iteratively aligned, aiming to improve generalization to unseen gene types. The proposed method is shown to outperform existing zero-shot methods such as SGN.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Major strengths of the paper: 1. Dynamic cross-modality alignment via vision-language prompt learning. 2. Gene-gene relationship modeling via adaptive semantic interaction graphs. 3. Zero-shot prediction for unseen gene types without the need for re-training. 4. Inclusion of nuclei-aware spatial features.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper does not clearly show that the proposed method outperforms supervised methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a zero-shot model with the potential to establish a biologically meaningful approach to gene expression prediction in oncology settings.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel deep association multimodal framework for zero-shot gene expression prediction by effectively bridging pathological images with gene functionality semantics. Unlike existing methods that treat tissue morphology and gene semantics independently, the framework integrates nuclei-aware spatial modeling to preserve tissue microarchitecture, cross-modal alignment through iterative vision-language prompt learning, and dynamic gene interaction modeling to capture relationships among gene descriptions. This comprehensive design enables robust generalization to unseen genes and provides clinically interpretable predictions, demonstrating strong potential for practical biomedical applications.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    First, it introduces a deep association learning paradigm that includes a dynamic mapping between tissue morphology and gene semantics, allowing gene features to adaptively optimize based on each tissue slide rather than overfitting to seen gene types. Second, it models gene-gene dependencies through semantic relevance, enabling context-aware embeddings that capture functional relationships between genes. Additionally, the framework enhances image feature extraction by incorporating both tissue-wide structural information and fine-grained nuclei distributions, which are closely tied to spatial gene expression. Together, these components establish a biologically meaningful and dynamically adaptive approach that outperforms existing zero-shot methods like SGN while maintaining competitive performance with supervised models and extending prediction capabilities to previously unseen gene types.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    In clinical practice, it is very important to predict gene expression directly from pathological images, and this manuscript also proposes an effective method. However, the research in this manuscript is not sufficient, and more recent methods should be compared to highlight the advantages of the model.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This manuscript has a reasonable experimental design and clear expression, and provides a deep association multimodal framework for zero-shot gene expression prediction. However, the only drawback is that the overall investigation is slightly insufficient.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

N/A




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


