Abstract

Spatial transcriptomics (ST) captures gene expression in fine-grained, distinct regions (i.e., windows) of a tissue slide. Traditional supervised learning frameworks applied to model ST are constrained to predicting, from slide image windows, the expression of gene types seen during training, and fail to generalize to unseen gene types. To overcome this limitation, we propose a semantic guided network, a pioneering zero-shot gene expression prediction framework. Considering that a gene type can be described by its functionality and phenotype, we dynamically embed a gene type into a vector according to its functionality and phenotype, and employ this vector to project slide image windows to gene expression in feature space, enabling zero-shot expression prediction for unseen gene types. The gene type functionality and phenotype are queried from a pre-trained large language model with a carefully designed prompt. On standard benchmark datasets, we demonstrate competitive zero-shot performance compared to past state-of-the-art supervised learning approaches. Our code is available at https://github.com/Yan98/SGN.
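
The projection step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each window is scored against a gene-type embedding with an inner product; the abstract does not state the exact operator, and the feature values, embedding values, and the names `predict_expression`, `seen_gene`, and `unseen_gene` are hypothetical rather than taken from the SGN code.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict_expression(window_features, gene_embedding):
    """Score every window against one gene-type embedding.

    Assumption: an inner product stands in for the projection; the
    actual SGN projection may differ.
    """
    return [dot(w, gene_embedding) for w in window_features]


# Hypothetical 3-dimensional image features for two slide windows.
windows = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]

# Hypothetical gene-type embeddings. In the paper these are derived from
# LLM-generated functionality and phenotype descriptions.
gene_embeddings = {
    "seen_gene": [1.0, 0.0, 0.5],
    "unseen_gene": [0.0, 2.0, 1.0],
}

# Nothing below depends on whether the gene type was seen in training:
# any gene type with an embedding can be scored.
predictions = {
    name: predict_expression(windows, emb)
    for name, emb in gene_embeddings.items()
}
```

Because the gene type enters only through its embedding, adding a new gene type requires a new description and embedding rather than a new output head, which is the property the zero-shot setting relies on.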

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2573_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2573_supp.pdf

Link to the Code Repository

https://github.com/Yan98/SGN

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yan_Spatial_MICCAI2024,
        author = { Yang, Yan and Hossain, Md Zakir and Li, Xuesong and Rahman, Shafin and Stone, Eric},
        title = { { Spatial Transcriptomics Analysis of Zero-shot Gene Expression Prediction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a pioneering framework for zero-shot gene expression prediction using spatial transcriptomics, allowing for the prediction of gene types that were not seen during the training phase. This is achieved through a semantic guided network that integrates functionality and phenotype descriptions of gene types, derived from a pre-trained large language model. The framework is capable of dynamically embedding gene descriptions into a vector space, enabling the projection of tissue slide windows to gene expression in feature space, thus facilitating the prediction of unseen gene types effectively.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper on zero-shot gene expression prediction using spatial transcriptomics has several notable strengths that highlight its innovative approach and potential impact on the field:

    1. Novel Formulation: The introduction of a zero-shot learning framework for gene expression prediction is highly innovative. The approach extends the capability of spatial transcriptomics by enabling predictions for gene types not seen during training. This is particularly crucial for expanding the utility of genetic analysis tools without the need for exhaustive training data for every gene type.
    2. Original Use of Data: The paper leverages a pre-trained large language model (LLM) to generate functionality and phenotype descriptions of gene types, which are then used to project tissue slide windows into gene expression feature spaces. This use of natural language processing tools within a computational pathology framework is a novel and creative integration of multi-modal data sources (textual and visual data) for biological predictions.
    3. Demonstration of Clinical Feasibility: The ability to predict unseen gene types without retraining models with new genetic data demonstrates significant clinical and research utility. It suggests potential for rapid and scalable deployments in varied clinical settings, facilitating personalized medicine and genetic research with reduced computational and time costs.
    4. Technical Depth: The methodological detail, including the use of graph convolution networks to refine feature extraction from tissue slide windows and the integration of these features with dynamically embedded gene type descriptions, showcases a sophisticated approach to handling complex biological data.

    These strengths collectively underscore the paper’s potential to influence future research and applications in gene expression analysis, making it a valuable contribution to the fields of bioinformatics and computational pathology.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the paper on zero-shot gene expression prediction using spatial transcriptomics exhibits significant innovations, there are several areas where it could be improved or further elaborated:

    1. Generalizability and Robustness: The paper could benefit from a more detailed discussion on the generalizability of the proposed model across different types of tissues and conditions beyond the datasets used. Zero-shot learning models can sometimes overfit to specific features not applicable universally, which might limit their practical deployment in varied clinical settings.
    2. Dependency on Large Language Models (LLMs): The reliance on LLMs for generating gene type descriptions introduces a potential vulnerability concerning the accuracy and reliability of the generated content. LLMs, depending on their training data, might produce biased or inaccurate descriptions, impacting the prediction performance. Further validation of the LLM outputs and their impact on the prediction accuracy should be considered.
    3. Evaluation Metrics: While the paper shows promising results in terms of Pearson Correlation Coefficient (PCC), the evaluation on other metrics like Mean Squared Error (MSE) and Mean Absolute Error (MAE) reveals poorer performance. This might suggest that while the model captures relative changes well, its absolute predictive performance needs improvement. More comprehensive evaluation metrics that could balance both relative and absolute performance should be explored.
    4. Clinical Translation: The feasibility of integrating this zero-shot learning framework into existing clinical workflows has not been extensively discussed. Practical deployment issues such as computational requirements, integration with existing diagnostic tools, and user training are not addressed, which are crucial for real-world application.
    5. Comparison with State-of-the-Art Methods: The paper claims improvements over existing methods but lacks a deep comparative analysis with other cutting-edge approaches that might employ different strategies for handling unseen gene types. For instance, methods leveraging other forms of semi-supervised learning or transfer learning could provide a good benchmark to truly highlight the advantages of the proposed zero-shot approach.
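
The distinction drawn in weakness 3 between relative and absolute performance can be made concrete with a small sketch. The numbers below are invented for illustration and are not results from the paper: a prediction that preserves the pattern of expression across windows but is offset by a constant has a perfect PCC and poor MSE and MAE.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mse(x, y):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def mae(x, y):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical ground-truth expression over four windows.
truth = [1.0, 2.0, 3.0, 4.0]
# Hypothetical prediction: correct relative variation, wrong offset.
shifted = [t + 5.0 for t in truth]

pcc_value = pearson(truth, shifted)
mse_value = mse(truth, shifted)
mae_value = mae(truth, shifted)
```

PCC is invariant to shifting and positive rescaling of the predictions, whereas MSE and MAE are not, so the two families of metrics can disagree in the way the reviewer describes.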
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. Technical Content and Novelty

    • Recommendation: The novelty of the zero-shot learning approach in spatial transcriptomics is commendable. However, it would strengthen the paper to discuss in more depth the technical limitations and potential biases of using LLMs for gene type description. References to prior work that have encountered similar challenges could be included to frame these discussions.

    2. Experimental Validation

    • Recommendation: While the experimental results are promising, the evaluation could be expanded to include additional datasets, especially those representing a wider variety of conditions and tissue types. This would not only enhance the paper’s credibility but also its applicability to a broader range of clinical scenarios, addressing health equity by ensuring the technology is effective across diverse patient populations.

    3. Clinical Translation and Impact

    • Recommendation: The potential for clinical application is a strong aspect of this work, yet the paper lacks a detailed discussion on how this technology could be implemented in a clinical setting. Issues such as integration with existing diagnostic workflows, the computational cost, and the training required for clinical personnel should be addressed. Real-world applicability, including regulatory considerations and patient safety, should also be discussed to advance the clinical translation of the methodology.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend rejecting the paper. The major factors that influenced this decision are:

    1. Generalizability and Validation Concerns: One significant concern is the generalizability of the proposed zero-shot learning model. The paper presents results primarily from a limited set of conditions and datasets. To ensure the model’s applicability to a broader clinical context, it is crucial that the model is tested across more varied datasets, including those representing different tissue types and diseases. This would better demonstrate the model’s robustness and adaptability, key aspects in clinical applications.

    2. Ethical and Privacy Considerations: Given the use of potentially sensitive genetic data, the paper would benefit from a more detailed discussion on how it handles data privacy and the ethical implications of its methodologies.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have satisfactorily addressed my previous concerns regarding generalization, robustness, and the thoughtful integration of large language models (LLMs). Their detailed rebuttal, which demonstrates effective testing on unseen patients and gene types, strongly supports their model’s generalization capabilities. Furthermore, their strategic use of the best-performing LLM to minimize biases, as explained in their response, shows a prudent approach to dealing with potential LLM vulnerabilities. The additional experimental validation and comparison with state-of-the-art methods they provided are convincing. Based on these comprehensive responses, I am confident in raising my evaluation score.



Review #2

  • Please describe the contribution of the paper

    This paper presents a novel approach for predicting gene expression in spatial transcriptomics data, utilizing domain knowledge priors from large language models to attempt zero-shot expression prediction for unseen gene types, achieving promising results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the paper is readable and represents a commendable application of current popular large language models (LLM); the methodology is technically sound. The structure of the paper is well-organised, and the experimental content is reasonably rich, though it lacks some content that readers might find interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Overall, the paper is readable and represents a commendable application of current popular large language models (LLM); the methodology is technically sound. The structure of the paper is well-organised, and the experimental content is reasonably rich, though it lacks some content that readers might find interesting.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    There should be clearer descriptions of the technical implementation and dataset usage.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As concerns above

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Nice novelty, lack of interpretable results, lack of clear explanation of dataset usage.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces the Semantic Guided Network (SGN) approach for zero-shot gene expression prediction in spatial transcriptomics (ST). By dynamically embedding gene types based on their functionality and phenotype, SGN enables the prediction of gene expression for unseen gene types. The study designs a prompt to leverage a pretrained LLM for querying the description, given a gene type of interest. The obtained gene type is then used to project each window to the expression of the gene type in the feature space.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces a novel approach, the Semantic Guided Network (SGN), enabling zero-shot prediction of gene expression for unseen gene types. The paper integrates natural language processing for gene type description querying, showcasing a comprehensive approach to address the research problem. Through experimentation on standard benchmark datasets, the paper demonstrates competitive performance compared to past state-of-the-art supervised learning approaches. The paper provides a clear and detailed description of the SGN framework.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper does not extensively discuss the scalability of the proposed method, particularly in terms of handling large-scale datasets or the computational resources required for training and inference. There is a lack of explicit discussion or strategy regarding the handling of noise in the expression maps generated by the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The paper’s structure is generally well-organized and the methodology section provides a comprehensive overview of the SGN framework.
    • Addressing noise in spatial transcriptomics data is critical for the accuracy and reliability of gene expression predictions. It would be beneficial to explicitly discuss the strategy employed to handle noise within the SGN framework.
    • Consider addressing scalability concerns and discussing any optimizations implemented to improve computational efficiency.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The study introduces an interesting approach to extend the prediction ability to unseen gene types using a Semantic Guided Network. The paper provides a clear methodological description and potential impact. The study can benefit from additional analysis of robustness to enhance the comprehensiveness of the paper.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank all reviewers for their comments and for acknowledging that our paper is well-organized [R1,R4], illustrates a novel [R1,R3,R4] and commendable [R1,R3] LLM application, and achieves promising [R1] and competitive [R3] zero-shot gene expression prediction performance compared to SOTA supervised learning approaches.

#R1.1: More technical implementation and dataset usage. In addition to implementation details in Sec. 1 (supplementary material) and dataset usage in Sec. 3 (main paper), we will open-source our code to facilitate research reproducibility.

#R1.2: Interpretable results. Thank you for the suggestion. We will visualize embeddings of seen and unseen gene types.

#R3.1: Generalization, robustness, and experimental validation. We follow [9,22,2,1,23] for standard dataset validations. Moreover, testing on unseen patients and gene types has affirmed our generalization and robustness.

#R3.2: Vulnerability and bias of LLM. Kindly note that the training data of an LLM is usually hidden from the public, and the different SOTA LLMs have shown promising PCC@M performance, as seen in Fig. 4. Furthermore, we use the best-performing LLM to directly embed the retrieved reference without any LLM querying. This yields a PCC@M of 0.258, which is 0.011 lower than our recommended LLM querying approach.

#R3.3: Evaluation metrics. We use standard metrics for validations [2,1,22,23]. The prediction task is biased toward capturing relative variation [22,23], which is the area in which our method excels.

#R3.4: Clinical translation. The gene expression is directly predicted given a slide image with windows and the gene types of interest. Please see #R3.6 and #R4.1.

#R3.5: Comparison to SOTA. The latest SOTA method we compared was published in January 2024 [23]. The semi-supervised learning and transfer learning approaches have been studied by [22] (unsupervised exemplar retrieval and supervised gene expression prediction) and [9] (supervised pre-training on ImageNet-1K), respectively. Our zero-shot performance is competitive with them in PCC-related evaluation metrics.

#R3.6: Ethical and privacy considerations. For training, we use the published datasets [9, 10xProteomic] that have thoroughly addressed the ethical and privacy issues. Our method can be tested locally on a consumer-level GPU with at least 16GB memory under half-precision, addressing these considerations.

#R4.1: Scalability. When using 25%, 50%, and 100% of data from the STNet dataset, the PCC@M is 0.207, 0.240, and 0.269, respectively. Similarly, we test the scalability of our model, which has 0.269 PCC@M and 7.3M parameters, excluding PTExtractor and PTLLM parameters. Halving and doubling the parameters yield PCC@M of 0.245 and 0.273, respectively. This investigates our data and model scalability. Benchmarking with an H100 GPU, the overall training takes 28 hours. For inference, we take on average 1.482 seconds to obtain descriptions of a gene and 0.445 seconds for the remaining computations to infer on a slide image from the STNet dataset. Note that each slide image has approximately 450 windows on average on the STNet dataset. The training and inference can be done on a consumer-level GPU with 16GB memory under half-precision. Thank you, and we will include it in our revised version.

#R4.2: Noise in ST data. We respond to the noise by applying GraphSAGE on windows connected by similar features and nearby positions, as those windows usually share similar gene expression and can be used to mitigate the noise (Fig. 2 of [23]). We will clarify it in our revised version.

#R4.3: Reproducibility. Kindly see #R1.1.
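
The noise-mitigation idea in #R4.2 can be sketched as follows. This is a simplified illustration and not the authors' implementation: it connects each window to its nearest neighbours in both position and feature space, then averages each window's feature with the mean of its neighbours. GraphSAGE itself applies learned weights to the aggregated neighbourhood, which this sketch omits. The function names `knn` and `smooth` and all coordinates and feature values are hypothetical.

```python
import math


def knn(points, k):
    """For each point, the set of indices of its k nearest other points."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(others[:k]))
    return neighbours


def smooth(features, positions, k=1):
    """Average each window with neighbours that are close in position
    or similar in features (a weight-free stand-in for a GraphSAGE-style
    mean aggregator)."""
    by_position = knn(positions, k)
    by_feature = knn(features, k)
    smoothed = []
    for i, f in enumerate(features):
        nbrs = sorted(by_position[i] | by_feature[i])
        mean = [
            sum(features[j][d] for j in nbrs) / len(nbrs)
            for d in range(len(f))
        ]
        smoothed.append([(a + b) / 2 for a, b in zip(f, mean)])
    return smoothed


# Hypothetical window coordinates and 1-dimensional features.
positions = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
features = [[1.0], [3.0], [1.2]]

smoothed = smooth(features, positions, k=1)
```

Combining the two neighbour sets means a window borrows information from windows that look alike as well as from windows that sit next to it, which matches the rebuttal's stated motivation that such windows usually share similar gene expression.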




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers are inclined to accept this paper.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Reviewers agreed on the novelty of the method for zero-shot gene expression prediction in spatial transcriptomics, which leverages an LLM to project tissue slide windows to gene expression in feature space. All the major concerns raised by reviewers have been addressed in detail in the rebuttal, including interpretability, dataset usage, experimental validation and generalizability, impact, addressing noise in spatial transcriptomics, as well as scalability. The paper appears well suited to MICCAI.



