Abstract
The groundbreaking development of spatial transcriptomics (ST) enables researchers to map gene expression across tissues with spatial precision. However, current next-generation sequencing methods, which theoretically cover the entire transcriptome, face limitations in resolving spatial gene expression at high resolution. The recently introduced Visium HD technology offers a balance between sequencing depth and spatial resolution, but its complex sample preparation and high cost limit its widespread adoption. To address these challenges, we introduce HISTEX, a multimodal fusion approach that leverages a bidirectional cross-attention mechanism and a general-purpose foundation model. HISTEX integrates spot-based ST data with histology images to predict super-resolution (SR) spatial gene expression. Experimental evaluations demonstrate that HISTEX outperforms state-of-the-art methods in accurately predicting SR gene expression across diverse datasets from multiple platforms. Moreover, experimental validation underscores HISTEX’s potential to generate new biological insights. It enhances spatial patterns, enriches biologically significant pathways, and facilitates the SR annotation of tissue structures. These findings highlight HISTEX as a powerful tool for advancing ST research. Our source code is available at: https://anonymous.4open.science/r/HISTEX-42AD.
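To make the bidirectional cross-attention mechanism named in the abstract concrete, here is a minimal NumPy sketch of the generic technique. It is not the HISTEX implementation, which lives in the linked repository. Gene tokens query histology tokens, and histology tokens query gene tokens, with separate projections and a residual connection in each direction. The single attention head, random weights, token counts, embedding size and fusion by concatenation are illustrative assumptions. Concatenation also assumes one gene token and one image token per superpixel, so the two outputs line up row by row.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    # Single-head scaled dot-product attention: queries come from one
    # modality, keys and values come from the other
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def bidirectional_cross_attention(gene_tokens, image_tokens, params):
    # Gene -> image direction: gene features attend to histology features
    gene_out = gene_tokens + cross_attention(gene_tokens, image_tokens, *params["g2i"])
    # Image -> gene direction: histology features attend to gene features
    image_out = image_tokens + cross_attention(image_tokens, gene_tokens, *params["i2g"])
    return gene_out, image_out

rng = np.random.default_rng(0)
d = 16                   # hypothetical shared embedding size
n_gene, n_img = 10, 10   # hypothetical token counts (one per superpixel)
gene_tokens = rng.normal(size=(n_gene, d))
image_tokens = rng.normal(size=(n_img, d))
params = {
    "g2i": [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)],
    "i2g": [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)],
}
gene_out, image_out = bidirectional_cross_attention(gene_tokens, image_tokens, params)
fused = np.concatenate([gene_out, image_out], axis=-1)  # hypothetical fusion step
print(fused.shape)  # (10, 32)
```

Running the attention in both directions lets each modality refine its own representation with information from the other. A one-directional variant would update only one of the two token sets.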
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1633_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/wenwenmin/HISTEX
Link to the Dataset(s)
N/A
BibTex
@InProceedings{XueShu_Inferring_MICCAI2025,
author = { Xue, Shuailin and Wang, Changmiao and Fan, Xiaomao and Min, Wenwen},
title = { { Inferring Super-Resolved Gene Expression by Integrating Histology Images and Spatial Transcriptomics with HISTEX } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15972},
month = {September},
pages = {296--305}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes HISTEX, which integrates spot‑based ST data with histology images to predict super‑resolution spatial gene expression. The framework first applies linear interpolation to the low‑resolution gene data, then performs cross‑attention with the image features, and finally uses multiple instance learning (MIL) to aggregate the high‑resolution predictions during training. Experiments on multiple samples demonstrate the strong performance of the proposed method compared with the baseline methods.
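The first step described above, linear interpolation of the low-resolution gene data, can be illustrated with a minimal NumPy sketch of bilinear upsampling. For simplicity, the sketch assumes that the spot measurements lie on a regular rectangular grid and uses an align-corners convention. Visium spots are actually arranged hexagonally; for that layout, a scattered-data routine such as `scipy.interpolate.griddata(..., method='linear')` would be the closer analogue. The function name and upsampling factor are hypothetical. The HISTEX repository defines how the method actually performs this step.

```python
import numpy as np

def bilinear_upsample(grid, factor):
    """Upsample an (H, W, G) grid of spot-level expression by `factor`
    with bilinear interpolation (align-corners convention)."""
    h, w, _ = grid.shape
    out_h, out_w = (h - 1) * factor + 1, (w - 1) * factor + 1
    # Fractional source coordinates for every output row and column
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # vertical weights, shape (out_h, 1, 1)
    wx = (xs - x0)[None, :, None]  # horizontal weights, shape (1, out_w, 1)
    # Blend the four neighbouring spots for each output position
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bottom = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

spots = np.array([[0.0, 2.0], [4.0, 6.0]])[..., None]  # hypothetical 2 x 2 spots, 1 gene
dense = bilinear_upsample(spots, 2)
print(dense[..., 0])
# [[0. 1. 2.]
#  [2. 3. 4.]
#  [4. 5. 6.]]
```

Each interpolated value is a weighted average of its neighbouring spots. The result is denser but introduces no new spatial detail; the later fusion with histology features is meant to supply that detail.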
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The proposed HISTEX outperforms all baseline methods.
2. In the “Insights from Downstream Analysis” section, the authors provide clear visualization figures for downstream analyses.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Limited generalizability: performing cross-modal attention makes sense, but it requires each sample to have low-resolution gene data. Previous methods such as iStar can directly predict high-resolution data from the input image alone at test time. While incorporating low-resolution data during training improves performance, it limits the method's generalizability.
- The reported RMSE and SSIM scores are very strong. Could this be because a large proportion of the high-resolution data are zero? In standard gene expression prediction tasks, the Pearson correlation coefficient (PCC) is used as the evaluation metric. Why didn't the authors use PCC here?
- The shared code does not include the evaluation script, so it's unclear how SSIM is calculated.
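The concern about zeros can be made concrete. The sketch below min-max normalizes each gene map to [0, 1], which is the normalization the authors describe in their feedback (following iStar and scstGCN). It then computes RMSE and Pearson correlation per gene and averages them over genes. The function names, the per-gene normalization and the averaging are assumptions for illustration; this is not the authors' evaluation script. In the toy example, 99% of the ground truth is zero, and the prediction places its only signal in the wrong region. RMSE still looks small, while PCC is essentially zero.

```python
import numpy as np

def minmax_normalize(x, eps=1e-12):
    # Rescale a single gene's map to [0, 1]
    return (x - x.min()) / (x.max() - x.min() + eps)

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def pcc(pred, truth):
    # Pearson correlation over all bins; nan for constant maps
    p = pred.ravel() - pred.mean()
    t = truth.ravel() - truth.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return float((p * t).sum() / denom) if denom > 0 else float("nan")

def evaluate(pred_maps, truth_maps):
    """pred_maps, truth_maps: arrays of shape (G, H, W), one map per gene."""
    rmses, pccs = [], []
    for p, t in zip(pred_maps, truth_maps):
        p, t = minmax_normalize(p), minmax_normalize(t)
        rmses.append(rmse(p, t))
        pccs.append(pcc(p, t))
    return float(np.mean(rmses)), float(np.nanmean(pccs))

# Sparse toy gene: signal only in a 10 x 10 corner block of a 100 x 100 map
truth = np.zeros((1, 100, 100))
truth[0, :10, :10] = 1.0
# Prediction puts the block in the wrong place
pred = np.zeros((1, 100, 100))
pred[0, 50:60, 50:60] = 1.0
mean_rmse, mean_pcc = evaluate(pred, truth)
print(round(mean_rmse, 3), round(mean_pcc, 3))  # 0.141 -0.01
```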
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
See the strengths and the weaknesses.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper introduces HISTEX, a novel multimodal fusion model that integrates spatial transcriptomics (ST) data with histology images to predict super-resolution (SR) spatial gene expression. The method employs a bidirectional cross-attention mechanism for deep integration of gene expression and histological features, and utilizes multiple instance learning (MIL) to optimize the model without requiring SR-level labels. HISTEX demonstrates superior performance in accurately predicting SR gene expression across various datasets compared to state-of-the-art methods. It also provides new biological insights by enhancing spatial patterns, enriching biologically significant pathways, and facilitating the SR annotation of tissue structures.
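The MIL formulation mentioned above can be sketched as weak supervision at the spot level. Each spot is a bag, and the superpixels it covers are the instances. The per-superpixel predictions are pooled and compared with the measured spot expression, so no super-resolution labels are needed. Mean pooling and mean squared error are assumptions in this sketch; HISTEX's actual aggregation and loss may differ (for example, sum pooling). `spot_level_loss` and its arguments are hypothetical names.

```python
import numpy as np

def spot_level_loss(superpixel_pred, spot_index, spot_expr):
    """
    superpixel_pred: (N, G) predicted expression for N superpixels (instances)
    spot_index: (N,) index of the spot (bag) covering each superpixel, -1 if none
    spot_expr: (S, G) measured spot-level expression
    Returns the MSE between per-spot mean predictions and measured expression.
    """
    n_spots, n_genes = spot_expr.shape
    covered = spot_index >= 0
    sums = np.zeros((n_spots, n_genes))
    counts = np.zeros(n_spots)
    # Accumulate the predictions of all superpixels that fall inside each spot
    np.add.at(sums, spot_index[covered], superpixel_pred[covered])
    np.add.at(counts, spot_index[covered], 1)
    has_data = counts > 0
    bag_pred = sums[has_data] / counts[has_data, None]  # mean pooling per bag
    return float(np.mean((bag_pred - spot_expr[has_data]) ** 2))

pred = np.array([[1.0], [3.0], [5.0], [7.0]])  # 4 superpixels, 1 gene
idx = np.array([0, 0, 1, -1])                  # last superpixel lies outside any spot
spots = np.array([[2.0], [4.0]])               # 2 measured spots
print(spot_level_loss(pred, idx, spots))       # 0.5
```

Only the pooled values are constrained. The model is therefore free to distribute expression among the superpixels within a spot, and the histology features are what guide that distribution.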
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novel approach: The introduction of a bidirectional cross-attention (BCA) mechanism for integrating multimodal data is innovative and addresses the limitation of existing methods that rely on single data sources.
- Strong evaluation: HISTEX outperforms existing state-of-the-art methods in terms of root mean square error (RMSE) and structural similarity index measure (SSIM) across multiple datasets.
- Biological insights: The method enhances spatial gene expression patterns and identifies new significant pathways, providing valuable insights for biological research.
- Reproducibility: The authors provide a link to the source code, which facilitates reproducibility and further research.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While the paper demonstrates strong performance, it could benefit from a more detailed comparison with multimodal fusion techniques beyond the ones mentioned. The paper could also expand on potential limitations or challenges in applying HISTEX to different types of tissue samples or to histology images of varying quality.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
HISTEX represents a significant advancement in the field of spatial transcriptomics, offering a robust solution for super-resolution gene expression prediction. Future work could explore its application across a broader range of biological contexts and investigate its utility in clinical settings.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a strong methodological contribution with the introduction of HISTEX, which effectively addresses the limitations of current ST technologies. The integration of a novel bidirectional cross-attention mechanism and multiple instance learning framework enhances its predictive capabilities. The availability of the source code further supports its reproducibility and potential impact in the field. Despite minor areas for improvement, the paper’s contributions and findings warrant a high acceptance score.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The authors introduce HISTEX, a multimodal fusion approach that employs linear interpolation to generate high-density gene expression and a general-purpose foundation model to extract features from histology images. A bidirectional cross-attention mechanism is applied to integrate the high-density gene expression with histology features, enabling the prediction of super-resolution spatial gene expression through multiple instance learning.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The approach is innovative.
- The method surpasses state-of-the-art techniques.
- Insights are offered through downstream analysis.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
It would be beneficial to provide better contextualization regarding the methods that HISTEX is compared against.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Novelty and quality of exposition
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #4
- Please describe the contribution of the paper
The authors introduce HISTEX, a multimodal fusion approach that combines spatial transcriptomics data with histology images to predict super-resolution spatial gene expression. It addresses the low-resolution limitation of spatial transcriptomics with a few key innovations:
- Performing linear upsampling to match the ST data to the scale of the histology data
- Using a large pretrained foundation model (UNI2, a ViT backbone) to extract histological features
- Using a bidirectional cross-attention module to fuse information from the two modalities
The authors demonstrated that the method outperforms the state of the art by a large margin.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors successfully bridge two vastly different modalities, a low-spatial-resolution gene expression map and high-resolution histology images, by introducing a cross-attention mechanism; the intuition makes sense and the engineering execution is good
- The methodology could inspire future developers of cross-modality imaging algorithms for other problems, such as light and electron microscopy
- The inferred super-resolution pathological segmentation has high potential clinical value
- Thorough and convincing ablation studies clearly demonstrate how much each component contributes to the system
- Open-sourced and well-documented
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The authors go straight to presenting the deployment of BCA in the context of ST as a novel contribution, but the idea of using BCA for multimodal fusion is not entirely new. It would be better if the authors included more background and references on similar ideas explored in other domains, to justify choosing BCA over the alternatives.
Potentially relevant prior works:
LXMERT: Learning Cross-Modality Encoder Representations from Transformers. CoRR abs/1908.07490 (2019). https://arxiv.org/abs/1908.07490
A Subabdominal MRI Image Segmentation Algorithm Based on Multi-Scale Feature Pyramid Network and Dual Attention Mechanism. CoRR abs/2305.10631 (2023)
There are potentially others as well, e.g., in LiDAR-camera fusion.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(6) Strong Accept — must be accepted due to excellence
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Solid contribution integrating advancements in vision transformers into crucial biological problems.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank the AC and reviewers for the positive feedback.
To Reviewer #1
(1) Comparison with other multimodal fusion techniques. The integration of histological images and spatial transcriptomics data via multimodal fusion is still a challenging and relatively underexplored area. In future studies, we will dedicate efforts to advancing fusion strategies for this specific application.
(2) Explanation of the limitations of HISTEX. Due to space limitations, we will further elaborate on the limitations and potential challenges of HISTEX in an extended version of this work.
(3) Exploration of applications in biological contexts. In fact, we have conducted extensive downstream analyses to validate the biological utility of HISTEX. However, due to space limitations, these results were not included in the current manuscript. We plan to present these downstream analyses in detail in a future extended version of this work.
To Reviewer #2
(1) Background and references on similar ideas about BCA. We agree that bidirectional cross-attention has been explored in other multimodal fusion settings (e.g., LXMERT, DeepFusion). These references are highly relevant, and we will incorporate them into the revised version to better situate our work within the broader context of multimodal fusion. However, due to space limitations, a more comprehensive background on the use of BCA and other multimodal fusion strategies in various domains, as well as our justification for selecting BCA, will be provided in a future extended version of this work.
To Reviewer #3
(1) Contextualization of the compared methods. While we have conducted an extensive review and summary of the baselines, due to space limitations, a more comprehensive contextualization of the compared methods has been omitted in this version. We plan to include a more detailed discussion in a future extended version of this work.
To Reviewer #4
(1) Limited generalizability. In fact, iStar also utilizes spot-based spatial transcriptomics (ST) data during training, albeit in a limited way, as pseudo-labels. Our method explicitly aims to enhance low-resolution sequencing-based ST data, where such spot-level data are naturally available and should be leveraged. Therefore, we believe that the effective utilization of low-resolution data during training is not only justified but also essential for the task we address.
(2) Very high performance metrics and missing PCC evaluation. The strong RMSE and SSIM scores are mainly due to the benchmark setting we followed from prior works (e.g., iStar, scstGCN), where both the ground truth and the predicted super-resolution gene expression profiles are normalized to the [0, 1] range. This normalization affects all methods, resulting in generally favorable metric values. We will clarify this normalization procedure in the revised version. PCC is commonly used in gene expression prediction tasks, but it is less frequently applied in tasks with extremely high data dimensionality, such as super-resolution reconstruction. Super-resolution gene expression profiles typically consist of hundreds of thousands of bins, and PCC is highly sensitive to sparse expression data. In fact, we calculated PCC to evaluate the performance of HISTEX and other methods. Under super-resolution settings, HISTEX performs only slightly better than baseline methods such as iStar. However, as the level of resolution enhancement decreases, the sparsity of the ground truth expression gradually diminishes, and HE2Exp demonstrates a higher average gene-level PCC. Due to space limitations, these experimental results were not included in the manuscript. We plan to present the PCC results in a future extended version of this work.
(3) The shared code does not include the evaluation script. We have now updated the shared code repository to include the evaluation script used for computing RMSE and SSIM.
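The reviewer asked how SSIM is calculated, so here is a minimal sketch of one common formulation applied to a single gene map normalized to [0, 1]. It computes local means, variances and covariance over a uniform 7 x 7 window, uses the standard constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 with data range L = 1, and averages over all valid windows. The window shape, the use of population (rather than sample) variances, and the function name are assumptions. The authors' updated repository remains the authoritative reference for how their reported numbers were produced.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssim_uniform(x, y, win=7, data_range=1.0):
    """Mean SSIM between two 2-D maps using a uniform win x win window
    (valid region only, population variances)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    def local_mean(a):
        # Average over every win x win patch that fits inside the map
        return sliding_window_view(a, (win, win)).mean(axis=(-2, -1))

    mu_x, mu_y = local_mean(x), local_mean(y)
    var_x = local_mean(x * x) - mu_x ** 2
    var_y = local_mean(y * y) - mu_y ** 2
    cov_xy = local_mean(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
truth = rng.random((32, 32))  # hypothetical normalized gene map
noisy = np.clip(truth + 0.2 * rng.normal(size=truth.shape), 0.0, 1.0)
print(round(ssim_uniform(truth, truth), 6))  # 1.0
print(ssim_uniform(truth, noisy) < 1.0)      # True
```

Because the constants depend on the data range, rescaling the maps to [0, 1] before comparison fixes the stabilizing terms for every method. This is one reason the shared normalization matters when comparing scores across papers.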
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A