Abstract

Spatial transcriptomics (ST) is crucial for understanding cellular heterogeneity and tissue organization. However, integrating spatial transcriptomics across multiple slices remains challenging for downstream analyses, as ST slices may exhibit significant batch effects. Current methods mostly depend on forced integration via contrastive learning, which may ignore the inherent biological heterogeneity, thus impacting the performance of downstream analyses. To address these challenges, we introduce MoST-IG, a multimodal framework for morphology-guided multi-slice ST integration. MoST-IG comprises two key components: (1) Cross-modal alignment between histology prior and ST: we integrate histological patterns derived from the pathological foundation model with ST using our proposed Visual-Genomic Graph Optimal Transport (VG-GOT) module. This visual-genomic alignment preserves biological heterogeneity through morphology-guided regularization while enriching the spatial context of ST data with morphological features to provide a more discriminative representation and enhance downstream performance. (2) Integration of Multi ST-Slices: a multi ST-slices Contrastive Learning (mST-CL) module is designed via two complementary triplet losses—one for both inter-slice and intra-slice cell type mapping. Experiments show that MoST-IG outperforms leading methods in both cancer grading and detection, as well as tissue structure clustering, while better preserving tissue landmarks in multi-slice ST integration.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2403_paper.pdf

SharedIt Link: https://rdcu.be/eHc7f

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05162-2_47

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/HKU-MedAI/MoST-IG

Link to the Dataset(s)

N/A

BibTex

@InProceedings{YuLit_MoSTIG_MICCAI2025,
        author = { Yu, Liting AND Ma, Tao AND Zhao, Weiqin AND Liang, Zhuo AND Yu, Lequan},
        title = { { MoST-IG: Morphology-Guided Spatial Transcriptomics Integration via Visual-Genomic Graph Optimal Transport } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
        pages = {490--499}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper presents a morphology-guided framework for integrating multiple spatial transcriptomics (ST) slices, leveraging both gene expression and histological information. The paper introduces a module named VG-GOT, which aligns the embeddings of histological and gene expression features at the node level (local) and graph level (global) to prevent overmixing during ST integration. The proposed method demonstrates significant improvements in two downstream tasks—cancer grading and healthy tissue clustering—as evidenced by Table 1 and Table 2 results.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed morphologically guided framework effectively combines two-level visual-genomic alignment with inter- and intra-slice contrastive learning to preserve global tissue structure and local cell-type distinctions. This synergy leads to robust, biologically meaningful representations across diverse spatial transcriptomics slices, as shown by the performance gain in Tables 1 and 2.

The paper presents a thorough evaluation across four public datasets based on three different evaluation metrics to establish the superiority of the approach.

Experiments also include ablation on the effect of the morphology module, clearly demonstrating the need for morphological features.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

While the overall framework is well-structured, the key modules (contrastive learning, graph-based embedding, and optimal transport) are derived from existing techniques. The contribution lies primarily in combining these elements rather than introducing fundamentally new algorithms or mechanisms, which may limit the methodological novelty. In particular, alignment with H&E images was proposed in DeepST [1], triplet loss was introduced in STAligner [2], the Graph Attention Network was used in SpaDAC [3], and the Optimal Transport alignment algorithm was introduced in SPIRAL [4]. Moreover, the paper lacks direct, component-wise comparisons against these individual techniques, making it difficult to assess each module’s isolated contribution and justify the superiority of the combined framework.

While the paper includes ablations for morphological features, the paper does not provide any ablation for two-level alignment, the inter- and intra-slice contrastive loss, or uniform loss.

In Table 1, the “w/o Morph” ablation of MoST-IG has performed worse than STAligner, suggesting that the core contrastive component alone is not superior to prior methods. However, the authors do not evaluate a variant of STAligner enhanced with morphology features, which would be the fairest baseline for isolating the benefit of their visual-genomic alignment module.

[1] Xu, C., Jin, X., Wei, S., Wang, P., Luo, M., Xu, Z., Yang, W., Cai, Y., Xiao, L., Lin, X. and Liu, H., 2022. DeepST: identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Research, 50(22), pp.e131-e131.

[2] Zhou, X., Dong, K. and Zhang, S., 2023. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nature Computational Science, 3(10), pp.894-906.

[3] Huo, Y., Guo, Y., Wang, J., Xue, H., Feng, Y., Chen, W. and Li, X., 2023. Integrating multi-modal information to detect spatial domains of spatial transcriptomics by graph attention network. Journal of Genetics and Genomics, 50(9), pp.720-733.

[4] Guo, T., Yuan, Z., Pan, Y., Wang, J., Chen, F., Zhang, M.Q. and Li, X., 2023. SPIRAL: integrating and aligning spatially resolved transcriptomics data across different experiments, conditions, and technologies. Genome Biology, 24(1), p.241.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

The authors have proposed a well-motivated framework for multi-slice ST integration using morphology-guided alignment and contrastive learning, with strong performance across tasks. However, the core modules draw heavily from prior work, DeepST (morphology), STAligner (triplet loss), and SPIRAL (optimal transport), and the novelty lies primarily in combining them.

The ablation study is limited to morphology, with no analysis of the individual impact of node vs. graph alignment, contrastive components, or uniformity loss. Additionally, STAligner with added morphology has not been evaluated, which would be a more fair baseline. Addressing these points would clarify the contribution and better validate the framework’s effectiveness.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While the paper introduces a cohesive and practically relevant framework for multi-slice ST integration, its core components, including contrastive learning, optimal transport, and graph-based attention networks, are drawn from prior work, with the primary novelty being their integration rather than the introduction of fundamental new mechanisms.

The ablation study is incomplete. Critical components such as the two-level alignment, contrastive losses, and uniformity loss are not individually evaluated. The “w/o Morph” baseline underperforms compared to STAligner, and no variant of STAligner with morphology is tested, limiting the strength of the claimed improvement.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper proposes MoST-IG to integrate ST across multiple slices. By combining multi ST-slices contrastive integration with visual-genomic graph optimal transport alignment, MoST-IG effectively integrates multiple ST slices, mapping identical cell types from different slices together while preserving biological heterogeneity. The learned robust representation is used in several downstream tasks and obtains the best performance.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The paper has a clear starting point, improving upon previous methods by incorporating pathological prior knowledge for spatial transcriptomics (ST) integration.

(2) The proposed VG-GOT innovatively introduces two optimal transport (OT) algorithms to achieve node-level distribution alignment and graph topological structure alignment, thereby addressing the “over-mixing” issue in existing methods caused by neglecting histological semantics.

(3) The Uniformity Loss proposed in the paper is novel, enforcing all slice embeddings to follow a uniform distribution in the latent space to mitigate batch effects while balancing global alignment and local structure preservation through synergistic effects with contrastive learning.

(4) The experiments are comprehensive, demonstrating the generalizability and superiority of the proposed method through multiple tasks and evaluation metrics.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

(1) Lack of experimental details: How is the entire model trained? Please provide the training hyperparameters. Is it self-supervised training?

(2) In the left-side experiment of Table 1, the cLISI metric of the proposed method is 1. Can this be simply interpreted as 100% accuracy in the classification task? Is this result accurate?

(3) In the “Modality-specific Graph Construction” subsection, what is the origin of the similarity threshold formula? In the “Two-level Alignment via VG-GOT” subsection, citations should be added for the two distance functions, along with a brief explanation of why different types of distance functions are used.

(4) There is an error in the formula description below Equation 2: “$z_{neg}$ a negative randomly sampled”.

(5) At which dataset was the discriminator in Equation 3 pre-trained, and what is its network architecture?

(6) Limited Novelty in Formulation: While the VG-GOT module and mST-CL module are presented as novel contributions, the use of graph-based methods and contrastive learning in spatial transcriptomics is not entirely new. Previous works, such as those leveraging graph neural networks (GNNs) or optimal transport for biological data integration, have explored similar approaches. For example, studies like “SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network” (Nat Methods, 2021) have already utilized GNNs for spatial transcriptomics.

(7) Scalability Challenges: The proposed framework involves computationally intensive steps, such as graph optimal transport and contrastive learning, which may not scale efficiently to very large datasets or high-resolution spatial transcriptomics data. The paper does not provide detailed discussions or benchmarks regarding the computational cost, which could limit its applicability in large-scale
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Innovation: The paper introduces a novel multimodal framework (MoST-IG) that integrates Visual-Genomic Graph Optimal Transport (VG-GOT) and multi-ST slice Contrastive Learning (mST-CL), demonstrating significant innovation in the integration of spatial transcriptomics. Experimental Results: The experimental results indicate that this method outperforms existing approaches in cancer grading and tissue structure clustering tasks, showing significant performance improvements, but it lacks detailed experimental procedures and the accuracy of the results requires further verification. Computational Complexity: However, due to the involvement of multimodal alignment and complex learning modules, the method may lead to high computational complexity. Further evaluation of its scalability and efficiency in practical applications is needed.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors’ responses have addressed my concerns, and I have raised my score to accept.

Review #3

Please describe the contribution of the paper

This paper proposes MoST-IG, a multimodal framework for integrating spatial transcriptomics (ST) slices using morphology-guided spatial alignment and multi-slice contrastive learning. The Visual-Genomic Graph Optimal Transport (VG-GOT) module aligns histological and genomic data, preserving biological heterogeneity while enhancing spatial context in the ST data. The multi ST-slices Contrastive Learning (mST-CL) module utilizes triplet losses to effectively integrate genomic data across multiple slices, improving both intra-slice and inter-slice cell type mapping. The proposed method is validated on various cancer and healthy tissue datasets, demonstrating superior performance in cancer grading, tissue structure clustering, and biological pattern preservation compared to existing integration methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The Visual-Genomic Graph Optimal Transport (VG-GOT) module is a novel and effective approach for aligning histological and genomic data. This ensures that both genomic and morphological features are accurately integrated, preserving biological heterogeneity and improving downstream performance.
- The multi-slice contrastive learning (mST-CL) module, using complementary triplet losses for inter-slice and intra-slice cell type mapping, effectively integrates genomic data across multiple slices while maintaining distinct tissue features, which is crucial for accurate cancer grading and tissue structure clustering.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- While the model uses optimal transport and graph-based methods, which can be computationally expensive, the paper does not report runtime, memory usage, or scalability.
- The performance of Triplet Loss is sensitive to the balance between positive and negative samples. However, the paper does not provide details on how these ratios were handled, leaving questions about whether class imbalance or sample distribution issues were addressed. Without clear sampling strategies, the method may be affected by biases in positive and negative sample selection, potentially impacting performance.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a thoughtfully integrated multimodal framework that outperforms prior methods.
Reviewer confidence

Not confident (1)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I keep my original score.

Author Feedback

We thank the reviewers for their positive comments. We now address the specific concerns raised by individual reviewers:

Q1: Further clarify our novelty and contribution (R1&R3) A1: Compared to previous works, our main contribution is that we are the first to identify the twin pitfalls of over- and under-mixing in ST integration and to balance them using contrastive and uniform constraints together with morphology guidance (Mor.). Although prior methods (SpaGCN, DeepST) also used morphology features, they treated them only as an augmentation to spot features, whereas we align the two modalities to prevent over-mixing.

In addition, we would like to clarify that incorporating Mor. to prevent over-mixing is not trivial. Two naive Mor.-enhanced STAligner variants, (a) network embedding fusion (from SpaDAC) and (b) using Mor. similarity as edge weights to enhance gene data (from DeepST), perform worse than our method (ARI on DLPFC: a vs b vs ours is 0.609 vs 0.610 vs 0.614). This further indicates the effectiveness of our VG-GOT module in aligning morphology with gene profiles at both node and edge levels.

Q2: Ablations (R3) A2: (1) Both contrastive and uniform losses are effective: on the prostate dataset, removing the contrastive loss drops ARI from 0.375 to 0.197, and removing the uniform loss lowers ARI to 0.227, demonstrating each module’s contribution. (2) MoST-IG w/o Mor. outperforms STAligner in some cases (breast ARI: 0.170 vs 0.163) but not in others (DLPFC ARI: 0.569 vs 0.607). This is probably due to the previously ignored over-mixing problem, which further underscores the critical role of morphology guidance in our work, as discussed in Q1.

Q3: Scalability (R1&R2) A3: Our method maintains excellent scalability. During training, we run VG-GOT on a random mini-batch of 2,048 nodes each epoch; during inference, OT is not recomputed. With this strategy, the large prostate dataset (23,282 spots) is trained in 10 mins, and uses only 3 GB of VRAM during test—far faster than SPIRAL (several hours) and without the out-of-memory failures encountered by DeepST.
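
The mini-batch strategy described in A3 can be pictured with the sketch below. It is not the authors' implementation: the entropic Sinkhorn solver, the uniform marginals, the cosine cost between visual and genomic embeddings, and the names `sinkhorn_plan` and `minibatch_ot_loss` are assumptions made for illustration. Only the idea of running OT on a random subset of nodes per epoch comes from the rebuttal.

```python
import math
import random


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def sinkhorn_plan(cost, epsilon=0.1, n_iters=200):
    """Entropic OT plan between two uniform distributions (assumed solver)."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternate scaling of rows and columns towards the target marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def minibatch_ot_loss(visual, genomic, batch_size, rng):
    """OT cost on a random subset of nodes instead of the full graph."""
    idx = rng.sample(range(len(visual)), min(batch_size, len(visual)))
    cost = [[cosine_distance(visual[i], genomic[j]) for j in idx] for i in idx]
    plan = sinkhorn_plan(cost)
    size = len(idx)
    return sum(plan[r][c] * cost[r][c] for r in range(size) for c in range(size))
```

Because the cost matrix is built only for the sampled nodes, memory grows with the square of the batch size rather than the square of the number of spots, which is consistent with the scalability argument in A3.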

Q4: Sampling strategy for triplet loss (R1&R2) A4: For each anchor spot, we select one positive spot from every other slice and simultaneously pick a random negative spot from the anchor’s own slice. This fast negative-sampling strategy avoids positive–negative imbalance and ensures scalability (R1W7).
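
A minimal sketch of the sampling scheme in A4 follows. The rebuttal does not say how the positive spot is chosen within each other slice, so the nearest-neighbour rule used here is an assumption, as are the margin value and the function names `sample_triplets` and `triplet_loss`.

```python
import math
import random


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss on Euclidean distances (margin is assumed)."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def sample_triplets(slices, rng):
    """Build (anchor, positive, negative) triplets as (slice_id, spot_id) pairs.

    slices: list of slices, each a list of embedding vectors with >= 2 spots.
    For every anchor, one positive is taken from each other slice and one
    random negative from the anchor's own slice, so positives and negatives
    are always paired one-to-one.
    """
    triplets = []
    for s, spots in enumerate(slices):
        for i, anchor in enumerate(spots):
            for t, other in enumerate(slices):
                if t == s:
                    continue
                # Assumption: the positive is the closest spot in the other slice.
                j = min(range(len(other)), key=lambda q: math.dist(anchor, other[q]))
                # Random negative from the anchor's own slice, excluding the anchor.
                k = rng.choice([q for q in range(len(spots)) if q != i])
                triplets.append(((s, i), (t, j), (s, k)))
    return triplets
```

With S slices, each anchor contributes S - 1 triplets, so the ratio of positives to negatives is fixed at one by construction.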

Q5: Training details (R1-W1&5) A5: The entire model is optimized with self-supervised contrastive, uniformity, and VG-GOT losses. The key hyperparameters are: GAT with 4 layers (dims=[512,30,512,n_input]); Adam optimizer with learning rate 0.001; discriminator: a 2-layer MLP (hidden_dim=128), pretrained on the same ST dataset that is to be integrated.
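
For readers who want to reproduce the setup, the hyperparameters in A5 could be collected as in the sketch below. The class and function names are hypothetical, and reading `dims=[512,30,512,n_input]` as the output widths of four consecutive layers (an encoder down to 30 dimensions followed by a decoder back to the input size) is an interpretation, not something the rebuttal states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values quoted in the rebuttal; field names are illustrative."""
    gat_hidden_dims: tuple = (512, 30, 512)  # final layer maps back to n_input
    learning_rate: float = 1e-3              # Adam
    discriminator_layers: int = 2            # MLP
    discriminator_hidden_dim: int = 128


def gat_layer_shapes(n_input, config=TrainingConfig()):
    """Return (in_dim, out_dim) for each of the four GAT layers (assumed layout)."""
    dims = (n_input, *config.gat_hidden_dims, n_input)
    return list(zip(dims[:-1], dims[1:]))
```

Here `n_input` would be the number of input gene features per spot, which depends on the dataset and preprocessing and is not given in the rebuttal.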

Q6: Other technical details (R1) A6: (1) The evaluation metric cLISI ∈ [1, N_celltype] represents the median neighborhood purity of spots. A cLISI value of 1 does not indicate 100% accuracy; rather, it means that at least 50% of spots have neighborhoods containing only spots with the same label. The result is correct, and we will add this clarification about cLISI to our paper. (2) In the Modality-specific Graph Construction section, the similarity threshold is taken from “Multistain Pretraining for Slide Representation Learning in Pathology.” The two distance functions are node- and topological-level cosine distance, which are widely used in previous GOT works. (3) “$z_{neg}$” denotes a randomly sampled negative example.
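
The point about cLISI in A6 (1) can be checked on a toy example. The sketch below is a simplification: it uses an unweighted k-nearest-neighbour neighbourhood and the inverse Simpson index, whereas the standard LISI uses perplexity-based distance weighting. The function names and the toy data are assumptions for illustration.

```python
import math
from collections import Counter
from statistics import median


def inverse_simpson(labels):
    """Effective number of distinct labels in a neighbourhood (1 = pure)."""
    counts = Counter(labels)
    n = len(labels)
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


def simple_clisi(points, labels, k):
    """Median inverse Simpson index over unweighted k-NN neighbourhoods."""
    scores = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        neighbors = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        scores.append(inverse_simpson([labels[j] for j in neighbors]))
    return median(scores)


# Seven spots on a line; the two spots at the label boundary have mixed
# neighbourhoods, yet the median is still driven by the pure majority.
points = [(float(x),) for x in range(7)]
labels = ["a", "a", "a", "a", "b", "b", "b"]
score = simple_clisi(points, labels, k=2)
```

In this toy case two of the seven spots have mixed neighbourhoods, but the median over all spots is still 1, which matches the authors' statement that a cLISI of 1 reflects pure neighbourhoods for at least half of the spots rather than perfect classification.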

Q7: Code & data availability, Reproducibility (R1&R2&R3) A7: We will release our processed data and code with the final version.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

MoST-IG introduces a multimodal framework that aligns histological and genomic features across multiple spatial transcriptomics slices using visual-genomic graph optimal transport and multi-slice contrastive learning. Reviewers initially praised its clear motivation and strong performance but raised concerns about experimental details, novelty, and ablations. After the rebuttal, most reviewers upgraded to an “Accept” recommendation, while one did not submit a final score, but the rebuttal covered most of their concerns. Given these clarifications, novel empirical gains, and thorough responses, I recommend accepting this paper. The authors are encouraged to include these clarifications from rebuttal in the final version of the manuscript.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

MoST-IG: Morphology-Guided Spatial Transcriptomics Integration via Visual-Genomic Graph Optimal Transport

Author(s):