

Abstract

Multiplex tissue imaging (MTI) is a powerful tool in cancer research, allowing spatially resolved, single-cell phenotype analysis. However, MTI platforms face challenges such as high costs, tissue loss, lengthy acquisition times, and complex analysis of large, multichannel images with batch effects. To address these challenges, we propose a novel computational method to model the interactions between dozens of panel markers and Hematoxylin & Eosin (H&E) staining, enabling in-silico generation of marker stains. This approach reduces the reliance on experimentally measured markers, bridging low-cost H&E data with MTI’s high-content information. Our approach uses a two-stage framework for channel-wise bioimage synthesis: first, vector quantization learns a visual token vocabulary, then a bidirectional transformer infers missing markers through masked language modeling. Comprehensive benchmarking across different MTI platforms and tissue types demonstrates the effectiveness of our method in improving marker prediction while maintaining biological relevance. This advance makes high-dimensional multiplex tissue imaging more accessible and scalable, supporting deeper insights and potential clinical applications in cancer research.
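As a rough illustration of the second stage described above, the sketch below shows how per-channel token grids might be concatenated into one sequence in which unmeasured marker channels are masked. It assumes each channel (IF markers plus H&E) has already been tokenized into a flattened grid of codebook indices; `MASK_ID`, `build_masked_sequence`, and the channel names are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical reserved id for masked positions; real codebook ids are >= 0.
MASK_ID = -1


def build_masked_sequence(
    channel_tokens: Dict[str, List[int]],
    missing: Set[str],
) -> Tuple[List[int], List[str]]:
    """Concatenate per-channel token grids into one input sequence.

    Every token belonging to a channel in `missing` is replaced by MASK_ID,
    so a bidirectional model has to infer it from the visible channels.
    Also returns, for each position, the name of the channel it came from.
    """
    tokens: List[int] = []
    channel_of: List[str] = []
    for name, grid in channel_tokens.items():  # dicts keep insertion order
        for tok in grid:
            tokens.append(MASK_ID if name in missing else tok)
            channel_of.append(name)
    return tokens, channel_of


# Toy example: three channels, each a flattened 2x2 grid of codebook indices.
channels = {"HE": [3, 7, 7, 1], "CD3": [2, 2, 5, 0], "CD20": [4, 4, 4, 6]}
seq, owner = build_masked_sequence(channels, missing={"CD20"})
print(seq)  # [3, 7, 7, 1, 2, 2, 5, 0, -1, -1, -1, -1]
```

Masking whole channels, rather than scattered individual tokens, mirrors the imputation setting in which entire markers are unmeasured; whether training also masks random tokens is not stated on this page.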

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4789_paper.pdf

SharedIt Link: https://rdcu.be/eHaVI

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04965-0_26

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{SimZac_Language_MICCAI2025,
        author = { Sims, Zachary AND Govindarajan, Sandhya AND Mills, Gordon B. AND Eksi, Ece AND Chang, Young Hwan},
        title = { { Language of Stains: Tokenization Enhances Multiplex Immunofluorescence and Histology Image Synthesis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15963},
        month = {September},
        pages = {274--284}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes the use of H&E images to enhance multiplex immunofluorescence and histology image synthesis. It introduces MaskGIT with a novel channel-wise tokenization to model interactions between markers and H&E images. Experimental results demonstrate the effectiveness of the proposed framework, outperforming existing MAE-based methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The integration of H&E images to improve multiplex immunofluorescence image synthesis is a compelling contribution.
2. The proposed channel-wise tokenization is highly suitable for multiplex immunofluorescence images and demonstrates strong empirical effectiveness.
3. The paper is well-written, clearly structured, and easy to follow.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The generation results are heavily dependent on the tokenizer, yet critical details about its training process (e.g., train/test split) and evaluation (e.g., reconstruction performance) are not provided. Additionally, it remains unclear whether the scale of the VQGAN model impacts performance, which warrants further investigation.
2. The paper employs Spearman correlation and SSIM to evaluate performance but does not justify the choice of these metrics. Why not use other metrics, such as PSNR?
3. The tokenized channel embeddings comprise three components (position, marker, and token). Given that the token embeddings derived from VQGAN inherently preserve spatial information despite being reshaped into flattened vectors, the necessity of applying position embeddings to different token embeddings extracted from a single immunofluorescence image is unclear. An ablation study on the embeddings should be included to clarify their individual contributions.
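The three-part embedding the reviewer questions above can be sketched as an elementwise sum of lookup tables. This is a toy illustration with assumed names (`embed`, the three tables) and made-up values, not the authors' implementation; it shows that if tokens are discrete codebook indices, two positions carrying the same index in the same channel are indistinguishable unless a position embedding is added.

```python
from typing import List

Vector = List[float]


def embed(token: int, marker: int, position: int,
          token_table: List[Vector],
          marker_table: List[Vector],
          pos_table: List[Vector]) -> Vector:
    """Input embedding = token embedding + marker embedding + position embedding."""
    t = token_table[token]
    m = marker_table[marker]
    p = pos_table[position]
    return [a + b + c for a, b, c in zip(t, m, p)]


# Toy tables (dimension 2): two codebook tokens, one marker, two positions.
token_table = [[1.0, 0.0], [0.0, 1.0]]
marker_table = [[0.5, 0.5]]
pos_table = [[0.0, 0.0], [0.25, -0.25]]
zero_pos = [[0.0, 0.0], [0.0, 0.0]]

# Same token id and marker at two different spatial positions:
print(embed(1, 0, 0, token_table, marker_table, pos_table))  # [0.5, 1.5]
print(embed(1, 0, 1, token_table, marker_table, pos_table))  # [0.75, 1.25]
# Without position information, both positions map to the same vector:
print(embed(1, 0, 0, token_table, marker_table, zero_pos) ==
      embed(1, 0, 1, token_table, marker_table, zero_pos))  # True
```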
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Based on the strengths and weaknesses outlined, I recommend a weak accept. The final score will depend on the rebuttal.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The paper tackles challenges in multiplex tissue imaging by proposing a two-stage computational framework. First, image channels are discretized via vector quantization. Next, a bidirectional transformer—in the spirit of BERT—is used in a masked language modeling approach to impute missing imaging channels. Incorporating Hematoxylin & Eosin staining provides additional structural context, while experiments on colorectal and prostate cancer datasets using their proposed channel selection procedure support its potential to reduce costs and enhance scalability.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The method effectively integrates complementary information from H&E images, enhancing marker prediction while preserving spatial tissue context.
2. Comprehensive benchmarking against baseline models, using metrics such as Spearman correlation and SSIM, demonstrates the model’s robust performance across diverse marker configurations.
3. The Methods section is overall well written, clearly conveying the core ideas of the paper. It strikes a good balance between technical detail and accessibility, making it understandable for both experts and newcomers.
4. The section on IPS and rIPS is particularly strong, with both strategies clearly explained and evaluated. The discussion highlights their respective advantages and limitations effectively.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Ambiguous terminology and notations—such as the term “center cell region” and the labeling of E_ components in Figure 1—require clarification. Additionally, some methodological details are insufficiently described, making it difficult for readers to fully understand the proposed approach.
2. Interpretability concerns in Figure 4: While tokenization is shown to enhance embedding similarities, purportedly reflecting biologically meaningful relationships, the clustering of CD20 (a B cell marker) with CD4 and CD8 (T cell markers) contradicts established immunological knowledge. This raises concerns about the biological validity of the learned representations.
3. Real-world applicability: The study primarily emphasizes computational benchmarks, with limited evaluation of practical relevance. Incorporating cell type annotations would strengthen the analysis by demonstrating how well the predicted expressions support biologically meaningful downstream tasks, such as cell type classification.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

I appreciate the presentation of results in Figure 2, which effectively conveys the model’s performance to the reader. However, Figure 3 would benefit from additional explanation, as its complexity makes it difficult to interpret, especially for readers without a biological background. Clarifying the meaning of the red and blue colors would help readers better understand the information being conveyed. Furthermore, it is important to discuss why T cell and B cell markers appear to have a high degree of association in this figure, despite being biologically distinct. The strong association of CD3 with CD45RO, but not with CD45, also warrants further explanation.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well written overall, with a particularly strong Methods section, and it demonstrates solid performance aligned with the scope of the conference. However, to be suitable for publication, some figures require refinement, and certain aspects of the results would benefit from further clarification.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper
This paper proposes a two-step framework for imputing missing channels in mIF images. The approach involves tokenizing individual image channels, prioritizing marker channels based on prediction difficulty, and applying masked language modeling to predict marker intensities in absent channels. Specifically, the authors incorporate two approaches:
1. Vector Quantization: images are broken down into a vocabulary of visual “tokens” by an image autoencoder (VQGAN).
2. Marker Imputation via Contextual Prediction: A masked language modeling framework (inspired by BERT) learns to predict missing markers based on the context provided by visible ones. The model shares a single codebook across all image channels, enabling it to capture relationships between different markers and to learn which markers can be predicted from the others. Such markers can then be removed from the assay, allowing for fewer staining cycles.
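The shared-codebook idea in step 1 can be illustrated with the nearest-codeword lookup at the core of vector quantization. The sketch below covers only that lookup (squared Euclidean distance); the VQGAN encoder, decoder, and codebook training are omitted, and `quantize`, the codebook, and the feature vectors are hypothetical toy values.

```python
from typing import List, Sequence


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each vector, the index of the nearest codeword
    under squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
            for v in vectors]


# One codebook shared by every channel (toy 2-D codewords).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

# Hypothetical encoder outputs for two patches of two different channels.
cd3_features = [[0.1, 0.0], [0.9, 1.2]]
he_features = [[0.1, 0.8], [1.0, 0.9]]

print(quantize(cd3_features, codebook))  # [0, 1]
print(quantize(he_features, codebook))   # [2, 1]
```

Because every channel is quantized against the same codebook, a given index denotes the same visual pattern whether it appears in a marker channel or in H&E, which is what allows the downstream model to relate tokens across channels.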
The optimal set of input markers is then determined with the Iterative Panel Selection (IPS) algorithm, a greedy procedure, or its reverse variant (rIPS). The authors evaluate performance across different combinations of features and demonstrate that incorporating H&E staining consistently enhances imputation accuracy. Furthermore, the learned embeddings show biologically meaningful patterns, reflecting marker co-expression across distinct cell types.
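A greedy forward selection in the spirit of IPS might look like the sketch below. The reviews do not specify the exact selection criterion, so `score` is a hypothetical callable (for example, the mean Spearman correlation of markers imputed from a candidate input panel), and the additive toy score is made up for illustration.

```python
from typing import Callable, FrozenSet, List


def greedy_panel_selection(
    markers: List[str],
    k: int,
    score: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Grow an input panel one marker at a time.

    At each step, add the remaining marker whose inclusion gives the
    highest score for the panel; stop after k markers.
    """
    selected: List[str] = []
    remaining = list(markers)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda m: score(frozenset(selected + [m])))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, made-up score: each marker contributes a fixed amount.
weights = {"CD45": 0.5, "CD3": 0.3, "CD20": 0.2, "PanCK": 0.4}
panel = greedy_panel_selection(list(weights), 2, lambda p: sum(weights[m] for m in p))
print(panel)  # ['CD45', 'PanCK']
```

A reverse variant could instead start from the full panel and greedily drop the marker whose removal costs the least; that reading of rIPS is an assumption based on its name.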
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The framework introduces a creative application of masked language modeling to the domain of multiplex image imputation.
- The tokenized embeddings capture co-expression relationships among markers, aligning well with known cellular phenotypes.
- The model has the potential to identify markers that can be used to predict the expression of other markers. This reduces the number of markers that need to be experimentally stained, leading to savings in time, cost, and tissue consumption.
- Learning one representation space across all markers encourages cross-marker understanding and compresses the model.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The model may show bias toward highly correlated markers—such as those co-expressed across a large number of cells—while having reduced sensitivity for markers expressed in small, distinct cell populations. It would be beneficial to report the variance in prediction performance across channels and to assess whether sparsely expressed markers are less accurately predicted.
- To further evaluate the model’s biological relevance relative to potential technical variation, cross-validation across datasets is recommended. For example, applying a model trained on the prostate dataset to impute shared markers in the CRC dataset would help assess the model’s generalizability and robustness.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a novel implementation of masked language modeling, which is a valuable contribution to the field of multiplex image analysis.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

R1.1: We agree that reporting per-marker prediction performance would better summarize the model’s capabilities. Due to space constraints, we could not include a full table. However, comparing randomly selected panels (Table 1) to optimized panels (Table 2) shows only a modest performance drop when predicting more challenging markers (0.77 to 0.73 ± 0.02 for 14 markers).
R1.2: We acknowledge that cross-tissue inference is a potential limitation. Evaluating model performance across datasets (e.g., training on prostate and testing on CRC) is an important next step, which we plan to address in future work.
R2.1: We agree that generation performance depends on the tokenizer (VQGAN). In line with prior masked modeling approaches, we excluded tokenizer performance metrics to maintain focus on the language model (LM). We used the original VQGAN (MaskGIT) and trained it on a subset of the LM training data. On the same test set, reconstruction Spearman correlation and SSIM across all channels (IF and deconvolved H&E) are 0.98 and 0.78, respectively. These results indicate that the tokenizer provides high-fidelity representations, and improvements in tokenizer design can further benefit the LM, something we plan to explore in future work.
R2.2: Spearman correlation compares mean marker intensities between ground truth and reconstructions, preserving rank relationships across cells. Since staining intensity varies across batches, ordinal metrics are more appropriate than absolute ones. SSIM complements this by assessing structural similarity and intensity patterns, which is valuable in virtual staining and denoising tasks. Both metrics are commonly used in the literature (e.g., Burlingame et al. 2021, Ternes et al. 2022). PSNR, while useful for natural images, is less suited to highly variable biomarker distributions.
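To illustrate the rank-based argument in R2.2: Spearman correlation is the Pearson correlation of ranks, so any monotone distortion of intensities (such as a batch-dependent gain) leaves it unchanged, whereas Pearson correlation on raw values does not. The stdlib-only sketch below uses hypothetical per-cell mean intensities; it is not the evaluation code used in the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-cell mean intensities for one marker.
truth = [0.2, 0.5, 0.1, 0.9, 0.4]
# Same cell ordering, but a nonlinear monotone "batch" distortion of intensity.
pred = [3.0 * v ** 2 + 1.0 for v in truth]

print(round(spearman(truth, pred), 6))       # 1.0
print(round(pearson(truth, pred), 6) < 1.0)  # True
```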
R2.3: To clarify, we do not use the VQGAN codebook vectors as embeddings; instead, we use the codebook indices as discrete tokens and learn a new embedding space within the transformer. Thus, positional embeddings are critical to convey spatial structure. Attention heads in the model consistently capture intra-positional relationships across channels, indicating that spatial context is effectively utilized.
R3.1: Figure 1 is adapted from the BERT paper (Fig. 2) to illustrate our embedding strategy. To improve clarity, we will revise “mean intensity within the center cell region” to “mean intensity of the cell at the center of the image.” Additional notational clarifications will be added.
R3.2: We agree that B and T cell markers are expected to form distinct clusters. One potential explanation for the observed similarity is spatial co-localization of T and B cells in the images, especially given that each input may contain multiple cells. This is supported by Lin et al. 2023, where topic modeling revealed co-expression patterns involving CD20, CD3, and CD4, suggesting that these cell types may frequently appear in proximity within our data.
R3.3: We appreciate the suggestion to include downstream tasks. Due to space limitations, we focused on marker imputation as a proxy for pretraining quality. We plan to explore fine-tuning applications, such as cell typing, in future work.
R3.4: Thank you for the thoughtful feedback. We agree that some relationships may seem spurious without further investigation. We include this figure to show that our tokenization approach allows a richer feature space to be learned. We will revise the text as follows: “We found that the baseline MAE (left) captures relatively weak interactions between different biomarkers compared to our VQGAN+BERT model (middle). The increase in positive similarity (red) achieved by our new approach implies that relationships are more easily identifiable in latent token space than in pixel space.”
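The similarity comparison discussed in the author feedback can be illustrated with pairwise cosine similarity between learned marker embeddings, where positive values would be drawn red and negative values blue in such a heatmap. The page does not state which similarity measure the figure uses, so cosine similarity, `similarity_matrix`, and the toy two-dimensional embeddings below are assumptions.

```python
from math import sqrt
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def similarity_matrix(emb: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """All pairwise cosine similarities between marker embeddings."""
    names = list(emb)
    return {(p, q): cosine(emb[p], emb[q]) for p in names for q in names}


# Toy embeddings (made-up values), one vector per marker.
emb = {"CD3": [1.0, 0.0], "CD4": [1.0, 1.0], "PanCK": [-1.0, 0.0]}
sim = similarity_matrix(emb)
print(round(sim[("CD3", "CD4")], 4))  # 0.7071
print(sim[("CD3", "PanCK")])          # -1.0
```

As the authors note in their response, a high similarity between two markers may reflect spatial co-localization within an input patch rather than co-expression in the same cell.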

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

This paper presents a novel and promising approach applying language modeling techniques to mIF channel imputation, effectively leveraging H&E data. It is well-motivated and shows strong initial results. The recommendation leans towards acceptance. The authors need to address several key points in their final version: provide more details and justification regarding the tokenizer and embedding components, justify metric choices, discuss potential biases and limitations (e.g., rare markers), clarify terminology/figures, and address the biological interpretability concerns raised regarding specific figure results.


Language of Stains: Tokenization Enhances Multiplex Immunofluorescence and Histology Image Synthesis

Author(s):