Abstract

Esophageal squamous cell carcinoma (ESCC) has high incidence and mortality rates. While immunotherapy shows promise for some ESCC patients, others can experience severe side effects. Accurate pre-screening of individual ESCC patients’ response to immunotherapy is a crucial but difficult task. Subtle differences in pre-treatment biomarkers hinder physicians’ judgment in pathological diagnosis. While pathological foundation models (PFMs) have shown potential in pathology image analysis, traditional PFMs focused on image-level features still struggle to capture nuanced preoperative characteristic differences. To address this, we propose a fine-tuning framework for PFMs based on the tumor microenvironment (TME). First, morphological and topological attributes are extracted from larger field-of-view patches to better analyze TME interactions. Next, we utilize PFMs, which are typically constrained to small inputs, to extract image features. To address this limitation, larger patches are subdivided to prevent precision loss, with trainable position encodings maintaining relative spatial relationships to guide the re-aggregation of large patch-level representations. Finally, a TME-guided learning algorithm trains all trainable layers to understand ESCC-specific characteristics. Our framework demonstrates superior performance in the downstream task of predicting ESCC immunotherapy response compared to PFMs fine-tuned using self-supervised learning methods. By allowing flexibility in patch sizes, our approach captures more contextual information. Code is available at https://github.com/stoney03/ESCC.
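As a rough illustration of the subdivision step described in the abstract, the sketch below tiles a large square patch into sub-patches and records each sub-patch's (row, column) grid index, which is the kind of information a row-column position encoding could consume during re-aggregation. This is not the authors' implementation: the function names `subdivide` and `crop`, the list-of-rows image representation, and the 448/224 pixel sizes are all assumptions made for illustration.

```python
def subdivide(patch_size, sub_size):
    """Return (row, col, top, left) for each sub-patch tiling a square patch.

    row/col are grid indices; top/left are pixel offsets of the sub-patch.
    """
    if patch_size % sub_size != 0:
        raise ValueError("patch_size must be a multiple of sub_size")
    n = patch_size // sub_size  # sub-patches per side
    return [(r, c, r * sub_size, c * sub_size)
            for r in range(n) for c in range(n)]


def crop(image, top, left, size):
    """Crop a size x size window from an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


# Hypothetical sizes: a 448-pixel patch cut into 224-pixel sub-patches.
grid = subdivide(448, 224)
for row, col, top, left in grid:
    print(f"sub-patch ({row}, {col}) starts at pixel ({top}, {left})")
```

Each sub-patch can then be passed to the foundation model at its native input size, while the (row, column) index preserves where it sat inside the larger field of view.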

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2632_paper.pdf

SharedIt Link: https://rdcu.be/eHdUp

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04978-0_63

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/stoney03/ESCC

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LinYix_Tumor_MICCAI2025,
        author = { Lin, Yixuan AND Lin, Weiping AND Guo, Chenxu AND Yang, Xinxin AND Meng, Hongxue AND Wang, Liansheng},
        title = { { Tumor Microenvironment-Guided Fine-Tuning of Pathology Foundation Models for Esophageal Squamous Cell Carcinoma Immunotherapy Response Prediction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {660--669}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors designed a study that uses an unbiased approach to predict the IO response of SCC from WSIs.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

It is timely, well designed, and has clinical application.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Low sample size (n) and no external validation set. Also, the “unbiased strategy” might not be the best choice in this era: the authors might want to focus on known features such as immune cells or tumor morphology, or at least include them in an ablation to show that they have been considered.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I think this is a mediocre study, and the main weaknesses, a low sample size (N) and no external validation set, make it difficult to demonstrate any actual significance to the field.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The authors proposed a framework that leverages larger field-of-view patches to capture TME interactions, subdivides them into smaller sub-patches for PFM compatibility, and uses trainable position encodings to maintain spatial relationships during feature re-aggregation.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors innovatively combined TME features with PFMs, addressing the limitation of traditional PFMs in capturing subtle biomarker variations. The authors conducted extensive experiments with five-fold cross-validation, ablation studies, and comparisons against four MIL methods, demonstrating superior performance in immunotherapy response prediction.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The study uses only 128 ESCC patient WSIs, which may raise concerns about generalizability. The choice of combining learnable and row-column encodings is not theoretically justified. A deeper analysis of why this hybrid approach outperforms alternatives (e.g., sinusoidal encodings) would strengthen the methodology. The proposed framework involves multiple steps (patch subdivision, position encoding, and aggregation), but the computational overhead is not quantified.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a technically sound and novel framework for TME-guided PFM fine-tuning, with clear potential to improve ESCC immunotherapy prediction. The strengths lie in its innovative integration of TME features, flexible patch handling, and strong experimental validation.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Given the feedback from other authors, my previous concerns were addressed satisfactorily for the most part.

Review #3

Please describe the contribution of the paper

The paper presents a new tumour-microenvironment (TME) aware fine-tuning framework for pathology foundation models, applied to esophageal squamous cell carcinoma (ESCC). In addition, the authors propose a global positional embedding for the whole pathology slide (row-column encoding) to maintain spatial relationships within whole-slide images (WSIs).
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. At the core, this paper proposes a novel patch-embedding aggregation strategy to get WSI-level representations from pathology foundation models. This is an active area of research, has broad applicability and could have high impact on the field.
2. The paper contains an extensive quantitative evaluation: the pre-experiments with frozen UNI and TME features, the comparison of different MIL techniques, patch aggregation sizes, and a comprehensive ablation study. Although it is all done on one non-publicly available dataset, this evaluation is a strong point of the paper.
3. The framework is potentially generalizable to other WSI classification and regression tasks beyond ESCC.
4. Their fine-tuning method does not require additional manual data annotation but shows an increase in performance. The core idea of their fine-tuning method could enhance other PFMs in other settings as well but is not validated broadly enough to really tell.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The TME-guided learning is not well motivated. Why do you want your deep features to be close to the HoverNet + sc-MTOP based features?
2. The evaluation was performed on a relatively small dataset (128 ESCC patients in total), raising questions regarding the generalizability and robustness of the presented results.
3. The optimization and hyperparameter search procedures are unclear. The paper does not give any information on the optimization of the final classification task (ESCC therapy response prediction), and for the PFM fine-tuning it seems that no hyperparameter search was performed. Giving detailed information here would greatly enhance reproducibility.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
1. “Unfortunately, these approaches have struggled in predicting clinical outcomes of ESCC immunotherapy.” - Please add a reference to this argument.
2. What is the kernel size of the convolutional layer in Eq. 2? Does it change with the number of subpatches?
3. “The freezing layers of PFMs (UNI [5] and Prov-GigaPath [18]) were set to 300.” - Please explain what this means.
4. Does the “Without PE” setting in the ablation study refer to the row-column encoding only or to the combination of learnable position encoding and row-column encoding? I’d be more interested in the former, since the learnable position encoding is part of the PFM anyway and not part of your contribution.
5. You can actually change the input size of pathology foundation models by resizing their learned positional embedding tensor with bilinear interpolation (e.g. via timm.layers.pos_embed.resample_abs_pos_embed). This could be done instead of the sub-patch aggregation method but would be more computationally demanding because of the n² complexity of the attention blocks. Probably something for future work.
6. The idea of aligning PFM features with morphological features could be expanded into an independent original contribution. Maybe the increase in performance of PFM features is reproducible on other datasets, so this would introduce a new post-training task to increase the performance of PFMs.
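To make the n² remark in comment 5 concrete, the following back-of-the-envelope sketch counts token pairs in self-attention for two options: feeding an enlarged patch to the model directly, versus splitting it into sub-patches at the model's native size. The numbers are assumptions for illustration only (16-pixel tokens, a 224-pixel native input, a 448-pixel enlarged patch, class tokens ignored); they are not taken from the paper.

```python
def num_tokens(image_size, token_size=16):
    """Number of ViT tokens for a square input (class token ignored)."""
    return (image_size // token_size) ** 2


def attention_pairs(image_size, token_size=16):
    """Token pairs compared by one self-attention layer: n squared."""
    n = num_tokens(image_size, token_size)
    return n * n


def subpatch_attention_pairs(image_size, sub_size, token_size=16):
    """Total token pairs when each sub-patch is attended to separately."""
    n_sub = (image_size // sub_size) ** 2
    return n_sub * attention_pairs(sub_size, token_size)


# Assumed sizes: 448-pixel patch, 224-pixel sub-patches, 16-pixel tokens.
full = attention_pairs(448)
split = subpatch_attention_pairs(448, 224)
ratio = full / split
print(full, split, ratio)
```

Under these assumptions, doubling the side length quadruples the token count, so direct attention over the enlarged patch costs four times as many token pairs per layer as attending within four separate sub-patches.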
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper has many good ideas (global positional embedding for WSIs, TME-guided fine-tuning, patch-aggregation) and does a good quantitative evaluation, albeit only on a single in-house dataset, which makes reproduction hardly possible. Still, I think the community is better off with this work published, since some of the ideas could be re-used and better evaluated in future work.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank the reviewers for their critiques and suggestions. We will address these and update the manuscript if allowed. Abbreviations below: R1-W1/O1 denotes the response to Reviewer 1’s first Weakness/Optional comment.
R1-W1: Given the limited clinical treatment history, high costs, and poor patient compliance associated with immunotherapy for ESCC, cases satisfying the inclusion criteria with complete preoperative biopsy pathology and prognostic data are limited. This makes data collection challenging yet valuable, as no such datasets are publicly available. To address this, we used 5-fold cross-validation and aggregated all predictions for analysis to avoid instability caused by small samples in test sets (Table 2). Previously, we also conducted external validation (please see R3-W2). More cases will be included in the future.
R1-W2: Our design combines learnable encodings for task-specific dependencies and fixed row-column grids for rigid geometric constraints, preserving structural information while enabling adaptive spatial reasoning. The two positional encoding vectors are nearly orthogonal (cosine similarities close to 0). The ablation study also confirms that removing either component degrades performance (accuracy, ours: 0.7094 vs. 0.6679/0.6728). Comparative experiments against sinusoidal encodings show that our design performs better on this task. We will provide a deeper theoretical analysis in Section 2.3.
R1-W3: During PFM training, patch subdivision, position encoding, and aggregation take 1.4 s, 1.8 s, and 7.9 s per epoch, respectively. Inference takes 170.6 s (UNI) vs. 213.5 s (ours) per WSI, with the above stages requiring 63.1 s, 92.4 s, and 0.2 s, respectively. Despite the longer inference time, our method achieves significant performance improvements.
R2-W1: HoverNet + sc-MTOP quantify cellular attributes (e.g., morphology, phenotype) and inter-cellular spatial relationships to establish common TME representations [4,21], guiding the model to go beyond image-level features and interpret TME characteristics. We mentioned this motivation in Section 1, paragraph 3, and will explain it more clearly in the text.
R2-W2: Please see R1-W1.
R2-W3/O2/O3: (1) We used Adam with default MIL parameters as cited. (2) For fine-tuning: (a) please see Table 3 for the number of sub-patches / patch size selection; (b) “freezing layers were set to 300” refers to freezing the first 300 layers; we tested freezing {500, 300, 100} layers, with 300 balancing accuracy and efficiency; (c) the conv kernel size was fixed at 3 (PyTorch’s default), independent of the number of sub-patches, to better preserve fine-grained spatial relationships in cell interaction.
R2-O1: Wang, X. et al. Spatial interplay patterns of cancer nuclei and tumor-infiltrating lymphocytes (TILs) predict clinical benefit for immune checkpoint inhibitors. Sci. Adv. 8, eabn3966 (2022). References will be included in the final version. Our preliminary experiments (Table 1, first row) also show that conventional MIL-based approaches achieve limited performance in this task.
R2-O4: “w/o PE” refers to removing the row-column encoding, and we will explain this more clearly in the final version.
R2-O5/O6: We appreciate the insightful feedback, which will inform our future research.
R3-W1: Please see R1-W1.
R3-W2: We performed an external test in early experiments (pos:neg = 15:36) with acc = 0.6857. Previously, we could not show these results due to ethical restrictions. We will advance the ethics approval process to promote the dataset’s public availability.
R3-W3: Model fine-tuning used patch pairs. Although slide numbers were limited, we obtained around 15,000 patches, forming 22,500 pairs for sufficient training. We calculated the correlation coefficient between (1) similarity scores between pairs of patch embeddings from the fine-tuned PFM and (2) similarity scores between the corresponding pairs of patch-level TME features. The correlation score of 0.86 indicates that the model effectively captures known features.
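The agreement check described in R3-W3 can be sketched as follows. The rebuttal does not say which similarity measure or correlation coefficient was used, so cosine similarity and Pearson correlation are assumptions here, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def similarity_agreement(embeddings, tme_features, pairs):
    """Correlate embedding-pair similarities with TME-feature-pair similarities."""
    emb_sims = [cosine(embeddings[i], embeddings[j]) for i, j in pairs]
    tme_sims = [cosine(tme_features[i], tme_features[j]) for i, j in pairs]
    return pearson(emb_sims, tme_sims)


# Toy data (hypothetical): TME features are a rescaled copy of the embeddings,
# so the two sets of pairwise similarities agree perfectly.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
tme_features = [[2.0 * x for x in vec] for vec in embeddings]
pairs = [(0, 1), (0, 2), (0, 3), (1, 3)]
score = similarity_agreement(embeddings, tme_features, pairs)
print(round(score, 4))
```

A score near 1 means that patch pairs judged similar in embedding space are also similar in TME-feature space; the 0.86 quoted in the rebuttal would be read on the same scale if the authors used a comparable measure.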

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The main issues were addressed by the authors in the rebuttal. I recommend accepting this paper.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

This paper presents a novel and well-motivated patch-embedding aggregation strategy. Despite the limitation of using a single non-public dataset, the thorough experimental evaluation and innovative integration of TME features with PFMs make a compelling case for its contribution.

Tumor Microenvironment-Guided Fine-Tuning of Pathology Foundation Models for Esophageal Squamous Cell Carcinoma Immunotherapy Response Prediction