Abstract

Survival prediction using whole slide images (WSIs) can be formulated as a multiple instance learning (MIL) problem. However, existing MIL methods often fail to explicitly capture pathological heterogeneity within WSIs, both globally, through long-tailed morphological distributions, and locally, through tile-level prediction uncertainty. Optimal transport (OT) provides a principled way of modeling such heterogeneity by incorporating marginal distribution constraints. Building on this insight, we propose OTSurv, a novel MIL framework from an optimal transport perspective. Specifically, OTSurv formulates survival prediction as a heterogeneity-aware OT problem with two constraints: (1) a global long-tail constraint that models prior morphological distributions to avert both mode collapse and excessive uniformity by regulating transport mass allocation, and (2) a local uncertainty-aware constraint that prioritizes high-confidence patches while suppressing noise by progressively raising the total transport mass. We then recast the initial OT problem, augmented by these constraints, into an unbalanced OT formulation that can be solved with an efficient, hardware-friendly matrix scaling algorithm. Empirically, OTSurv sets new state-of-the-art results across six popular benchmarks, achieving an absolute 3.6% improvement in average C-index. In addition, OTSurv achieves statistical significance in log-rank tests and offers high interpretability, making it a powerful tool for survival prediction in digital pathology. Our code is available at https://github.com/Y-Research-SBU/OTSurv.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1359_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Y-Research-SBU/OTSurv

Link to the Dataset(s)

N/A

BibTex

@InProceedings{RenQin_OTSurv_MICCAI2025,
        author = { Ren, Qin and Wang, Yifan and Fang, Ruogu and Ling, Haibin and You, Chenyu},
        title = { { OTSurv: A Novel Multiple Instance Learning Framework for Survival Prediction with Heterogeneity-aware Optimal Transport } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {444 -- 454}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    (1) This paper proposes a novel OT-based MIL framework for WSI survival prediction; (2) it designs two tailored marginal constraints in OT to model global and local heterogeneity in WSIs.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors incorporate the Optimal Transport (OT) principle into WSI modeling to capture WSI heterogeneity for survival risk prediction in cancer patients.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) Some results of the proposed method, such as those on TCGA-BRCA and TCGA-CRC, are not very promising. (2) The explanations of the OT methods are hard for readers to follow. (3) It is recommended that the authors reproduce existing methods instead of directly copying results from the MMP paper [20]. (4) Why is this work, as a MIL method, limited to the survival prediction task? MIL is applicable to various tasks such as classification and grading. (5) The authors are expected to include comparisons with CLAM and HIPT, which are relatively easy to reproduce. Refs: 1. Data-efficient and weakly supervised computational pathology on whole-slide images; 2. Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The methodological novelty and the results.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Based on rebuttals to my concerns, I would suggest acceptance.



Review #2

  • Please describe the contribution of the paper

    The authors introduce a new MIL algorithm for survival analysis from WSIs. The algorithm is based on unbalanced optimal transport, where the marginal constraints are relaxed to introduce a prior on the heavy-tailed distribution of prognostic patches, together with an increasing total mass constraint. The performance of the algorithm is demonstrated on 6 indications against 7 baselines and prior art, and the evaluation includes interpretability experiments and an ablation study.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Although the use of optimal transport to find couplings between patches is not new in WSI analysis, the learning of survival patches is novel, along with the use of marginal constraints to introduce priors. Survival analysis is a complex task, for which the paper demonstrates improvement over prior methods (Table 1), along with a convincing interpretability experiment (Section 3.4). The paper is clear and well organized, except for Section 2.2, about which I raise concerns.

    A release of the code upon acceptance is mentioned.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    In my opinion the main weakness of the paper is the discrepancy between the theoretical tools used and the intuition presented.

    For the first constraint, the “desired long-tailed prior” in Equation 2, presented in the various figures, turns out to be a uniform distribution over the K learnable survival tokens. This uniform distribution is exactly the same as in classical optimal transport. I fail to understand why relaxing equality constraints to KL regularization induces a prior toward a heavy-tailed distribution for survival tokens. In classical unbalanced optimal transport, this relaxation allows too-costly pairs of samples to be ignored, which is very similar to the mass ignored in your second step. Could you give more detail on the difference between the two? [https://arxiv.org/pdf/2211.08775]

    Moreover the link between the claimed local uncertainty-aware constraint and total mass constraint is unclear to me. The discussion about curriculum learning is interesting, however since the total mass is constrained, I fail to see why the transport plan should focus on the more confident patches, and not on the survival tokens that have been initialized close in L2 to some patches. The notion of easy and hard samples is unclear, and the link between total mass constraint and curriculum learning is not obvious to me, although I feel it is one of the key claims of the paper.

    A few additional methodological points are unclear to me, so I raise them as questions in the comments below.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    “Alternatively, when F1 and F2 impose inequality constraints (e.g., using the KL divergence)”

    I don’t see the link between an inequality constraint in an LP optimization problem and the KL divergence.

    Eq. 4: Is the inequality element-wise?

    Eq. 1: under the min, Q should be in $\mathbb{R}_{+}^{N \times K}$.

    It is unclear to me whether, in the Cox model optimization, differentiation is done through Algorithm 1 or the transport plan is treated as fixed. Could you comment on the computational complexity of your training algorithm?

    Could you explain where the results from other works come from? Did you perform any hyperparameter optimization?

    Other minor comments:

    • In Section 2.1, $C$ denotes both the number of color channels and the cost matrix.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The algorithm presented is novel and seems to improve upon prior art. However the current state of the Method section is confusing and fails to convince me of the intuition behind the design choice. I look forward to reading the revised version, and the authors’ answers to my questions!

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    I would like to thank the authors for their responses. Although I acknowledge the experimental results, like the other reviewers, I am still skeptical about the formal explanation of the algorithm.

    Due to the great experimental results, I hesitated between Accept and reject. Finally I believe that the Method section needs major changes to ease its readability. In the current state of the paper, the method section tries to build intuition by calling various tools which are used in singular ways. For instance, claims about uncertainty aware curriculum learning in my opinion are overclaiming. In the same way, the modeling of prior morphological distributions is not easy to grasp. The result still makes the method hard to fully understand - at least for me - and even after reading the rebuttal (and the paper) several times. To that extent, I am still not inclined to test the method, and I am afraid that the method will be ignored by practitioners, in spite of the great results reported.

    Note also that the potential future release of the code may change this, but for now, no anonymized version of the repository was shared to better understand the practical implementation of OTSurv.



Review #3

  • Please describe the contribution of the paper

    The authors present OTSurv, a MIL framework for survival prediction. OTSurv has two key constraints: a global long-tail constraint to model morphological distributions and prevent mode collapse, and a local uncertainty-aware constraint that amplifies confidence in predictions while minimizing noise. The framework reformulates the OT problem into an unbalanced version solvable via a matrix scaling algorithm.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A novel formulation of the OT problem is proposed and adapted to modeling whole slide images. This formulation is beneficial to the community as it is efficient and hardware-friendly.
    • Site-stratified splits are used to counter data leakage in TCGA. I highlight this point because not many people know of this issue, and we as a field need to shift towards using site-stratified splits.
    • Extensive ablations are done to gain insights into the model. I appreciate this thoroughness.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • MMP is not a very valid comparison as it is a multimodal baseline (histology + genomics). However, the proposed method is histology only. I recommend using the PANTHER baseline (Song et al., CVPR 2024).
    • Can the authors also show that their method is not dependent on UNI patch encoder and also try other strong patch encoders such as Virchow? This will be a valuable ablation for Table 2.
    • I would recommend applying your method to independent test sets from CPTAC after training on TCGA. This will show robustness and generalizability.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The paper introduces a novel MIL + OT method which is scalable.
    • Thorough ablations are done.
    • Writing is clear and messages are easy to follow.
    • Authors take efforts to avoid train-test leakage in TCGA.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all reviewers for their thoughtful feedback!

[R1, R3] OT Intuition: Capturing global and local pathological heterogeneity is vital for survival prediction: global heterogeneity arises from long-tailed cluster-size distributions, while local heterogeneity stems from ambiguous regions. We model this via OT, which assigns instance features to learnable survival tokens, akin to supervised clustering. In OT, the transport matrix encodes the assignments; its column sums reflect cluster sizes (global), while the total mass constraint prioritizes high-certainty patches (local). In Fig. 2b, the OT heatmap shows patch-level contributions, indicating its ability to capture local heterogeneity.
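
The assignment view described above can be illustrated with a minimal sketch of entropic OT solved by Sinkhorn matrix scaling. This is a generic balanced formulation, not the authors' Algorithm 1; the function `sinkhorn`, the toy features, the uniform marginals, and the values of `eps` and `n_iters` are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn(cost, row_marg, col_marg, eps=0.1, n_iters=200):
    """Entropic OT via alternating row/column scaling (Sinkhorn-Knopp)."""
    kernel = np.exp(-cost / eps)            # Gibbs kernel, shape (N, K)
    v = np.ones_like(col_marg)
    for _ in range(n_iters):
        u = row_marg / (kernel @ v)         # rescale rows toward row_marg
        v = col_marg / (kernel.T @ u)       # rescale columns toward col_marg
    return u[:, None] * kernel * v[None, :]

# Toy data (hypothetical): N=12 patch features, K=3 "survival tokens".
rng = np.random.default_rng(0)
patches = rng.normal(size=(12, 4))
tokens = rng.normal(size=(3, 4))

# Squared L2 cost between every patch and every token, scaled to [0, 1].
cost = ((patches[:, None, :] - tokens[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

Q = sinkhorn(cost, np.full(12, 1 / 12), np.full(3, 1 / 3))
cluster_sizes = Q.sum(axis=0)               # column sums: soft cluster sizes
assignments = Q.argmax(axis=1)              # hard token index per patch
```

With equality constraints on both marginals, the column sums are pinned to the target distribution; the relaxations discussed in the rebuttal are what allow them to deviate from it.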

[R1, R3] Result Reproduction and Hyperparameters: For consistency, we cite the results of ABMIL, TransMIL, AttnMISL, IB-MIL, ILRA, and MMP from the MMP paper and implement Mean-Pooling ourselves, since MMP provides benchmarks with default settings, all training details, and the codebase on which OTSurv is built. As suggested by R1, we reproduced these baselines in the same codebase with their hyperparameters. The mean C-index scores (0.596, 0.576, 0.610, 0.571, 0.591, 0.610) are consistent, supporting the reliability of Tab. 1. We will revise the results in the final version.
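
For readers unfamiliar with the metric reported throughout, the following is a minimal sketch of Harrell's concordance index (C-index) on toy data. The function name and the toy values are hypothetical, and this is not the evaluation code used by the authors or by MMP.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                        # a censored patient cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:         # i had the event before j's last follow-up
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0       # higher risk for the earlier event
                elif risks[i] == risks[j]:
                    concordant += 0.5       # ties count as half
    return concordant / comparable

# Toy cohort (hypothetical): follow-up times, event flags (0 = censored), risks.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the scores quoted above in context.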

[R1] Task-agnostic OT: Our OT is task-agnostic and applicable to other tasks like classification and grading. Here we focus on survival prediction due to its clinical significance.

[R1, R2] Additional baselines: We implement CLAM and HIPT on the six benchmarks, with mean C-index scores of 0.608 and 0.595, respectively 3.8% and 5.1% (absolute) lower than OTSurv. Due to character limits, details will be given in the final version. Since PANTHER is MMP’s unimodal variant, its results are in Tab. 1 under MMP.

[R2] Generalization to CPTAC: We validate on CPTAC, yielding a strong mean C-index (0.632), confirming generalization and robustness.

[R2] Virchow: Using Virchow yields a similar mean C-index (0.638), confirming that OTSurv is not dependent on UNI.

[R3] Relaxing equality in Global Constraint: Equality constraints demand a long-tailed target distribution, which OT cannot handle. Inspired by unbalanced OT, we use KL regularization toward a uniform target distribution, which keeps the OT problem tractable while letting the long-tailed distribution emerge from the data.
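
One generic way to write such a KL-relaxed column constraint is sketched below. This is a standard semi-relaxed unbalanced OT form and not necessarily the paper's Eq. 2; the symbols $\alpha$ (patch marginal), $\beta$ (uniform target over the $K$ tokens), and $\lambda > 0$ (penalty weight) are notation assumed here for illustration.

```latex
\min_{Q \in \mathbb{R}_{+}^{N \times K}} \; \langle Q, C \rangle
  + \lambda \, \mathrm{KL}\!\left( Q^{\top} \mathbf{1}_N \,\middle\|\, \beta \right)
\quad \text{s.t.} \quad Q \mathbf{1}_K = \alpha,
\qquad \beta = \tfrac{1}{K} \mathbf{1}_K .
```

As $\lambda \to \infty$ the penalty recovers the hard equality $Q^{\top} \mathbf{1}_N = \beta$. For finite $\lambda$ the column sums may deviate from uniform whenever doing so lowers the transport cost, so non-uniform cluster sizes can emerge even though the target itself is uniform.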

[R3] Unbalanced OT vs Local Constraint: Unlike 2211.08775, which leaves transport mass unspecified, our local constraint explicitly sets the transport mass and progressively increases it during training.

[R3] Link between mass constraint and curriculum learning: Our total mass-based local constraint follows a curriculum learning mechanism: learning from easy patches first and harder ones later. Since OT minimizes the total transport cost, easy patches are low-cost samples, while high-cost patches are harder or uncertain. In the “Patch Select.” ablations in Tab. 2, manually thresholding the cost matrix (i.e., the L2 distance between patches and survival tokens) is brittle and substantially underperforms our method. Instead, we implement soft patch selection via a total mass budget, which implicitly prioritizes low-cost (high-confidence) patches. OTSurv therefore selects informative patches based on the overall transport cost rather than raw distance alone.
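
The effect of a total mass budget can be illustrated with a small sketch that uses the common dummy-column trick for partial OT: a zero-cost extra column absorbs the mass that is not transported. This swaps in a generic entropic solver and is not the authors' formulation; `budgeted_plan`, the toy cost matrix, the fixed uniform token marginals, and the budget schedule are all hypothetical.

```python
import numpy as np

def sinkhorn(cost, row_marg, col_marg, eps=0.1, n_iters=300):
    """Entropic OT via alternating row/column scaling."""
    kernel = np.exp(-cost / eps)
    v = np.ones_like(col_marg)
    for _ in range(n_iters):
        u = row_marg / (kernel @ v)
        v = col_marg / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]

def budgeted_plan(cost, rho):
    """Transport only a fraction rho (< 1) of the mass to the real tokens."""
    n, k = cost.shape
    padded = np.hstack([cost, np.zeros((n, 1))])     # zero-cost dummy column
    row_marg = np.full(n, 1.0 / n)
    col_marg = np.append(np.full(k, rho / k), 1.0 - rho)
    return sinkhorn(padded, row_marg, col_marg)[:, :k]  # drop the dummy column

# Toy costs: patches 0-2 lie close to a token ("easy"), patches 3-5 are far ("hard").
cost = np.array([[0.1, 0.9], [0.9, 0.1], [0.2, 0.8],
                 [0.7, 0.8], [0.9, 0.7], [0.8, 0.8]])

for rho in (0.3, 0.6, 0.9):                  # hypothetical increasing budget
    Q = budgeted_plan(cost, rho)
    kept = Q.sum(axis=1) * len(cost)         # share of each patch's mass that is kept
    print(rho, np.round(kept, 2))
```

In this toy setting a small budget is spent almost entirely on the low-cost patches, and the high-cost patches receive mass only once the budget exceeds what the easy patches can supply, which mirrors the easy-to-hard ordering described above.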

[R3] Inequality constraint vs KL divergence: KL relaxes the hard equality constraint into a soft form that implicitly penalizes deviations from the target distribution, which is similar in effect to a literal inequality constraint.

[R3] Is the inequality element-wise? The total transported mass $\rho$ is the sum of all elements in Q. It is a global scalar, not an element-wise quantity.

[R3] Differentiability: The operations in Alg. 1 are element-wise, making it differentiable. The survival tokens are differentiable, and the transport matrix preserves gradients for Cox loss optimization.

[R3] Computational Complexity: Our OT has a time complexity of O(NTK), where N is the number of patches per WSI, T is the number of scaling-algorithm iterations, and K is the number of survival tokens, compared with O(N^2) for a Transformer layer. Generally, N ≫ TK. At inference, it runs in about 0.3 s per WSI on an RTX 3090.
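
The comparison above can be made concrete with a back-of-the-envelope count. The sizes below are hypothetical and not taken from the paper; the helper names are illustrative only, and constant factors are ignored.

```python
def ot_ops(n_patches, n_iters, n_tokens):
    """Operation count for T scaling iterations over an N x K kernel: O(N*T*K)."""
    return n_patches * n_iters * n_tokens

def attention_ops(n_patches):
    """Pairwise attention scores in one Transformer layer: O(N^2)."""
    return n_patches ** 2

# Hypothetical sizes: 10,000 patches, 10 scaling iterations, 16 tokens.
N, T, K = 10_000, 10, 16
ratio = attention_ops(N) / ot_ops(N, T, K)   # simplifies to N / (T * K)
print(ot_ops(N, T, K), attention_ops(N), ratio)
```

Under these assumed sizes the pairwise term is larger by a factor of N / (T * K), which is why the gap widens as the number of patches per slide grows.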

[R3] Typos: Typos will be corrected. Thanks!




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper is borderline+. It addresses a relevant question and brings novel insights into the integration of Optimal Transport for survival risk prediction on WSIs. According to the reviews, this paper is borderline: despite the highlighted experimental validations (R1 and R2), several questions were raised about the clarity (R1 and R3) and soundness (R3) of the proposed method. After the rebuttal, R1 upgraded their rating, while R3 maintains a rejection recommendation given the estimated effort needed to update the Method section. We encourage the authors to take this remark seriously and clarify the formalisation and claims in the revised version of the paper.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


