Abstract
Accurate prognosis of non-small cell lung cancer (NSCLC) patients undergoing immunotherapy is essential for personalized treatment planning, enabling informed patient decisions, and improving both treatment outcomes and quality of life. However, the lack of large, relevant datasets and effective multi-modal feature fusion strategies poses significant challenges in this domain. To address these challenges, we present a large-scale dataset and introduce a novel framework for multi-modal feature fusion aimed at enhancing the accuracy of survival prediction. The dataset comprises 3D CT images and corresponding clinical records from NSCLC patients treated with immune checkpoint inhibitors (ICI), along with progression-free survival (PFS) and overall survival (OS) data. We further propose a cross-modality masked learning approach for medical feature fusion, consisting of two distinct branches, each tailored to its respective modality: a Slice-Depth Transformer for extracting 3D features from CT images and a graph-based Transformer for learning node features and relationships among clinical variables in tabular data. The fusion process is guided by a masked modality learning strategy, wherein the model utilizes the intact modality to reconstruct missing components. This mechanism improves the integration of modality-specific features, fostering more effective inter-modality relationships and feature interactions. Our approach demonstrates superior performance in multi-modal integration for NSCLC survival prediction, surpassing existing methods and setting a new benchmark for prognostic models in this context.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0301_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{XinQil_CrossModality_MICCAI2025,
author = { Xing, Qilong and Song, Zikai and Gong, Bingxin and Yang, Lian and Yu, Junqing and Yang, Wei},
title = { { Cross-Modality Masked Learning for Survival Prediction in ICI Treated NSCLC Patients } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15974},
month = {September},
pages = {136 -- 146}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper curates a large dataset comprising 3D CT images and clinical records of immunotherapy patients and introduces a cross-modality masked learning approach to enhance feature fusion for survival prediction.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Addresses a clinically significant prognostic problem by focusing on multimodal data.
- Introduces an innovative yet straightforward method applicable across various scenarios.
- Utilizes a robust cohort with a sufficient sample size.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The comparative analysis in Table 2 relies on a limited set of methods, predominantly older works (the latest being DAFT from 2022). Including more recent approaches in multimodal prognostication could enhance the credibility of the results.
- There is ambiguity regarding the specific tabular data used and whether Response Evaluation data are included in training and inference (as shown in Table 1). Given that Response Evaluation and Overall Survival are future-oriented and strongly correlated, their inclusion as input might raise concerns about information leakage. Clarification on this point is needed.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The recommendation is based on the strengths and weaknesses outlined above, with the expectation that the authors will address and improve the identified weaknesses.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have effectively addressed my concerns, and I recommend accepting this paper.
Review #2
- Please describe the contribution of the paper
The main contributions of this paper can be summarized as follows:
- This paper introduces a large-scale dataset containing 3D CT images, clinical records, and progression-free survival and overall survival data from NSCLC patients.
- To achieve more effective multi-modal feature fusion, the paper introduces a cross-modality masked learning approach.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
First, the dataset is large and clinically relevant, providing a valuable resource for survival prediction in NSCLC patients undergoing immunotherapy. Second, the paper proposes a novel framework for multi-modal feature fusion aimed at enhancing the accuracy of survival prediction. This framework integrates data from different modalities, including CT images and clinical records, and employs an innovative fusion strategy to improve the complementarity and relationships between these data types.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The main weaknesses of the paper are as follows: First, in the experiments presented in Table 2, the paper fails to compare the proposed method with more recent multi-modal fusion approaches in medical image analysis, such as MCAT and MoCAT, which have shown effective strategies in pathological image survival analysis and could be applied to CT images. Second, the “masked encoding” method proposed in the paper lacks sufficient innovation, as it is a common technique in contrastive learning and does not present significant advancements over existing methods. Lastly, while the dataset introduced in the paper is highly valuable for medical research, the authors do not clarify whether it will be publicly released, which would greatly enhance the accessibility and reproducibility of the work.
- Jiang S, Gan Z, Cai L, et al. Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024: 329-339.
- Jaume G, Vaidya A, Chen R J, et al. Modeling dense multimodal interactions between biological pathways and histology for survival prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 11579-11590.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
My score was primarily based on the design of the proposed methodology and the transparency regarding the availability of the dataset and code. While the paper presents an interesting approach, the novelty and robustness of the method need to be further demonstrated through comparisons with more recent techniques. Additionally, the lack of clarity on whether the dataset and code will be publicly released is a significant factor, as making these resources available would enhance the reproducibility and impact of the research. These factors together influenced my overall assessment.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
- Large-Scale Dataset: A comprehensive dataset comprising 3D CT images, clinical records, and survival data (PFS and OS) from 2,128 NSCLC patients treated with immune checkpoint inhibitors (ICI).
- Cross-Modality Masked Learning Framework: A novel multi-modal fusion method that integrates 3D CT images and clinical data using masked learning.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- It introduces an innovative masked learning approach for multi-modal fusion.
- It curates a large, clinically relevant dataset with paired imaging and tabular data.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Although the dataset is large, it lacks validation on external cohorts.
- Please provide the sample size for each numerical variable.
- The explanation of symbols should be improved. In {z1, …, zd} ∈ R^d×c, d denotes the number of clinical variables. However, in {x1, …, xn} ∈ R^d×c, the meaning of n is unclear; please clarify.
- The use of symbols should be more rigorous. For example, if zi is a row vector in {z1, …, zd} ∈ R^d×c, semicolons (z1; …; zd) should be used to represent stacking of row vectors.
- In Tables 2 and 3, an asterisk (*) indicates statistical significance. Please specify which row serves as the reference for comparison.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This study effectively leverages existing technologies to promote progress in medical AI and exhibits promising performance compared to alternative methods.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank the reviewers for their constructive comments and their appreciation of our strengths, such as ‘an innovative masked learning approach’ (R1), ‘addresses a clinically significant prognostic problem’ (R2) and ‘a novel framework’ (R3). Here we respond to the reviewers point by point:
Lacks validation on external cohorts (R1) We appreciate the concern regarding external cohort validation. As our dataset focuses on patients undergoing ICI treatment, data collection is logistically and clinically complex, making it challenging to assemble sufficient external data for validation at this stage. However, multi-center data collection is ongoing, and we plan to include comprehensive external validation in future work. In the absence of external cohorts, we perform rigorous cross-validation, with our method consistently outperforming baselines and demonstrating strong generalization.
Sample Size (R1) The sample sizes for each numerical variable after filtering out NaN values are as follows: Age (2127), Albumin (2054), ALP (1743), ALT (2073), BMI (2068), PLR (2101), Bilirubin (2064), CA (2057), Ici Cycles (2038), LDH (1916), NLR (2101), AST (1974), SC (2065).
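Per-variable counts of this kind can be reproduced with a short routine. The following is only a minimal sketch in plain Python; the function name, the record layout (one dict per patient), and the toy values are assumptions made for illustration and are not taken from the authors' code or data.

```python
import math

def non_missing_counts(records, variables):
    """Count, per variable, how many records hold a usable (non-NaN) value."""
    counts = {}
    for var in variables:
        n = 0
        for rec in records:
            value = rec.get(var)
            # Treat absent keys, None, and float NaN as missing.
            if value is None or (isinstance(value, float) and math.isnan(value)):
                continue
            n += 1
        counts[var] = n
    return counts

# Toy records with made-up values (not from the paper's dataset).
toy_records = [
    {"Age": 63.0, "LDH": 210.0},
    {"Age": 71.0, "LDH": float("nan")},
    {"Age": 58.0},
]
toy_counts = non_missing_counts(toy_records, ["Age", "LDH"])
```

Here every record has an age but only one has a usable LDH value, so the two variables end up with different effective sample sizes, as in the list above.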
Writing issues (R1) (1) We apologize for the notation error in the manuscript. The symbol x_n should be corrected to x_d. Specifically, for each clinical variable, there exists a global feature z_i, which captures the semantic representation of that variable, and a sample-specific node feature x_i, which encodes information unique to each patient sample. The final graph is built from both features to support effective node feature updates. (2) Regarding the use of the asterisk (*) to indicate statistical significance, we use the second-best performing method, the Interactive-Model, as the reference baseline for comparison. (3) All of the above issues will be corrected or clarified in the revised version, and semicolons will be used to denote vector stacking.
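One possible way to write the corrected notation is shown below. The matrix names Z and X are introduced here only for illustration, and reading c as the per-variable feature dimension is an assumption; the reviews state only that d is the number of clinical variables.

```latex
Z = [\, z_1;\ z_2;\ \dots;\ z_d \,] \in \mathbb{R}^{d \times c},
\qquad
X = [\, x_1;\ x_2;\ \dots;\ x_d \,] \in \mathbb{R}^{d \times c},
\qquad
z_i,\ x_i \in \mathbb{R}^{1 \times c}.
```

With the semicolon convention, each z_i (global feature of variable i) and x_i (sample-specific node feature of variable i) is one row of the stacked matrix.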
Comparison with latest methods (R2) Thanks for the suggestion. We followed the advice and conducted a comparison. Our method outperforms MCAT (ICCV 2021), MoCAT (ICCV 2023), and MCTI (MICCAI 2024) on both PFS (0.701 vs. 0.673/0.690/0.685) and OS (0.705 vs. 0.676/0.689/0.694).
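The rebuttal does not name the metric behind these scores. If it is the concordance index, which is common in survival prediction, it could be computed roughly as in the sketch below. This is a generic implementation of Harrell's C-index written for illustration, not the authors' evaluation code, and the toy inputs are made up.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk scores (higher = earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy example with made-up values: 4 of the 5 comparable pairs are ordered correctly.
toy_c = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.3, 0.5, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for differences of one to three points between methods.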
Tabular data splits for training and inference (R2) Thanks for pointing this out. The Response Evaluation information shown in Table 1 is only provided to facilitate a clearer understanding of the dataset. During the training process, only the survival value is used as the supervision target. The Response Evaluation variable is excluded from training, while all other variables are used as input clinical features. We will make this point clear in the revision.
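The split described here, with survival values as targets, Response Evaluation dropped, and all remaining variables as inputs, might look like the following sketch. All column names are hypothetical placeholders, since the actual field names of the dataset are not given.

```python
# Hypothetical column names; the dataset's real field names are not given in the source.
TARGET_COLUMNS = ("pfs_months", "pfs_event", "os_months", "os_event")
LEAKY_COLUMNS = ("response_evaluation",)  # assessed after treatment, so never an input

def split_features_and_targets(record):
    """Separate model inputs from supervision targets for one patient record."""
    excluded = set(TARGET_COLUMNS) | set(LEAKY_COLUMNS)
    features = {k: v for k, v in record.items() if k not in excluded}
    targets = {k: record[k] for k in TARGET_COLUMNS if k in record}
    return features, targets

# Toy record with made-up values.
toy_record = {
    "age": 63, "ldh": 210.0, "response_evaluation": "PR",
    "pfs_months": 7.5, "pfs_event": 1, "os_months": 19.0, "os_event": 0,
}
toy_features, toy_targets = split_features_and_targets(toy_record)
```

Keeping the exclusion list explicit makes it easy to check that no post-treatment variable reaches the model at training or inference time.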
Comparison with latest methods (R3) Thanks and please see our response to R2.
Novelty of the Framework (R3) Masked encoding is indeed a common strategy in contrastive learning. However, the novelty of our work lies in how we integrate it into a cross-modality completion module specifically designed to enhance multi-modal fusion. This module leverages masked learning not merely for feature reconstruction but to explicitly model inter-modal dependencies, an aspect not addressed by standard masked encoding approaches. Additionally, we propose a multi-level masked token embedding mechanism for the tabular branch. Unlike conventional designs, our method aggregates global variable embeddings from each encoder block to initialize masked tokens in the decoder. This encourages the model to learn more expressive representations of clinical variables through targeted reconstruction from unmasked context. These components collectively constitute a novel framework that extends beyond standard masked encoding, providing new insights into modality-aware representation learning.
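The multi-level masked token initialization could be pictured as in the sketch below. It is a plain-Python illustration under stated assumptions, not the authors' implementation: averaging is assumed as the aggregation over encoder blocks (the rebuttal says only that the embeddings are aggregated), and all function names and toy values are hypothetical.

```python
def aggregate_global_embeddings(per_block):
    """Average the per-variable global embeddings over encoder blocks.

    per_block: list with one entry per encoder block, each a d x c nested list.
    Averaging is an assumption made for this sketch.
    """
    n_blocks = len(per_block)
    d, c = len(per_block[0]), len(per_block[0][0])
    return [
        [sum(block[i][k] for block in per_block) / n_blocks for k in range(c)]
        for i in range(d)
    ]

def build_decoder_tokens(encoded, masked, per_block):
    """Keep encoder outputs for visible variables and initialize masked ones
    from the aggregated global embedding of the same variable."""
    init = aggregate_global_embeddings(per_block)
    return [
        list(init[i]) if masked[i] else list(encoded[i])
        for i in range(len(encoded))
    ]

# Toy setup: d = 3 clinical variables, c = 2 feature channels, 2 encoder blocks.
block_1 = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
block_2 = [[3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
encoded = [[0.1, 0.2], [0.0, 0.0], [0.5, 0.6]]
masked = [False, True, False]  # only the second variable is masked
tokens = build_decoder_tokens(encoded, masked, [block_1, block_2])
```

In this toy case the masked second variable starts from the block-averaged embedding [3.0, 3.0] rather than a single shared mask token, while the visible variables pass through unchanged.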
Dataset release (R3) We have released our code on GitHub. The dataset is still being collected from multiple centers. We will release the dataset once we clean up the data and pass the privacy checks.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper does not establish clear advances over recent SOTA methods and offers insufficient evidence of robustness. This has been pointed out by reviewers and is the main score driver.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A