Abstract

Significant progress has been made in AI-based prediction of therapeutic response to neoadjuvant chemotherapy (NAC) in breast cancer. However, current studies primarily rely on data from a single time point, neglecting the dynamic changes in tumor characteristics during treatment. To address this limitation, we propose a novel Dynamic Temporal Feature Difference Fusion (DTFDF) framework, which integrates image features from multiple time points throughout the treatment process to predict therapy response more precisely. Based on tumor spatial features, we design an innovative DTFDF strategy and introduce a treatment response-based triplet contrastive loss function to facilitate the learning of longitudinal tumor changes and enhance feature representation. Additionally, we incorporate biomarker prediction as an auxiliary task and introduce a feature decoupling-based multi-task learning module. This module generates feature representations for different tasks by accounting for both shared and task-specific information, improving response prediction. Experiments with data from 786 patients in the I-SPY 2 trial dataset demonstrate that our method achieves the highest AUC of 0.835 in predicting radiation therapy response, outperforming state-of-the-art (SOTA) approaches on longitudinal dynamic contrast-enhanced MRI data. Our source code is available at https://github.com/AlexNmSED/DTFDF.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0445_paper.pdf

SharedIt Link: https://rdcu.be/eHxe5

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05325-1_42

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/AlexNmSED/DTFDF

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HaoXin_Predicting_MICCAI2025,
        author = { Hao, Xinyu AND Xu, Hongming AND Zhang, Qibin AND Xu, Qi AND Wang, Xiaofeng AND Polonen, Ilkka AND Cong, Fengyu},
        title = { { Predicting Radiation Therapy Response based on Dynamic Temporal Feature Difference Fusion from Longitudinal MRI } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {440--449}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper aims to predict radiation therapy response from longitudinal MRI on the I-SPY dataset using a deep learning framework that includes modules for multi-scale spatial feature fusion, temporal contrastive learning, temporal feature difference fusion, and multi-task feature disentanglement.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors present a highly engineered solution that could harness multiple aspects of longitudinal MRIs for the problem statement. The authors borrow multiple concepts and incrementally join them to form a pCR binary prediction framework.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The triplet loss is not clearly illustrated, and I had difficulty understanding these sections as written. How do the ER, HER2, and MP predictions add to the performance? Is there any interpretability? Temporal feature differences could simply be explored with a TCN block; why such complexity? The DTFDF model and the DTFDF module share the same name, which is confusing. Multiple simple baselines could have been designed for comparison. Do all patients have T1 and T2, the intermediate time points?
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Please address the points raised in the weakness section
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The presented work introduces a novel framework for processing time-series imaging data comprising four time points during nCRT. The network integrates three newly designed components: (1) a DTFDF module that captures tumor dynamics across time points; (2) a treatment response-guided triplet loss that contrasts intermediate features extracted from different treatment stages; and (3) a FDMTL module that incorporates biomarker prediction (ER, HER2, MP) as auxiliary tasks to improve generalization. The proposed method is evaluated on the I-SPY 2 dataset, comprising 786 breast cancer patients, and demonstrates superior performance compared to three baseline approaches. An ablation study further highlights the individual contributions of each component.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper addresses a topic of high scientific relevance for the medical image computing community, the processing of time-series medical images, and is well-motivated by clinical needs.
2. The proposed DTFDF module is novel in its ability to capture tumor dynamics across multiple time points.
3. The FDMTL-based multi-task optimization strategy is particularly interesting and adds depth to the overall framework.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The proposed TR-TCL is conceptually similar to the TCL introduced in [20], but distinguishes itself by incorporating hard sample mining to enhance optimization during training.
2. Although the model is thoughtfully designed, it introduces a complex pipeline involving multiple modules, loss functions, and tasks, raising the question of whether the architecture might be over-engineered. Additionally, including attention map visualizations or saliency maps would greatly enhance the transparency and explainability of the method.
3. The comparative results would benefit from statistical significance testing to strengthen the validity of the reported performance gains.
4. The description of “decoupling” in the FDMTL module is somewhat vague and would benefit from a clearer and more detailed explanation.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While the presented work is well-motivated and explores an important topic, the proposed method appears overly complex, and a more in-depth analysis is needed to identify which components contribute most significantly to its performance.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

All of my concerns have been adequately addressed.

Review #3

Please describe the contribution of the paper

This article introduces a novel approach for utilizing longitudinal breast cancer DCE-MRI data to predict treatment outcomes, specifically pathological complete response (pCR). The proposed framework, which incorporates innovative modules, effectively integrates image features from multiple time points during the treatment process to predict therapy response with greater precision. The model is trained and validated on a publicly available dataset. Overall, the article is well-structured and provides valuable insights into the potential of temporal image analysis in predicting treatment outcomes.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Advantages:
1. The model presents an innovative design, incorporating intriguing new modules such as DTFDF-M, FDMTL-M, STFAM, and the triplet contrastive loss.
2. The study includes a thorough set of ablation studies and comparative analyses.
3. A comprehensive comparison with other methods, including LSTM and Transformer models, demonstrates that the proposed approach achieves superior classification performance, particularly in metrics like accuracy (ACC), area under the curve (AUC), and sensitivity (SEN).
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Critical Concerns:
1. Definition of Time Points (T0, T1-2, T3): The definitions of T0, T1-2, and T3 are unclear. Could you please clarify how these time points are defined and why they were chosen?
2. Rationale for Shared Weights Between Pre-treatment and Inter-treatment: The rationale for sharing weights between the pre-treatment and inter-treatment data is not adequately explained. Could the authors elaborate on why this design choice was made? Why are shared weights not also used for the post-treatment sequence?
3. Triplet Contrastive Loss (TCL) Description: The description of the triplet contrastive loss is confusing and lengthy. Specifically, the explanation of how positive sample features and positive samples are defined is unclear and potentially misleading. More clarification is needed on this point.
4. Rationale for Stacking Four TCB Layers: The decision to stack four TCBs requires further justification. Have the authors conducted any ablation studies to support this choice?
5. Model Versatility: The model’s applicability appears limited, as it rigidly requires DCE-MRI data from four distinct time points. This presents a problem when some sequences are missing, making the model unusable in such cases. It would be useful to consider a more flexible approach.
6. Feature Extraction Backbone: Have the authors considered using other feature extraction backbones instead of the HiFuse module?
7. Explainability Study: The paper lacks an explainability study. It would be beneficial to investigate whether the model is focusing on tumor changes or if it is instead identifying features from other breast regions. This would help in understanding the model’s decision-making process.
8. Rationale for Choosing Clinical Tasks (HER2, MP, ER Classification): The rationale for choosing tasks like HER2, MP, and ER classification is unclear. Specifically, are these annotations done on pre-treatment MRI data? A clearer justification is needed for why these particular tasks were chosen, as well as ablation studies to show which task contributes most to predicting pCR.
9. Performance Evaluation (SPE): The model’s performance with respect to Specificity (SPE) is significantly lower than other methods. It is important to provide a clear explanation of what constitutes a positive and negative sample and why the SPE performance is so low. Additionally, it appears that DTFDF-M is associated with low SPE (likely due to negative samples); this needs to be addressed and explained.
10. Comparison with Gao et al. [3]: Although the comparison with other methods is comprehensive, the methods being compared are rather outdated. The authors do not compare their approach with the work by Gao et al. [3], which also involves a longitudinal MRI module. A comparison with their method would provide a more comprehensive evaluation of the proposed model.
11. Computational Expenses Comparison: A comparison of computational expenses, such as FLOPs (floating-point operations), model size, and inference time, is missing. This would help assess the efficiency of the proposed model relative to other methods.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The study presents an innovative approach, but further experimental validation is required to substantiate its effectiveness.
1. Although the study compares the results with four other methods, several state-of-the-art methods like reference [3] are not included in the comparison, and incorporating these would strengthen the evaluation.
2. Some design choices, such as the rationale for selecting specific modules and tasks, are not sufficiently explained. A clearer justification for these choices would enhance the model’s credibility.
3. Additional ablation studies are necessary to provide a more thorough understanding of the model’s components and their individual contributions.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have addressed most of the raised concerns, for instance, by clarifying the definitions of T0, T1-2, and T3. They have also promised to add explanations of the visualization and explainability.

Author Feedback

We thank all the reviewers for their recognition and valuable feedback. Below, we provide responses to their comments.

1: TR-TCL (R1-R3) In TR-TCL computation, the post-treatment feature of each patient in the batch is used as an anchor. For positive sample selection, we choose the post-treatment feature of a non-pCR patient within the batch that has the largest distance from the anchor. Negative sample selection is handled in two cases: 1) For a pCR anchor, its own pre-treatment feature is used as the negative sample. 2) Otherwise, the post-treatment feature of the pCR patient closest to the anchor is selected. Compared to TCL in [20], we introduce dynamic hard sample mining to improve discriminative capability, enabling the model to better distinguish between visually similar but semantically different samples. An illustrative figure will be added for clarification.
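The mining rule above can be sketched in plain Python. This is a minimal sketch that follows the rebuttal's wording literally, not the authors' released code: Euclidean distance, the `margin` value, and excluding the anchor from its own positive set are assumptions, and `tr_tcl_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tr_tcl_loss(post: List[List[float]], pre: List[List[float]],
                is_pcr: List[bool], margin: float = 1.0) -> float:
    """Batch-mean triplet loss using the mining rule as worded in the rebuttal.

    post[i], pre[i]: post-/pre-treatment feature of patient i (hypothetical inputs).
    is_pcr[i]: True if patient i achieved pCR.
    margin: assumed hinge margin; the rebuttal does not state a value.
    """
    losses = []
    for i, anchor in enumerate(post):
        # Positive: post-treatment feature of the non-pCR patient farthest from
        # the anchor (the anchor itself is excluded, an assumption).
        non_pcr = [j for j in range(len(post)) if not is_pcr[j] and j != i]
        if not non_pcr:
            continue  # no valid positive in this batch
        d_pos = max(euclidean(anchor, post[j]) for j in non_pcr)
        if is_pcr[i]:
            # Case 1: pCR anchor -> its own pre-treatment feature is the negative.
            d_neg = euclidean(anchor, pre[i])
        else:
            # Case 2: otherwise -> the closest pCR patient's post-treatment feature.
            pcr = [j for j in range(len(post)) if is_pcr[j]]
            if not pcr:
                continue  # no valid negative in this batch
            d_neg = min(euclidean(anchor, post[j]) for j in pcr)
        # Standard triplet hinge: positive should be closer than negative by `margin`.
        losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses) if losses else 0.0


post = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
pre = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(tr_tcl_loss(post, pre, [True, False, False]))  # 3.0
```

Taking the farthest positive and the closest negative is what makes the mining "hard": each triplet is the one in the batch that currently violates the margin most.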

2: Model design DTFDF Module (R2, R3): Unlike basic TCNs, DTFDF-M is designed to capture dynamic changes and adaptively fuse differential and current spatial features. Our ablation study confirmed its advantages, where replacing it with a Transformer-based fusion led to a 3.3% drop in AUC. To balance model complexity with dataset size, DTFDF-M used four TCBs.
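To make "adaptively fuse differential and current spatial features" concrete, here is a hypothetical sketch of one gated difference-fusion step over a patient's time series. The sigmoid gate, the per-channel parameters, and the name `gated_difference_fusion` are assumptions for illustration; the actual DTFDF-M stacks four learned TCBs whose internals are not described in this rebuttal.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_difference_fusion(features: List[List[float]],
                            w_diff: List[float], w_cur: List[float],
                            bias: List[float]) -> List[List[float]]:
    """Fuse temporal differences with current features via a per-channel gate.

    features[t]: spatial feature vector at time point t (e.g. T0..T3).
    w_diff, w_cur, bias: hypothetical per-channel gate parameters (learned in practice).
    Returns one fused vector for each t >= 1.
    """
    fused = []
    for t in range(1, len(features)):
        cur, prev = features[t], features[t - 1]
        out = []
        for c, (x_t, x_prev) in enumerate(zip(cur, prev)):
            diff = x_t - x_prev  # change in channel c between consecutive time points
            g = sigmoid(w_diff[c] * diff + w_cur[c] * x_t + bias[c])
            out.append(g * diff + (1.0 - g) * x_t)  # adaptive mix of change vs. current state
        fused.append(out)
    return fused


print(gated_difference_fusion([[1.0], [3.0], [2.0]], [0.0], [0.0], [0.0]))  # [[2.5], [0.5]]
```

With all gate parameters at zero the gate is 0.5, so each output is the mean of the change and the current feature; learned parameters would let the module lean toward whichever signal is more informative in each channel.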

Shared Weights (R3): T0 to T2 capture continuous treatment-related features, making shared weights beneficial for consistency and parameter efficiency. In contrast, T3 reflects the final treatment outcome and exhibits distinct morphological changes critical for pCR prediction, as verified by ablation studies. To avoid interference from T0-T2, independent weights are used for T3.
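A minimal sketch of this weight-sharing scheme, assuming a single linear projection as a stand-in for the HiFuse encoder; `encode_time_points`, `shared_w`, and `post_w` are hypothetical names.

```python
from typing import Dict, List

Matrix = List[List[float]]


def linear(weights: Matrix, x: List[float]) -> List[float]:
    """Apply a weight matrix (rows = output channels) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def encode_time_points(scans: Dict[str, List[float]],
                       shared_w: Matrix, post_w: Matrix) -> Dict[str, List[float]]:
    """Encode T0-T2 with one shared weight set and T3 with its own weights."""
    out = {}
    for name, x in scans.items():
        # T0, T1, T2 reuse the same parameters; T3 (post-NAC) gets independent ones.
        w = post_w if name == "T3" else shared_w
        out[name] = linear(w, x)
    return out


scans = {t: [1.0, 2.0] for t in ("T0", "T1", "T2", "T3")}
out = encode_time_points(scans, [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]])
print(out["T0"], out["T3"])  # [1.0, 2.0] [2.0, 4.0]
```

Identical inputs produce identical T0-T2 embeddings because they pass through the same parameters, while T3 is free to learn a different mapping.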

Multi-task learning (R1-R3): HER2, MP, and ER are biomarkers closely associated with pCR outcomes. Incorporating their prediction as auxiliary tasks introduces domain knowledge, which enhances the primary pCR prediction, as demonstrated in Table 2. Given the higher pCR rates among HER2-positive patients, we hypothesize that HER2 prediction may contribute most to pCR prediction. Decoupling in FDMTL refers to the separation of task-specific and shared features across tasks. We will add these explanations to the revised manuscript.
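The shared/task-specific split can be sketched as follows. This is a hypothetical illustration, not the FDMTL implementation: linear projections stand in for the learned branches, concatenation is an assumed way of combining the two parts, and `aux_weight` is an assumed weighting for the ER, HER2, and MP losses.

```python
from typing import Dict, List

Matrix = List[List[float]]


def project(w: Matrix, x: List[float]) -> List[float]:
    """Linear projection standing in for a learned feature branch."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def decoupled_task_features(x: List[float], shared_w: Matrix,
                            specific_w: Dict[str, Matrix]) -> Dict[str, List[float]]:
    """Build each task's representation as [shared part | task-specific part]."""
    shared = project(shared_w, x)  # information common to pCR, ER, HER2, MP
    # List concatenation: every task sees the shared part plus its own part.
    return {task: shared + project(w, x) for task, w in specific_w.items()}


def total_loss(task_losses: Dict[str, float], aux_weight: float = 0.5) -> float:
    """Primary pCR loss plus down-weighted auxiliary biomarker losses."""
    aux = sum(v for k, v in task_losses.items() if k != "pCR")
    return task_losses["pCR"] + aux_weight * aux


feats = decoupled_task_features([1.0, 2.0], [[1.0, 1.0]],
                                {"pCR": [[1.0, 0.0]], "HER2": [[0.0, 1.0]]})
print(feats)  # {'pCR': [3.0, 1.0], 'HER2': [3.0, 2.0]}
print(total_loss({"pCR": 1.0, "ER": 0.5, "HER2": 0.5, "MP": 1.0}))  # 2.0
```

Keeping the auxiliary weight below 1 reflects that the biomarker tasks only support the primary pCR objective.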

Model Versatility (R3): We agree that relying on complete data limits model versatility. However, this study focuses on modeling tumor dynamics during NAC, which requires the full time series for accurate pCR prediction. Addressing the challenge of incomplete data is beyond the current scope but will be considered in future work.

Backbones (R3): HiFuse employs a joint local-global modeling strategy combined with multi-scale feature fusion, enabling the integration of fine-grained details with global semantic information. Given these strengths, we adopted HiFuse as our backbone and did not evaluate alternatives.

3: Dataset Description (R2,R3) Time points are defined as: pre-treatment (T0, pre-NAC), after 3 cycles (T1, early NAC), after 12 cycles and between drug regimens (T2, mid-NAC), and post-treatment (T3, post-NAC, before surgery). All patients have four time points. We will provide these explanations.

4: Performance Simple Baselines (R2): Table 2 includes several baselines, such as using features from individual time points or various combinations of time points. These baselines effectively validate key components of our framework.

Comparison with Gao et al. (R3): Gao et al. used a static one-hot time encoding strategy for sequential modeling, whereas our DTFDF adopts dynamic modeling to learn temporal dependencies. Due to this difference, we did not compare against Gao et al.'s method.

Statistical Test & Visualizations (R1, R3) & Computational Cost (R3): We fully agree and will include corresponding explanations in the revised manuscript.

Lower SPE (R3): The small proportion of pCR patients leads to a severe class imbalance. DTFDF-M is designed to enhance the dynamic characterization of pCR patients, thereby improving detection sensitivity for pCR samples. However, this increases false positives, resulting in a lower SPE.
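The trade-off can be illustrated with a small helper that treats pCR (label 1) as the positive class, which is how the rebuttal frames sensitivity. The labels and scores in the usage lines are toy values, not results from the paper, and the decision threshold is an assumed parameter.

```python
from typing import List, Tuple


def sensitivity_specificity(y_true: List[int], y_score: List[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """SEN and SPE with pCR (label 1) as the positive class."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    sen = tp / (tp + fn) if tp + fn else 0.0
    spe = tn / (tn + fp) if tn + fp else 0.0
    return sen, spe


# Toy example (synthetic values): lowering the threshold catches more pCR cases
# but also flags more non-pCR cases, so SEN rises while SPE falls.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
print(sensitivity_specificity(labels, scores, 0.5))   # (0.5, 0.75)
print(sensitivity_specificity(labels, scores, 0.25))  # (1.0, 0.5)
```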

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The paper tackles a relevant clinical problem and presents a method with several interesting components, including the DTFDF module and dynamic hard sample mining in TR-TCL. While there were concerns about model complexity, clarity, and the need for additional analysis, the rebuttal provides reasonable clarifications and plans for improvement. Overall, the contributions are solid, and I recommend acceptance.


Predicting Radiation Therapy Response based on Dynamic Temporal Feature Difference Fusion from Longitudinal MRI

Author(s):