Abstract

The acquisition of 4D medical images, which are crucial for monitoring disease progression, poses significant challenges due to high cost and the constraints of the imaging mechanism. Existing solutions interpolate the volumes between the acquired volumes by linearly scaling the initial deformation between two distant phases, such as end-systole and end-diastole, to generate a detailed 4D image. However, the simple linear motion assumption fails to accurately model the anisotropic deformation induced by respiration and heartbeat. In this paper, we propose a temporal modulated multi-scale deformation fusion framework for 4D medical image interpolation via knowledge distillation, which directly generates the bidirectional deformation and volume at any intermediate time without the sub-optimal linear motion assumption. Guided by the teacher model with extensive priors, the student model, modulated by surrogate timestamps, learns to approximate the deformation modeling ability of the teacher without any need for intermediate volumes. In particular, a multi-scale deformation fusion decoder is proposed, comprising a temporal modulated deformation feature generator and a deformation fusion module. The former generates modulation parameters from timestamps for temporal-aware transformation and then models the bidirectional deformation in a coarse-to-fine manner. The latter adaptively fuses deformation features at different scales to improve the accuracy of the predicted deformation. Compared with nine state-of-the-art methods, the proposed method achieves superior performance on two public datasets, demonstrating its effectiveness and generalization.
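To make the linear motion assumption concrete, here is a minimal 1D sketch in plain Python. It compares a linearly scaled displacement (the baseline the abstract describes) with a toy nonlinear trajectory. The sine-eased trajectory and all function names are assumptions chosen for illustration; they are not the paper's motion model or its network.

```python
import math

def linear_scaled_displacement(full_displacement, t):
    """Linear-motion baseline: scale the phase-0-to-phase-1 displacement by t in [0, 1]."""
    return [t * d for d in full_displacement]

def toy_nonlinear_displacement(full_displacement, t):
    """Hypothetical nonlinear motion (sine easing): fast early, slow late."""
    s = math.sin(0.5 * math.pi * t)
    return [s * d for d in full_displacement]

def max_abs_error(a, b):
    """Largest absolute difference between two displacement lists."""
    return max(abs(x - y) for x, y in zip(a, b))

# Displacement (in voxels) of three points between the two terminal phases.
full = [4.0, -2.0, 0.5]

errors = {}
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    errors[t] = max_abs_error(
        linear_scaled_displacement(full, t),
        toy_nonlinear_displacement(full, t),
    )
    print(f"t={t:.2f}  max error of linear scaling = {errors[t]:.3f} voxels")
```

In this toy setting the two models agree at the terminal phases and diverge in between, which is the kind of intermediate-time error a timestamp-conditioned predictor is meant to reduce.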

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1810_paper.pdf

SharedIt Link: https://rdcu.be/eHwZg

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04984-1_53

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZhaJia_Temporal_MICCAI2025,
        author = { Zhang, Jiaju AND Ai, Danni AND Gan, Zhikun AND Fu, Tianyu AND Fan, Jingfan AND Song, Hong AND Xiao, Deqiang AND Yang, Jian},
        title = { { Temporal Modulated Multi-Scale Deformation Fusion via Knowledge Distillation for 4D Medical Image Interpolation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {551 -- 561}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes a knowledge distillation-based volume interpolation framework with temporal modulated multi-scale deformation fusion to predict bidirectional deformation and corresponding volumes at arbitrary time. The authors validate their approach using two public datasets: ACDC and 4DLung. The authors train a teacher model using supervision from intermediate frames, then guide the student model’s learning through distillation loss functions. The ablation experiment results show significant performance improvement compared to training methods using only end-frame data (represented by UVI-Net, which doesn’t use any intermediate supervision training method). The multi-scale deformation fusion decoder shows substantial performance improvements compared to linear scaling interpolation methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The author’s formulation method demonstrates a certain novelty, and the research problem has clinical significance.
1. One of the primary strengths of this paper lies in its successful integration of several established techniques into a cohesive framework for 4D medical image interpolation. The authors have effectively combined knowledge distillation with a multi-scale deformation fusion approach, which is a notable contribution.
2. The use of a teacher-student model, where the teacher network guides the learning of the student network, is a well-established technique, but the authors have adapted it to the specific task of volume interpolation.
3. The authors demonstrate performance across four metrics (PSNR, NCC, SSIM, NMSE) on two public datasets (ACDC and 4DLung).
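For readers unfamiliar with the metrics named above, the following sketch computes PSNR, NMSE, and a global NCC on flattened intensity lists using only the Python standard library. The definitions are common textbook forms and are an assumption here: the paper may use different variants (for example a local, windowed NCC, or a different NMSE normalization). SSIM is omitted because it requires windowed statistics.

```python
import math

def mse(ref, pred):
    """Mean squared error between two equally long intensity lists."""
    return sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref)

def psnr(ref, pred, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the maximum possible intensity."""
    m = mse(ref, pred)
    return math.inf if m == 0 else 10.0 * math.log10(data_range ** 2 / m)

def nmse(ref, pred):
    """Squared error normalized by the energy of the reference (one common definition)."""
    return sum((r - p) ** 2 for r, p in zip(ref, pred)) / sum(r ** 2 for r in ref)

def ncc(ref, pred):
    """Global normalized cross-correlation (Pearson correlation of intensities)."""
    n = len(ref)
    mean_r, mean_p = sum(ref) / n, sum(pred) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(ref, pred))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_r * var_p)

# Toy flattened "volumes" with intensities in [0, 1]; illustrative values only.
ref = [0.0, 0.5, 1.0, 0.5]
pred = [0.1, 0.5, 0.9, 0.5]
print(f"PSNR={psnr(ref, pred):.2f} dB  NMSE={nmse(ref, pred):.4f}  NCC={ncc(ref, pred):.4f}")
```

Higher PSNR and NCC and lower NMSE indicate a closer match between the interpolated and reference volumes.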
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The lack of clear articulation regarding the novelty of the proposed method: from the introduction, the authors fail to provide reasonable motivation for introducing knowledge distillation.
2. To some extent, knowledge distillation introduces more detailed training data, while the comparison method advocates using self-supervised approaches (i.e., not using intermediate results for supervised training). The superiority over the comparison method is obvious (the authors directly used UVI-Net’s results). In terms of pure model innovation, the proposed method is not better than the comparison method UVI-Net.
3. The paper claims novelty in the combination of these techniques, specifically the “temporal modulated multi-scale deformation fusion” framework and the use of “surrogate timestamps” to avoid linear scaling. However, the paper does not provide a detailed comparison of its multi-scale deformation fusion implementation with existing ones to highlight its specific advancements. This makes it difficult to assess the true innovation of the proposed method. At the same time, it is recommended to provide more comparative results of technical metrics for the models, such as parameter count.
4. Although the division of training and testing was introduced, the specific usage of each sample was not provided. It is recommended to provide detailed data usage information to give readers a clear understanding.
5. The author did not provide measures of uncertainty (such as standard deviations) or statistical analysis.
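The uncertainty reporting requested in point 5 can be as simple as a mean and sample standard deviation over per-case scores. The sketch below uses Python's `statistics` module; the score values are hypothetical placeholders and are not results from the paper.

```python
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) of per-case metric scores."""
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical per-case PSNR values in dB; placeholders, not results from the paper.
toy_psnr_scores = [30.0, 32.0, 31.0, 33.0, 29.0]

mean, std = summarize(toy_psnr_scores)
print(f"PSNR: {mean:.2f} ± {std:.2f} dB over {len(toy_psnr_scores)} cases")
```

A paired significance test between methods on the same cases would be a natural next step, but it is outside this sketch.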
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. The lack of clear articulation regarding the novelty of the proposed method: from the introduction, the authors fail to provide reasonable motivation for introducing knowledge distillation.
2. To some extent, knowledge distillation introduces more detailed training data, while the comparison method advocates using self-supervised approaches (i.e., not using intermediate results for supervised training). The superiority over the comparison method is obvious (the authors directly used UVI-Net’s results). In terms of pure model innovation, the proposed method is not better than the comparison method UVI-Net.
3. The paper claims novelty in the combination of these techniques, specifically the “temporal modulated multi-scale deformation fusion” framework and the use of “surrogate timestamps” to avoid linear scaling. However, the paper does not provide a detailed comparison of its multi-scale deformation fusion implementation with existing ones to highlight its specific advancements. This makes it difficult to assess the true innovation of the proposed method. At the same time, it is recommended to provide more comparative results of technical metrics for the models, such as parameter count.
4. Although the division of training and testing was introduced, the specific usage of each sample was not provided. It is recommended to provide detailed data usage information to give readers a clear understanding.
5. The author did not provide measures of uncertainty (such as standard deviations) or statistical analysis.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

Reference [21] UVI-Net also seems to disagree with the assumption that physiological motion is linearly uniform over time. I’m not clear on what basis the authors designed such a complex network structure. The authors mentioned in their feedback that ‘intermediate volume I_t are taken as the input of teacher model.’ Given that the Teacher model and Student model are similar, with the Teacher model using more supervision (intermediate volume I_t) for training and then supervising the Student model, I’m curious whether using direct supervision from the intermediate volume I_t for the Student model wouldn’t be more effective than this soft supervision? Additionally, detailed parameters about the network structure aren’t provided, and the network parameter count is unclear. What might be beneficial to reference is the good use of intermediate volume supervision from the training set to help with inference.

Review #2

Please describe the contribution of the paper

The paper uses knowledge distillation to learn spatiotemporal deformation fields for 4D medical image interpolation. The teacher model has access to the intermediate volume and learns high-quality deformations; the student, using only the endpoint volumes and timestamp, learns to mimic the teacher’s behavior.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The major strengths of this paper are its technical contributions: (1) knowledge distillation for deformation modeling; (2) temporal modulated deformation learning; (3) no need for intermediate ground truth at inference.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The paper is a bit hard to follow, even for experts. (1) The Introduction could be improved to better highlight the key ideas of the paper. (2) Figure 1 is too busy, with many modules and notations—consider simplifying it for clarity. (3) How do the authors ensure the preservation of topology in the estimated intermediate images? How can their clinical utility be evaluated?
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is technically sound. If the writing can be improved and the code is provided, it would be suitable for acceptance.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have addressed most of the concerns raised by the reviewers; however, the explanation regarding the preservation of topology remains unconvincing. I would like to maintain my initial rating: weak accept.

Review #3

Please describe the contribution of the paper

The paper addresses the task of medical image interpolation by employing a teacher-student paradigm. This approach guides the student network to predict the deformation of intermediate volumes at arbitrary times between acquisition points, aiming to enhance the accuracy of the interpolated images. The authors introduce a multi-scale deformation decoder to minimize the impact of significant differences between endpoint phases. This component captures both global and local deformations in a coarse-to-fine manner, ensuring high fidelity and anatomical accuracy in the interpolated images. The method is evaluated on two publicly available datasets and compared against nine existing techniques, demonstrating superior performance across various metrics.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors conducted an extensive quantitative evaluation, comparing their method to nine existing approaches, and demonstrated strong performance.
- Additionally, the authors provided a comprehensive ablation study, highlighting the significance of each proposed component in contributing to the overall results of the method.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

1 - The paper lacks a discussion on limitations and runtime. It would benefit from addressing the method’s limitations, potential failure cases, and the computational cost. Given the large networks involved, details on training time and whether the additional computational expense is justified compared to existing methods would be valuable.

2 - The paper does not explain why certain design choices were made: (a) Why use 4 scales of features? Would 2 be enough? (b) How useful is the regularization $L_{reg}$?
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

Very minor comments. The authors should:
1 - Think of a running title. 2 - Avoid using blue/red colors in table 1. Bold/Underlined text would be more friendly to readers. 3 - Specify what the reader should look at when red arrows are used in qualitative results. All images have a red arrow and it is not clear what they mean.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper proposes a method that outperforms the existing methods. However, the method explanation is not smooth and is over-engineered.
Reviewer confidence

Not confident (1)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I am leaning towards accepting this work since the authors have addressed my concerns. The presentation of the paper was not optimal but the method might be useful for the community.

Author Feedback

We thank all reviewers for their valuable feedback and sincerely appreciate that they found the paper technically novel [R1, R3] and clinically significant [R3] with extensive evaluation [R2, R3]. We address the main concerns below.

Motivation [R3Q1-3, R1Q3]. The deformation induced by heartbeat and breathing is spatially anisotropic and temporally nonlinear. We propose knowledge distillation based temporal modulated multi-scale deformation fusion to directly predict the deformation and volume at an intermediate time under the non-linear motion assumption. Specifically, the teacher model takes the intermediate volume and the two terminal volumes as input to extract multi-scale features, and the priors of the deformation at the intermediate time are progressively modeled with cross-scale feature fusion. In the student model, timestamps replace the intermediate volume and improve the temporal relevance of the features. The intermediate deformation is then predicted directly under the guidance of the priors, without the sub-optimal linear scaling operation. The student model approximates the feature distribution of the teacher through the distillation loss, and preserves topology through the additional reconstruction loss and alignment loss.

[R3Q1-2] Comparison with UVI-Net. UVI-Net takes only the two terminal volumes as input, under the limiting assumption that physiological motion is linearly uniform over time. The intermediate deformation is approximated by linearly scaling the difference between the terminal volumes, which makes it difficult to capture temporally nonlinear changes.

[R3Q3] Comparison with multi-scale methods. The multi-scale methods SVIN and MPVF predict the deformation at each scale separately, so the scale-wise deformation features are independent and lack cross-scale fusion, resulting in limited performance on PSNR, NCC, SSIM, and NMSE.

[R1Q3] Preservation of topology. The preservation of topology is achieved by the dual mechanisms of temporal modulated multi-scale deformation fusion and knowledge distillation. In addition, clinical usability can be evaluated by observing whether the temporal changes of structures such as the heart and lungs are realistic and continuous. The changes of the heart over time are shown in Fig. 3, and the edge changes in our results are the smoothest and most continuous as the heart contracts over time.

Technical details. [R2Q2a] Number of scales. Compared to using 2 scales of features, a 4-scale architecture provides a comprehensive resolution hierarchy that can accurately capture larger deformations. This setting is also consistent with baseline methods such as UVI-Net, VM, and TM.

[R2Q2b] Regularization term. The alignment loss incorporates a regularization term that constrains the spatial gradients of the deformation field to enhance smoothness and reduce physically unrealistic foldings. The regularization term prevents the network from over-pursuing similarity in appearance between volumes, significantly improving the plausibility of the predicted deformation.

Experiment. [R3Q4] Sample. In each sample, the two terminal volumes I_0 and I_1 and one randomly chosen intermediate volume I_t are taken as the input of the teacher model, while I_0, I_1, and the corresponding timestamp t are taken as the input of the student model.

[R3Q5] Statistical analysis. Thanks for your suggestion. We have added the standard deviations to Tables 1, 2, and 3 in the final version.

[R2Q1] Runtime and limitation. ① Our method achieves average inference times of 0.156s on ACDC and 0.291s on 4D_Lung, comparable to SOTA methods, while attaining superior accuracy on PSNR, NCC, SSIM, and NMSE. ② The current work focuses on interpolation under regular movements such as heartbeat and breathing. Occasional scenarios such as patient pose changes have received little consideration. In the future, the stability of our method in such occasional scenarios needs to be improved.

Writing. [R1Q1] Thanks for your suggestion. We have improved the description in the final version.
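The rebuttal describes a regularization term that constrains spatial gradients of the deformation field to improve smoothness and reduce folding, but it does not give the formula. The 1D sketch below assumes a squared finite-difference penalty, a form often used in learning-based registration, and checks folding through the condition that the map x -> x + u(x) stays strictly increasing. Both function names and the toy fields are hypothetical.

```python
def gradient_penalty(u):
    """Mean squared forward difference of a 1D displacement field (in voxels)."""
    diffs = [b - a for a, b in zip(u, u[1:])]
    return sum(d * d for d in diffs) / len(diffs)

def count_foldings(u):
    """Count grid intervals where x -> x + u(x) is not strictly increasing.

    In 1D the Jacobian of the map is 1 + du/dx; a value <= 0 means neighbouring
    points cross or collapse, which is the 1D analogue of a folding.
    """
    return sum(1 for a, b in zip(u, u[1:]) if 1.0 + (b - a) <= 0.0)

# Toy displacement fields on a unit-spaced grid; illustrative values only.
smooth = [0.0, 0.2, 0.4, 0.6, 0.8]
folded = [0.0, 1.5, 0.0, 1.5, 0.0]

for name, field in (("smooth", smooth), ("folded", folded)):
    print(f"{name}: penalty={gradient_penalty(field):.3f}, foldings={count_foldings(field)}")
```

In this toy case the oscillating field receives a much larger penalty and is the only one with folded intervals, which matches the intuition that penalizing gradients discourages implausible deformations. In 3D the same check would use the determinant of the Jacobian of the deformation.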

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Reject
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Temporal Modulated Multi-Scale Deformation Fusion via Knowledge Distillation for 4D Medical Image Interpolation

Author(s):