Abstract

We address the challenge of parameter-efficient fine-tuning (PEFT) for three-dimensional (3D) U-Net-based denoising diffusion probabilistic models (DDPMs) in magnetic resonance imaging (MRI) image generation. Despite its practical significance, research on parameter-efficient representations of 3D convolution operations remains limited. To bridge this gap, we propose Tensor Volumetric Operator (TenVOO), a novel PEFT method specifically designed for fine-tuning DDPMs with 3D convolutional backbones. Leveraging tensor network modeling, TenVOO represents 3D convolution kernels with lower-dimensional tensors, effectively capturing complex spatial dependencies during fine-tuning with few parameters. We evaluate TenVOO on three downstream brain MRI datasets (ADNI, PPMI, and BraTS2021) by fine-tuning a DDPM pretrained on 59,830 T1-weighted brain MRI scans from the UK Biobank. Our results demonstrate that TenVOO achieves state-of-the-art performance in multi-scale structural similarity index measure (MS-SSIM), outperforming existing approaches in capturing spatial dependencies while requiring only 0.3% of the trainable parameters of the original model. Our code is available at https://github.com/xiaovhua/tenvoo.
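To give a rough sense of why factorizing a 3D convolution kernel saves parameters, the sketch below compares the weight count of a dense kernel with that of a rank-r CP-style factorization. This is only an illustrative stand-in: the CP format, the function names, and the layer sizes (256 channels, kernel size 3, rank 8) are assumptions made here, not TenVOO's actual tensor network or configuration.

```python
def dense_conv3d_params(c_out, c_in, k):
    """Weights in a dense 3D convolution kernel of shape (c_out, c_in, k, k, k)."""
    return c_out * c_in * k ** 3


def cp_conv3d_params(c_out, c_in, k, rank):
    """Weights in a rank-`rank` CP factorization of the same kernel.

    One factor matrix per mode: output channels, input channels,
    and each of the three spatial axes (assumed format, for illustration).
    """
    return rank * (c_out + c_in + 3 * k)


# Hypothetical layer sizes chosen only for this example.
dense = dense_conv3d_params(c_out=256, c_in=256, k=3)
low_rank = cp_conv3d_params(c_out=256, c_in=256, k=3, rank=8)

print(f"dense kernel weights:      {dense}")
print(f"factorized update weights: {low_rank}")
print(f"fraction of dense weights: {low_rank / dense:.4%}")
```

For these assumed sizes the factorized update stores a few thousand weights against roughly 1.8 million in the dense kernel. The actual ratio for any real method depends on its tensor network format and rank.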

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3932_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/xiaovhua/tenvoo

Link to the Dataset(s)

UK Biobank: https://www.ukbiobank.ac.uk/ ADNI: https://adni.loni.usc.edu/ PPMI: https://www.ppmi-info.org/ BraTS2021: https://www.med.upenn.edu/cbica/brats2021/

BibTex

@InProceedings{LiBin_ParameterEfficient_MICCAI2025,
        author = { Li, Binghua and Chang, ZiQing and Liang, Tong and Li, Chao and Tanaka, Toshihisa and Aoki, Shigeki and Zhao, Qibin and Sun, Zhe},
        title = { { Parameter-Efficient Fine-Tuning of 3D DDPM for MRI Image Generation Using Tensor Networks } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15963},
        month = {September},
        pages = {379--389}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present TenVOO, a new method for fine-tuning DDPMs with a 3D convolutional backbone. Its objective is to minimize the number of parameters while maintaining the representation of 3D convolution operations.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written. The authors approach the problem in a sound way, and the comparisons are carried out appropriately. The paper could represent an important step for the field. The problem formulation is well explained and demonstrated.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The application scenario is too narrow (3D convolutional backbones); it could probably be extended to other scenarios.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is sufficiently novel to be published and is well written, including from a formulation point of view. The comparisons are also done appropriately, with SOTA comparisons and “ablation studies” (different versions of the method).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This work proposes TenVOO, a tensor network-based parameter-efficient fine-tuning (PEFT) method for 3D U-Net-based DDPMs in MRI image generation. By leveraging tensor decomposition to represent 3D convolution kernels with low-rank tensors, TenVOO achieves state-of-the-art structural similarity (MS-SSIM) while using only 0.3% of the original model’s trainable parameters. Experiments on three downstream datasets (ADNI, PPMI, BraTS2021) demonstrate its effectiveness compared to existing PEFT baselines.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. First application of tensor networks for PEFT in 3D DDPMs for MRI generation. The TN-based design effectively addresses the challenge of capturing spatial dependencies in 3D convolutions with minimal parameters.
    2. Comprehensive experiments across diverse MRI datasets (ADNI, PPMI, BraTS2021) validate generalizability.
    3. Significant reduction in the number of trainable parameters compared to full fine-tuning and other PEFT methods.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. The use of a frozen tensor network copy for initialization (Section 2.4) is intriguing but lacks sufficient empirical or theoretical justification. An ablation study comparing this approach to standard zero-initialization would clarify its necessity and impact.
    2. While PEFT aims to reduce resource demands, the paper does not quantify training time, memory usage, or inference speed.

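The contrast raised in weakness 1 can be illustrated with a toy two-factor update. The sketch below is one plausible reading of "frozen copy" initialization, assumed here for illustration and not taken from the paper: the weight update is the difference between a trainable factor product and a frozen copy of it, so it starts at exactly zero even though every factor is randomly initialized. All names and sizes are hypothetical.

```python
import copy
import random


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    inner = len(b)
    return [[sum(a[i][r] * b[r][j] for r in range(inner))
             for j in range(len(b[0]))] for i in range(len(a))]


def delta(trainable, frozen):
    """Weight update: trainable factor product minus frozen factor product."""
    new = matmul(*trainable)
    old = matmul(*frozen)
    return [[n - o for n, o in zip(row_n, row_o)]
            for row_n, row_o in zip(new, old)]


random.seed(0)
rank, rows, cols = 2, 4, 3  # hypothetical toy sizes
a = [[random.gauss(0, 1) for _ in range(rank)] for _ in range(rows)]
b = [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rank)]

trainable = (a, b)
frozen = copy.deepcopy(trainable)  # never updated during fine-tuning

# At initialization both products are identical, so the update is exactly zero.
init_delta = delta(trainable, frozen)

# Simulate one parameter change; the update becomes non-zero immediately.
trainable[0][0][0] += 0.1
stepped_delta = delta(trainable, frozen)
```

In this toy setting, both factors are non-zero from the start, whereas zero-initializing a factor is the other common way to make the initial update vanish. Whether that difference matters in practice is exactly what the requested ablation would show.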
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a contribution to medical image generation by introducing TenVOO, a novel PEFT method that leverages tensor networks to efficiently fine-tune 3D DDPMs for MRI synthesis. Its strengths lie in its innovation, strong empirical performance, and parameter efficiency, making it highly relevant to the MICCAI community.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a novel method, based on tensor networks, for parameter-efficient fine-tuning of U-Net-based 3D diffusion models, applied to the task of brain MRI generation. The method successfully addresses the issue of accurately modeling complex spatial dependencies. The authors present two variations of the method: TenVOO-L, which is a novel formulation, and TenVOO-Q, which extends an existing method (QuanTA) from 2D to 3D. The method is tested on 3 brain MRI datasets and compared against existing state-of-the-art methods in terms of the number of trainable parameters and generation performance.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper has the following strengths: (1) The paper tackles the problem of parameter-efficient fine-tuning with a special focus on preserving complex spatial dependencies which is particularly important in 3D medical imaging where such characteristics are prevalent. (2) The authors present a novel technique based on tensor networks with two variations and clearly present the difference between this method and LoRA which is a widely used PEFT technique. The proposed methods are generalizable to other architectures and tasks as they are a modification to the basic 3D convolutional blocks. (3) The method is evaluated on 3 different brain MRI datasets and compared to other PEFT techniques which demonstrate its generation performance. (4) The authors include an ablation to show the trade-off between efficiency and generation performance when modifying the rank r of the tensor networks.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    This work could be improved by tackling the following limitations: (1) The authors could expand on the differences between TenVOO-L and TenVOO-Q and highlight the potential of each of them compared to the other. (2) If I understand correctly, the authors empirically found that the method needs some weight initialization of the TNs to be performed during the training phase of the model (e.g. when training on the UKB in the experiments?). This seems to limit its usability for fine-tuning already existing models that do not take this assumption into account. It would be valuable to include the results of the empirical evaluation that led to this initialization, to show how much it improves model performance over initialization to zero. (3) The evaluation includes the number of trainable parameters as a metric of efficiency, but it does not seem to differ significantly between the methods, especially in the case of joint fine-tuning. It would be nice to include more efficiency metrics such as training and inference time and GPU memory used. (4) It would be nice to assess the variability of the metrics (using standard deviation or confidence intervals, for example) and apply statistical testing to assess the significance of the difference in performance between the methods. (5) In Table 1, what does the “Real” row represent and why is MMD not computed for this example?

    (6) Minor comment: I would like to know if the original dataset used for training (UK biobank) included any T1 brain MRIs and if so what was their percentage in the dataset?
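As one possible way to carry out the variability analysis suggested in point (4), the sketch below reports mean and standard deviation over repeated runs and applies an exact paired sign-flip permutation test. The choice of test, the function name, and the scores are assumptions for illustration; the numbers are made-up placeholders, not results from the paper.

```python
import itertools
import statistics


def paired_sign_flip_pvalue(scores_a, scores_b, tol=1e-12):
    """Exact two-sided paired permutation (sign-flip) test on per-run differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    # Under the null hypothesis each difference is equally likely to have either sign,
    # so every sign assignment is enumerated and compared with the observed statistic.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        hits += flipped >= observed - tol
        total += 1
    return hits / total


# Placeholder metric values from five hypothetical runs (NOT results from the paper).
method_a = [0.81, 0.79, 0.83, 0.80, 0.82]
method_b = [0.78, 0.77, 0.80, 0.79, 0.80]

mean_a, std_a = statistics.mean(method_a), statistics.stdev(method_a)
mean_b, std_b = statistics.mean(method_b), statistics.stdev(method_b)
p_value = paired_sign_flip_pvalue(method_a, method_b)

print(f"method A: {mean_a:.3f} +/- {std_a:.3f}")
print(f"method B: {mean_b:.3f} +/- {std_b:.3f}")
print(f"paired sign-flip p-value: {p_value:.4f}")
```

With only five paired runs the smallest two-sided p-value this test can produce is 2/32, which shows why a handful of repeats limits the strength of any significance claim.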

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    It would be great to provide an open-source implementation of the method to ensure reproducibility but also allow other researchers to use the methods developed in other models/applications.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an interesting and novel approach for parameter-efficient fine-tuning of 3D CNNs. However, the evaluation could be improved by including more efficiency metrics to demonstrate the superiority of this method over the other techniques, and by assessing the variability of the metrics.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely thank all reviewers for their positive feedback, particularly regarding the novelty of our work and its significance to the medical community. To address any areas that may have lacked clarity, we provide the following responses to their comments.

[R1] The scenario of application is too strict (3D convolutional backbone); it probably could be enlarged to other scenarios.
- Agreed. The 3D diffusion model was chosen because its large-scale pretraining better highlights the strengths of TenVOO. Broader applications will be explored in future work.

[R2 & R3] An ablation study comparing the proposed initialization method to standard zero-initialization is necessary.
- We appreciate this suggestion. We are conducting an ablation study comparing our initialization method with standard zero-initialization, and the results will be included in the revised manuscript to demonstrate its effectiveness.

[R2 & R3] The paper should provide a quantitative analysis of training time, memory usage, and inference speed.
- Thank you for this valuable suggestion. As noted, memory usage and training speed are important aspects of PEFT methods. We will include a detailed discussion and provide experimental results in the revised version.

[R3] The authors should describe and highlight the potential differences between TenVOO-L and TenVOO-Q.
- Based on preliminary experiments, we hypothesize that TenVOO-L is more robust in low-rank settings, while TenVOO-Q performs better at higher ranks. However, we have not yet examined this in depth and will leave it for future work.

[R3] Why were the MMD metrics for the “Real” setting not included?
- Like FID, MMD estimates the divergence between generated and target distributions. For “Real”, this would compare training and validation sets, whose distributions are nearly identical. As MMD is sensitive to small differences, this can cause numerical instability. Thus, following “Conditional Diffusion Models for Semantic 3D Brain MRI Synthesis”, we do not report this result.

[R3] Please report result variability and test if performance differences are statistically significant.
- Repeated runs are costly, especially for FT methods. Still, we agree with the reviewer and will aim to report statistical results (e.g., mean over multiple runs) in the final version to demonstrate stability.

[R3] Pretraining data details from UKB.
- We use 59,830 MRI scans from the UKB, all of which are T1-weighted images.

[R3] It would be great to provide an open-source implementation of the method.
- We will release our code on GitHub upon acceptance of the paper.
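The behaviour of MMD on identical versus clearly different sample sets can be seen in a small sketch. The code below implements the standard biased (V-statistic) estimate of squared MMD with a Gaussian kernel. The kernel choice, the bandwidth, and the toy 2-D samples are assumptions for illustration; the MMD configuration used in the paper is not stated on this page.

```python
import math


def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs drawn from xs and ys."""
    return sum(rbf(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2 * mean_kernel(xs, ys, sigma))


# Toy 2-D samples standing in for image features (hypothetical values).
samples = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]
shifted = [(x + 2.0, y + 2.0) for x, y in samples]

same_set = mmd2(samples, samples)     # identical sets: estimate is zero
different_set = mmd2(samples, shifted)  # shifted set: clearly positive
print(same_set, different_set)
```

When the two sets come from nearly the same distribution the estimate sits close to zero, so small sampling differences dominate its value.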




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


