Abstract

All-in-one medical image restoration (MedIR) aims to address multiple MedIR tasks using a unified model, concurrently recovering various high-quality (HQ) medical images (e.g., MRI, CT, and PET) from low-quality (LQ) counterparts. However, all-in-one MedIR presents significant challenges due to the heterogeneity across different tasks. Each task involves distinct degradations, leading to diverse information losses in LQ images. Existing methods struggle to handle these diverse information losses associated with different tasks. To address these challenges, we propose a latent diffusion-enhanced vector-quantized codebook prior and develop DiffCode, a novel framework leveraging this prior for all-in-one MedIR. Specifically, to compensate for diverse information losses associated with different tasks, DiffCode constructs a task-adaptive codebook bank to integrate task-specific HQ prior features across tasks, capturing a comprehensive prior. Furthermore, to enhance prior retrieval from the codebook bank, DiffCode introduces a latent diffusion strategy that utilizes the diffusion model’s powerful mapping capabilities to iteratively refine the latent feature distribution, estimating more accurate HQ prior features during restoration. With the help of the task-adaptive codebook bank and latent diffusion strategy, DiffCode achieves superior performance in both quantitative metrics and visual quality across three MedIR tasks: MRI super-resolution, CT denoising, and PET synthesis.
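To make the idea of a task-adaptive codebook bank concrete, the following is a minimal sketch of a nearest-neighbour lookup in a per-task vector-quantized codebook. It is an illustration only and is not taken from the paper: the bank contents, the task keys, and the `quantize` helper are all hypothetical, and DiffCode's actual retrieval additionally refines the latent feature with a diffusion model before lookup.

```python
from math import dist

# Hypothetical toy codebook bank: one small codebook per task.
# Real codebooks hold many high-dimensional learned vectors.
CODEBOOK_BANK = {
    "mri_sr": [(0.0, 0.0), (1.0, 1.0)],
    "ct_denoise": [(0.0, 1.0), (1.0, 0.0)],
    "pet_synthesis": [(0.5, 0.5), (2.0, 2.0)],
}


def quantize(feature, task):
    """Return (index, code) of the code nearest to `feature` in the task's codebook."""
    codebook = CODEBOOK_BANK[task]
    # Standard VQ lookup: pick the entry with the smallest Euclidean distance.
    index = min(range(len(codebook)), key=lambda i: dist(feature, codebook[i]))
    return index, codebook[index]
```

The same latent feature maps to different prior codes depending on the task, which is the sense in which the bank is task-adaptive in this sketch.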

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1126_paper.pdf

SharedIt Link: https://rdcu.be/eHxd2

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05325-1_7

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{CheHao_AllinOne_MICCAI2025,
        author = { Chen, Haowei AND Yang, Zhiwen AND Hou, Haotian AND Zhang, Hui AND Wei, Bingzheng AND Zhou, Gang AND Xu, Yan},
        title = { { All-in-One Medical Image Restoration with Latent Diffusion-Enhanced Vector-Quantized Codebook Prior } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        page = {67 -- 77}
}

Reviews

Review #1

Please describe the contribution of the paper

1. The paper exploits the VQ codebook prior for all-in-one MedIR by pre-training a task-adaptive codebook bank. 2. The paper retrieves the VQ prior by leveraging a latent diffusion model. 3. The quantitative and qualitative experiments validate the effectiveness of the proposed method.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper introduces a novel method that combines the VQ codebook and the latent diffusion model to achieve adaptive compensation for different types of degraded information.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Due to the multi-stage nature of the proposed method’s training process, its computational overhead and deployment efficiency appear to be lower than those of the compared approaches. This issue should be discussed in more detail.
2. There are some points of confusion regarding Fig. 1: (i) The arrows are overly cluttered. (ii) The TAC module is not explained well, particularly why it has both I_lq and I_hq inputs. (iii) The “frozen” icon could make readers think the model is not trained in any of the stages. Consider refining Fig. 1 for clarity.
3. Concerning how the retrieved reference image is fused with the backbone, is there any ablation study on this design choice (e.g., using cross-attention or other mechanisms)?
4. The diffusion-based retrieval is a major highlight of the paper. It is recommended to show some illustrative retrieval examples in the experimental results to further validate its effectiveness.
5. Section 2.4 requires a clearer exposition, including more details about the task-aware classifier and a discussion of how TARM differs from the Task-Adaptive Routing mechanism employed in AMIR.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This study shows a degree of methodological innovation by integrating vector-quantized priors with diffusion-based retrieval to adapt to all-in-one MedIR. However, there is still room for further discussion regarding resource consumption, methodological details, and the validation of results. Therefore, I rate it as “weak accept.”
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper
- The integration of task-specific VQ codebook priors with a latent diffusion refinement process in a single framework for medical image restoration.
- Introduction of a codebook bank architecture, which avoids codebook conflicts by learning task-separated priors that can be dynamically retrieved based on input.
- Use of latent diffusion not as a generative model per se, but as a prior refiner that improves the fidelity of retrieved latent features before decoding.
- Empirical validation across three highly heterogeneous modalities (MRI, CT, PET), showing that the unified DiffCode model outperforms both task-specific methods and prior all-in-one models.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- To the best of my knowledge, this is the first work to combine VQ-based codebook priors and latent diffusion in an all-in-one restoration setting. While both components have been explored individually in other domains, their integration with task-aware retrieval is new.
- The use of task-specific codebooks and routing reduces inter-task interference, addressing a key challenge in all-in-one models. The ablation comparing shared vs. banked codebooks supports this claim.
- The latent diffusion module improves upon retrieved codes by modeling residual uncertainty or degradation-induced noise. This relaxes the rigidity of codebook retrieval while avoiding the overhead of full diffusion sampling at the pixel level.
- DiffCode performs competitively across MRI, CT, and PET—modalities with widely different acquisition principles and appearance. This shows robustness of the method to data heterogeneity.
- The paper includes detailed analysis of the benefit of each design component (codebook bank, routing, diffusion), and visualizations support the qualitative gains over baselines.
- Inference efficiency: Unlike pixel-space diffusion models, DiffCode’s lightweight latent diffusion module allows fast inference and makes the system viable for real-world deployment.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

None critical. The main architectural novelty is well-supported, and while prior work has explored VQ or diffusion individually, no known work has integrated them into a unified task-adaptive framework for multi-task restoration. The combination is both non-trivial and empirically effective.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- While the paper compares with strong baselines (e.g., HiDiffusion, TransEM), it could benefit from mentioning related works like DiffBIR or Diff-Restorer from the natural image restoration domain for broader context.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

DiffCode introduces a technically novel and well-motivated hybrid framework, addressing a real limitation in current restoration research: generalization across tasks without sacrificing quality or requiring per-task fine-tuning. Its design draws strengths from both generative modeling (diffusion) and discrete latent representation (codebooks), and combines them in a task-adaptive way—something not previously done (as far as I know) in medical image restoration or general restoration.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This paper proposes a latent diffusion-enhanced vector-quantized codebook prior framework (DiffCode) for unified modeling of multi-task medical image restoration. Its main contributions include: First, to address the significant differences in degradation patterns and diverse information loss across tasks (e.g., MRI super-resolution, CT denoising, PET synthesis) in multi-task medical image restoration, this paper innovatively constructs a task-adaptive vector-quantized codebook bank. Second, to resolve the distribution shift between low-quality (LQ) image features and high-quality prior features in latent space, this paper introduces a latent diffusion strategy that optimizes LQ feature distributions progressively through diffusion models’ iterative denoising capability, enhancing prior feature retrieval accuracy. Additionally, it designs a task-aware global routing strategy that alleviates multi-task interference by assigning dedicated network experts to different tasks. Extensive experiments demonstrate that DiffCode outperforms existing methods in metrics such as PSNR and SSIM, while also exhibiting superior visual restoration quality.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This paper proposes a novel framework that integrates task-specific high-quality (HQ) prior features from diverse medical image restoration tasks (e.g., MRI super-resolution, CT denoising, PET synthesis) through the construction of a multi-task shared codebook bank. Each task maintains an independent codebook, which is concatenated to form a unified yet task-adaptive codebook bank. This design enables customized compensation for information loss across different tasks, thereby addressing the prior adaptation challenges in multi-task scenarios through task-specific feature preservation.
2. This work introduces an iterative denoising mechanism derived from diffusion models (DM) to optimize latent feature distribution alignment. It represents the successful integration of diffusion models with vector-quantized (VQ) codebooks for medical image restoration, establishing a novel paradigm for cross-task feature alignment through hybrid architecture innovation. The synergistic combination of VQ-based prior preservation and DM-based iterative refinement demonstrates unprecedented capability in handling heterogeneous medical imaging modalities.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. It is recommended that the authors further supplement the introduction of the NAF Block, particularly its design motivation, structural characteristics, and specific role in this study, to enhance the completeness and readability of the article.
2. Additionally, it is suggested to include average performance metrics of all compared methods under identical tasks in Table 1 to more comprehensively and intuitively demonstrate the advantages of the proposed method.
3. Regarding the design choice of adopting the L1 loss function in both Equation (6) and Equation (7), it is also recommended that the authors further elaborate on its applicability and potential advantages over L2 loss in the main text, to help readers better understand how this choice contributes to the model’s performance.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This study demonstrates outstanding methodological innovation, which is primarily reflected in the following two aspects: 1) the innovative construction of a task-adaptive vector-quantized codebook bank, and 2) the introduction of latent diffusion models for multi-task medical image restoration, significantly enhancing model performance. The paper presents a logically clear methodology and a comprehensive, systematic experimental design: extensive comparisons with state-of-the-art baseline methods demonstrate notable improvements in visual results, while ablation experiments validate the effectiveness of each module. Given these innovations and experimental validations, the research findings make a significant contribution to the field, and I recommend acceptance for publication.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We sincerely appreciate the reviewers for acknowledging our methodological contribution and providing constructive comments for further clarification. Our feedback is as follows.

Q1(R3): Resource Consumption A1: Regarding training, Stage 1 and Stage 2 incur a one-time cost and can subsequently provide guidance for training any restoration network. For inference, we measured the computational overhead on 256×256 images, with DiffCode, AMIR, and NDR requiring 153.38, 127.06, and 139.99 GFLOPs, respectively. Our model does exhibit slightly higher overhead, and we will discuss this issue in our manuscript.
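The relative size of that overhead can be read off the three reported figures. The short sketch below only redoes the arithmetic; the GFLOPs values are the ones quoted in A1, while the dictionary and the `relative_overhead` helper are names introduced here for illustration.

```python
# Inference cost on 256x256 images as reported in the author feedback (GFLOPs).
GFLOPS = {"DiffCode": 153.38, "AMIR": 127.06, "NDR": 139.99}


def relative_overhead(model, baseline):
    """Extra compute of `model` over `baseline`, in percent."""
    return (GFLOPS[model] / GFLOPS[baseline] - 1.0) * 100.0


for baseline in ("AMIR", "NDR"):
    extra = relative_overhead("DiffCode", baseline)
    print(f"DiffCode vs {baseline}: +{extra:.1f}% GFLOPs")
```

On these numbers DiffCode needs roughly 21% more compute than AMIR and roughly 10% more than NDR per image.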

Q2(R3): Refinement of Fig. 1 A2: (i) We will merge redundant arrows, such as the task-aware global routing in each stage. (ii) At each stage, TAC takes either I_lq or I_hq as input, but never both at once. Since identifying task information from heterogeneous medical images is not particularly challenging, TAC performs reliably with either HQ or LQ inputs. We will clarify in the figure caption that TAC only takes one input type (HQ/LQ) per stage. In addition, we will use distinctly colored lines in Fig. 1 to indicate this visually. (iii) We will include explanatory notes for the “frozen” icon in Fig. 1, specifying exactly after which pretraining stage the model is fixed.

Q3(R3): Fusion of Reference A3: We concatenate the retrieved reference and the LQ image along the channel dimension to fuse them. As our primary focus is the VQ codebook prior, we leave the comparison of fusion mechanisms (e.g. cross-attention) to future work, but we will discuss in the final manuscript that more advanced strategies may further enhance performance.
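Channel-wise concatenation, the fusion described in A3, can be sketched as follows. This is a toy illustration with images stored as nested lists in `[channel][row][col]` order; the helper names are hypothetical and a real implementation would use a tensor library's concatenation along the channel axis.

```python
def spatial_shape(image):
    """Return (height, width) of an image stored as [channel][row][col]."""
    return len(image[0]), len(image[0][0])


def concat_channels(lq, reference):
    """Stack the channels of `reference` after those of `lq`."""
    if spatial_shape(lq) != spatial_shape(reference):
        raise ValueError("spatial sizes must match for channel concatenation")
    # Lists of channels: concatenation keeps every pixel of both inputs.
    return lq + reference


lq = [[[1, 2], [3, 4]]]          # 1 channel, 2x2
reference = [[[5, 6], [7, 8]]]   # 1 channel, 2x2
fused = concat_channels(lq, reference)  # 2 channels, 2x2
```

Concatenation leaves it to the following layers to learn how to mix the two sources, whereas cross-attention, the alternative raised by the reviewer, would compute that mixing explicitly.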

Q4(R3): Illustrative Retrieval Examples A4: We will add them in the final submission.

Q5(R3): TAC; DiffCode vs. AMIR in Routing Mechanism A5: We will add the network architecture of TAC; DiffCode uses a global, one-shot routing mechanism to assign distinct experts in TARMs across all layers, promoting robust task separation. AMIR performs per-layer routing by fusing intermediate features, enabling layer-wise adaptivity but risking uneven expert use and task interference.
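The difference between the two routing styles contrasted in A5 can be sketched as below. This is a deliberately simplified picture, not either method's implementation: both functions use a hard argmax, the logits are made-up numbers, and the function names are hypothetical.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def global_route(task_logits, num_layers):
    """One-shot routing: decide once from the task classifier, reuse in every layer."""
    expert = argmax(task_logits)
    return [expert] * num_layers


def per_layer_route(layer_logits):
    """Per-layer routing: each layer picks its own expert from its own logits."""
    return [argmax(logits) for logits in layer_logits]


# The same input can end up on different experts from layer to layer
# under per-layer routing, but never under global routing.
print(global_route([0.1, 2.0, 0.3], num_layers=4))
print(per_layer_route([[0.9, 0.1], [0.2, 0.8]]))
```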

Q6(R2): NAF Block A6: NAF Block is designed to deliver effective and efficient performance by reducing redundant operations and unnecessary complexity. Structurally, the NAF Block removes nonlinear activation functions and adopts a simplified attention mechanism. In this study, we integrate it to boost efficiency, maintaining high performance at a lower computational cost. We will supplement this introduction in the final submission.
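One way a block can do without conventional activation functions is a gating product, in which the feature channels are split into two halves that are multiplied elementwise. The sketch below shows such a gate as used in the NAFNet design; it is an illustration based on that design rather than on details given on this page, and the function name and the flat per-channel layout are assumptions made here.

```python
def simple_gate(channels):
    """Split the channel list in half and multiply the two halves elementwise.

    `channels` is a list of channels, each a flat list of values.
    The output has half as many channels as the input.
    """
    if len(channels) % 2 != 0:
        raise ValueError("need an even number of channels")
    half = len(channels) // 2
    first, second = channels[:half], channels[half:]
    return [
        [a * b for a, b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(first, second)
    ]
```

The product of two feature maps is itself nonlinear in the input, which is why no separate activation such as ReLU or GELU is needed in this kind of block.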

Q7(R2): Average Performance A7: We will add them in the final submission.

Q8(R2): L1 vs. L2 loss A8: L1 loss is chosen because it is more robust to outliers, preserves sharp edges without over-smoothing, and enables more stable optimization than L2 loss. We will clarify this choice in the final submission.
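The robustness claim in A8 can be checked on a toy example: the constant that minimizes the mean absolute error is the median of the targets, while the constant that minimizes the mean squared error is their mean, so a single outlier pulls the L2 solution much further. The values below are made up for illustration and the sketch says nothing about edge preservation or optimization stability.

```python
from statistics import mean, median


def l1_loss(estimate, targets):
    """Mean absolute error of a constant estimate."""
    return sum(abs(estimate - t) for t in targets) / len(targets)


def l2_loss(estimate, targets):
    """Mean squared error of a constant estimate."""
    return sum((estimate - t) ** 2 for t in targets) / len(targets)


# Four values near 1.0 and one outlier.
targets = [1.0, 1.1, 0.9, 1.0, 10.0]
l1_best = median(targets)  # minimizer of the L1 loss
l2_best = mean(targets)    # minimizer of the L2 loss
print(f"L1 minimizer: {l1_best:.2f}, L2 minimizer: {l2_best:.2f}")
```

The L1 minimizer stays at the typical value, while the L2 minimizer is dragged toward the outlier.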

Q9(R1): Broader Related Work A9: We will include related works from the natural image restoration domain for broader context in the final submission. Special thanks for your encouraging feedback!

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A

All-in-One Medical Image Restoration with Latent Diffusion-Enhanced Vector-Quantized Codebook Prior

Author(s):