Abstract

Solving inverse problems, such as image restoration and reconstruction, is essential in medical imaging. Recently, research on deep learning-based methods for solving 3D data problems has become a focus in the field. Existing diffusion models achieve excellent reconstruction quality but face challenges with volume inconsistency and high computational costs when dealing with 3D medical images. To overcome these challenges, we propose Blaze3DM, a novel approach that combines triplane neural fields with a diffusion model for effective 3D medical image reconstruction. Blaze3DM leverages compact, data-dependent triplane embeddings to ensure volume consistency and significantly improve the computational efficiency of the diffusion model. Furthermore, we introduce a guidance-based sampling method for zero-shot 3D inverse problem solving, enabling Blaze3DM to generate high-fidelity 3D volumes from limited, low-quality 2D slices. We evaluate Blaze3DM on various 3D inverse problem tasks across multiple imaging modalities, including sparse-view CT, limited-angle CT, compressed-sensing MRI, and MRI isotropic super-resolution. The experimental results demonstrate that Blaze3DM not only achieves state-of-the-art reconstruction performance but also markedly improves computational efficiency by approximately 22 to 40 times. Code is available at: https://github.com/Jenn-He/Blaze3DM.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2187_paper.pdf

SharedIt Link: https://rdcu.be/eHwMR

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04937-7_6

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Jenn-He/Blaze3DM

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HeJia_Blaze3DM_MICCAI2025,
        author = { He, Jia AND Li, Bonan AND Yang, Ge AND Liu, Ziwen},
        title = { { Blaze3DM: Integrating Triplane Representation with Diffusion for Solving 3D Inverse Problems in Medical Imaging } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {56 -- 66}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors present a new neural network architecture that combines 2D convolutional neural networks applied in the XY, YZ, and XZ planes to enforce consistent three-dimensional volumetric estimates. The authors also claim to introduce a new guidance method, but it appears to apply existing guidance methods to their novel architecture.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The strength of the paper is that it addresses a very important issue of three-dimensional consistency. The idea of a tri-plane neural field is a good solution to this problem.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The description of the triplane representation is not very clear. I understand that you are training an implicit neural representation of the image volume, but the operations of the decoder are not very clear to me from the text in section 2.2. Specifically, in equation (1), the index p is used for both I(p) and F(p), but I seems to have shape HxWxD and F seems to have shape NxNx3C. Based on my previous understanding of neural fields, I assume you mean that F is a function taking position as an input and outputting a scalar image value at that position, but it is not clear to me how that function depends on the NxNx3C tensor. Improving the clarity of the decoder description would be helpful to the reader.

The forward models for your 4 inverse problems are not defined. The paper references the appendix for implementation details, but this has been removed by the program committee. As the paper currently exists, this is a major issue. Please include forward models in the main text of the paper.

There seems to be an error in equation (8). The log-likelihood of a multivariate Gaussian includes the inverse of the covariance matrix, but you are showing multiplication by the non-inverted covariance matrix. Since you are using homoscedastic noise, this is not causing an issue because it is a single scalar parameter that can be wrapped up in your step size parameter, \lambda, but please review the mathematical definition of that measurement log-likelihood.

Insufficient details on computational implementation. Number of training iterations/epochs? I’m assuming 128x128x32 is the size of the latent representation, where N=128 and C=32, but the size of the original 3D image volume is not specified, nor is the voxel spacing, etc.

It seems the proposed guidance method is very similar to existing “latent diffusion posterior sampling”, which now has many flavors in the literature. See equation (9) of the Song et al. 2023 paper cited below. Your version is novel in the sense that it incorporates your triplane representation, but the way I see it, the weights of the decoder are just a type of latent representation of the image, and that part is already covered by your other novelty claim. At minimum, I would recommend citing some very similar methods that currently exist. If I’m getting it wrong about this being a version of latent DPS, I’m very open to hearing your argument.

Song, Bowen, et al. “Solving inverse problems with latent diffusion models via hard data consistency.” arXiv preprint arXiv:2307.08123 (2023).
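
For reference on the Eq. (8) remark above, the standard Gaussian measurement log-likelihood can be written out as follows. This is a textbook sketch, not the paper's Eq. (8): it assumes a linear forward operator \mathcal{A} acting directly on x and noise n \sim \mathcal{N}(0, \Sigma), whereas the paper composes the operator with a decoder.

```latex
\begin{align}
\log p(y \mid x) &= -\tfrac{1}{2}\,(y - \mathcal{A}x)^{\top}\,\Sigma^{-1}\,(y - \mathcal{A}x) + \text{const}, \\
\nabla_x \log p(y \mid x) &= \mathcal{A}^{\top}\,\Sigma^{-1}\,(y - \mathcal{A}x), \\
\Sigma = \sigma^2 I \;\Rightarrow\;
\nabla_x \log p(y \mid x) &= \frac{1}{\sigma^2}\,\mathcal{A}^{\top}(y - \mathcal{A}x)
 = -\frac{1}{2\sigma^2}\,\nabla_x \left\| y - \mathcal{A}x \right\|^2 .
\end{align}
```

In the homoscedastic case the inverse covariance reduces to the scalar 1/\sigma^2, which is why it can be folded into a step size such as \lambda without changing the direction of the guidance gradient.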
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(2) Reject — should be rejected, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a high quality research article, however, the major issue is that important details were included in an appendix that was redacted by the program committee. I am happy to reconsider my recommendation if the authors are able to modify the main text of the article within the rules.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This work proposes Blaze3DM, which incorporates triplane neural representations with a diffusion model to solve 3D medical inverse problems. The main contribution of this work is adopting triplane neural representations to represent 3D medical volumes. It is a novel idea to have a diffusion model learn the distribution of triplane representations, which can capture the distribution in 3D space. The experimental results show that Blaze3DM achieves faster inference than two DM-based methods for solving 3D medical inverse problems.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) It is a novel idea to combine triplane neural representations with a diffusion model to model the 3D medical image distribution. It addresses the computational burden of having diffusion models directly learn the 3D volume distribution. Compared with current DM-based methods that adopt 2D diffusion models to solve 3D inverse problems, the proposed idea can effectively capture the correlations and distribution within the 3D volume.
2) It shows the potential to provide a faster DM-based solution for 3D reconstruction in medical imaging.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

1) Concerns Regarding Triplane Representation Learning in Blaze3DM: a) The reconstruction performance heavily depends on the effectiveness of the triplane neural representation. It is crucial to evaluate how well the triplane neural representation fits the data. For example, Blaze3DM utilizes 128×128 spatial resolution triplane feature maps to represent 3D volumes. However, the representational capability of the method for larger 3D volumes, such as 512×512×512 images, requires a more thorough evaluation. b) Blaze3DM directly learns the triplane representation maps without incorporating an encoder. This approach raises concerns about scalability and generalization: when fitting large datasets or datasets with varying distributions, it is unclear whether the learned triplane representations can capture the fine-grained details of the images effectively.

2) 3D Inverse problems in Medical Imaging: a) In clinical applications, the forward model typically involves a 3D acquisition process—such as cone-beam CT for CT or a 3D Fourier transform for MRI—where the operator acts on the entire volume, meaning that the measurements are not slice-wise independent. However, Blaze3DM appears to randomly select slices for guidance-based sampling (Algorithm 1), which implies that its forward model does not operate in 3D. It raises concerns that if Blaze3DM were extended to a true 3D forward model, it would need to process all slices in the volume, potentially causing a significant degradation in inference speed. b) I encourage the authors to clarify the details of the forward model settings of experiments in the rebuttal phase and manuscript. (Supplementary is removed by Program Committee). Although the current 3D models claim to solve 3D inverse problems, they predominantly use 2D forward operators—a simplification that creates a substantial gap compared to the actual 3D acquisition processes in clinical scenarios. As such, the claim of solving 3D inverse problems may be overstated unless the methodology and experimental evidence address this discrepancy.

3) Clarification of Equation 8: The notation \hat{f}_0(f_{t-1}) is confusing and needs to be clarified. Specifically, it is unclear what \hat{f}_0(f_{t-1}) represents and how Blaze3DM derives \hat{f}_0 from f_{t-1}. If this estimation is performed using the Tweedie formula, the paper should provide an explicit formulation of the approach. The current notation degrades the readability of the equation, so a clearer and more precise explanation is recommended.
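
As an illustration of the kind of explicit formulation requested in point 3, the sketch below shows the standard Tweedie-style clean-sample estimate for a variance-preserving (DDPM-style) diffusion. It is a generic sketch, not the authors' code: the function name `tweedie_x0`, the argument names, and the value of `alpha_bar` are hypothetical, and in practice `eps_hat` would be a network's noise prediction rather than the true noise used here.

```python
import numpy as np

def tweedie_x0(x_t, eps_hat, alpha_bar_t):
    """Posterior-mean estimate of the clean sample for a variance-preserving diffusion.

    Assumes the forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

# Toy check: noise a known sample, then recover it using the true noise.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 8, 8))      # hypothetical small feature tensor
eps = rng.standard_normal(x0.shape)
alpha_bar = 0.6                          # hypothetical cumulative noise level
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
x0_hat = tweedie_x0(x_t, eps, alpha_bar)
```

With the exact noise the estimate reproduces `x0`; with a learned noise prediction it is only the conditional mean, which is what guidance methods typically pass through the decoder and forward operator.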
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

I suggest the authors provide a more insightful discussion of triplane-based neural representation modeling and design, which could significantly enhance the paper.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

A novel idea of adopting DM-based methods for 3D representation. See the weaknesses above. If the authors address my concerns well, I will raise my score.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I have read the authors’ rebuttal. My questions have been resolved. The principal contribution of this work is its novel triplane-based latent representation, which holds great promise for solving large-scale 3D medical inverse problems. Meanwhile, the paper is not focusing on a new design for latent diffusion models; therefore, the technical novelty concern raised by R1 is not essential.

The innovative idea of a triplane-based latent representation is sufficient for MICCAI. Thus, I recommend acceptance.

Review #3

Please describe the contribution of the paper
1. The authors propose a novel approach that combines triplane neural fields with a diffusion model for effective 3D medical image reconstruction.
2. They utilize triplane neural fields to model 3D medical image distributions to ensure volumetric consistency and to boost the computational efficiency of diffusion models.
3. The guidance-based sampling method for triplane diffusion models allows zero-shot 3D inverse problem solving.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The authors have proposed a novel yet effective solution for a very important problem in medical imaging - 3D volumes.
2. The experiments include two different CT problems (sparse view, limited angle) and two different MRI problems (compressed sensing, super-resolution). This provides a good analysis of the effectiveness of the proposed solution for a varied set of 3D medical reconstruction problems.
3. The efficiency analysis includes memory and FLOPs in addition to the computational time.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. A good understanding of concepts like diffusion learning, neural fields, etc. may be needed to fully understand this paper. The paper relies heavily on prior knowledge.
2. The number of slices, \gamma, has been decided based on GPU memory. But this number would also affect the model performance and should have some relationship to the data volume sizes as well. For example, can we say that using 25% of the slices is generally enough? What about 10%? The choice of \gamma has not been discussed sufficiently.
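
The question about \gamma can be made concrete with a small back-of-the-envelope calculation. The sketch below uses \gamma = 16 and 256 slices per volume, the values given in the author feedback; the step counts are hypothetical, and the calculation assumes slices are drawn uniformly at random and independently at every sampling step, which the page does not confirm.

```python
def miss_probability(gamma, num_slices, num_steps):
    """Probability that one fixed slice is never selected.

    Assumes each step draws `gamma` distinct slices uniformly at random from
    `num_slices`, independently of all other steps.
    """
    return (1.0 - gamma / num_slices) ** num_steps

gamma, num_slices = 16, 256            # values quoted in the author feedback
for steps in (10, 50, 100):            # hypothetical numbers of sampling steps
    p_miss = miss_probability(gamma, num_slices, steps)
    print(f"steps={steps}: expected fraction of slices used = {1.0 - p_miss:.4f}")
```

Under these assumptions the per-step sampling rate is 16/256 = 6.25%, and the chance that a given slice is never used shrinks geometrically with the number of steps, which is the intuition behind the "near-complete slice utilization" argument.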
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors have an effective solution to an important problem and have supported their claims well with a thorough evaluation with respect to 4 different reconstruction problems.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I found the authors’ remarks satisfactory.

Author Feedback

To Reviewer1

1. Eq. (1): I(p) denotes the volume intensity at position p. Note that F is the intermediate triplane feature between “3DAM” and “MLP”, as described in the “3D Aware Module” section and Fig. 1(a). F(p) is obtained by concatenating the representations queried at the projections of p onto the three hyperplanes.

2. Forward model: In the formula y=Ax+n, A indicates partial sampling in the sinogram for the SV-CT and LA-CT tasks, Poisson disk sampling in k-space for the CS-MRI task, and resolution down-sampling along the slice dimension for ZSR-MRI. The term n denotes additive measurement noise.

3. Eq. (8): As you noted, our homoscedastic noise design integrates the noise parameter into the step size λ, avoiding the need for covariance matrix inversion. $\log p(y|f_{t-1})\approx (f_{t-1}-\mu)\nabla_{f_{t-1}}\log p(y|f_{t-1})\approx \left\|\mathcal{A}(D_{\phi^*}(\hat{f}_0(f_{t-1}(p))))-y(p) \right\|^2_{f_{t-1}=\mu}+C_1$. Then $\log p(f_{t-1}|f_t,y)=\log\left[p(f_{t-1}|f_t)\,p(y|f_{t-1})\right]+C_1\approx-\frac{1}{2}(f_{t-1}-\mu)^T\Sigma_t^{-1}(f_{t-1}-\mu)+(f_{t-1}-\mu)g+C_2=\log p(z)+C_3$, $z\sim \mathcal{N}(\mu+\Sigma_t g)$, where $g=\nabla_{f_{t-1}} \log p(y|f_{t-1})|_{f_{t-1}=\mu}$ and C_1, C_2, C_3 are constants. The derivation indicates that the mean is shifted by \Sigma_t g, as shown in Eq. (8). (Please see the aligned equations in LaTeX form.)

4. Implementation: The triplane fitting is trained for 8k steps for CT and 40k steps for MRI. The diffusion model is trained for 200k steps for CT and 600k steps for MRI. The CT volume follows TPDM’s preprocessing and is sized at 256×256×256 with ~2 mm³ voxels. The MRI volume is resized to 256×256×256 after removing black slices, with ~1 mm³ voxel sizes.

5. LatentDPS: The cited work presents latentDPS (Section 2, Eq. 9) as a baseline method, with its main innovation in Resample (Section 3). Our framework differs by removing Resample’s stochastic process, simplifying optimization, and enhancing efficiency. Unlike latentDPS, we focus on 3D inverse problems rather than 2D. We randomly select 2D slices to compute gradients and provide 3D guidance during sampling, which occurs in Vr (Line 4, Algorithm 1), operating independently of the triplane representation f/D (Line 5, Algorithm 1). We value the feedback and have included technical details and citations in the revised manuscript.

To Reviewer2

1. We revised the main text to include crucial prior knowledge.

2. We set γ=16 with 256 slices per volume, yielding a 6.25% slice sampling rate per iteration. This achieves sufficient performance without requiring 10%-25%, thanks to the random slice-selection mechanism, which ensures near-complete slice utilization over the course of diffusion sampling. A larger γ is theoretically preferable, so we chose the maximum value feasible under our GPU memory constraints (24 GB). Additional ablation studies on γ will be conducted in future work.

To Reviewer3

1. Triplane Neural Field Representation Ability: Triplane neural fields are scalable representations (e.g., NFD) in 3D computer vision. Ablation studies on resolution (64/128/256), channel dimensions (16/32/64), and decoder architecture (3DAM/FPE/Sinusoidal) show that the triplane dimensions (resolution + channels) and the 3DAM decoder depth must scale together for large 3D volumes or out-of-distribution datasets. Encoder-Free Design: We remove the encoder to greatly reduce computation and time, which is key for our method to be usable in real clinical settings. Our current results still achieve promising performance. We will also work on optimizing the method for future applications on larger and more complex datasets.

2. Forward Model: Our forward model adopts TPDM’s slice-independent degradation assumption. For complete-slice inference with real 3D forward models, we suggest faster sampling strategies (e.g., DDS over DPS) to maintain efficiency.

3. Eq. (8): \hat{f}_0 is estimated using the Tweedie formula, with citation and explicit formula included in the revised manuscript: $\hat{f}_0(f_{t-1})=E[f_0|f_{t-1}]=\frac{1}{\sqrt{\bar{\alpha}_t}}(x_t-\sqrt{1-\bar{\alpha}_t}\,\epsilon_t)$.
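
The triplane query described in the reply to Reviewer 1 on Eq. (1), where F(p) is the concatenation of features read from the three planes at the projections of p, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Blaze3DM implementation: the plane names, the bilinear interpolation, the normalization of p to [0, 1], and the helper names `bilinear_sample` and `query_triplane` are all hypothetical, and the 3DAM module and MLP decoder are not reproduced. The sizes N=128 and C=32 follow Reviewer 1's reading of the paper.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly interpolate an (N, N, C) feature plane at normalized (u, v) in [0, 1]."""
    n = plane.shape[0]
    # Map normalized coordinates to continuous indices in [0, N-1].
    x = float(np.clip(u, 0.0, 1.0)) * (n - 1)
    y = float(np.clip(v, 0.0, 1.0)) * (n - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, n - 1), min(y0 + 1, n - 1)
    wx, wy = x - x0, y - y0
    # Interpolate along the first axis at both rows, then along the second axis.
    at_y0 = (1 - wx) * plane[x0, y0] + wx * plane[x1, y0]
    at_y1 = (1 - wx) * plane[x0, y1] + wx * plane[x1, y1]
    return (1 - wy) * at_y0 + wy * at_y1

def query_triplane(planes, p):
    """Return F(p): features from the XY, XZ and YZ planes concatenated into a (3C,) vector."""
    x, y, z = p
    f_xy = bilinear_sample(planes["xy"], x, y)  # projection of p onto the XY plane
    f_xz = bilinear_sample(planes["xz"], x, z)  # projection of p onto the XZ plane
    f_yz = bilinear_sample(planes["yz"], y, z)  # projection of p onto the YZ plane
    return np.concatenate([f_xy, f_xz, f_yz])

rng = np.random.default_rng(0)
N, C = 128, 32
planes = {name: rng.standard_normal((N, N, C)) for name in ("xy", "xz", "yz")}
feat = query_triplane(planes, (0.25, 0.5, 0.75))
print(feat.shape)  # (96,)
```

In a setup like this, a small decoder would then map the 3C-dimensional vector to the scalar intensity I(p), which is how a stack of three N×N×C planes can represent an H×W×D volume.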

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The paper introduces a novel triplane‐based diffusion framework that convincingly improves 3D medical reconstruction across CT and MRI tasks, and the authors’ rebuttal resolves the implementation gaps raised by Reviewer 1; while one reviewer remains concerned about redacted details, the other two endorse its significance and thorough evaluation, so on balance the paper merits acceptance.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

This paper introduces a triplane approach for efficient 3D MR reconstruction. The authors demonstrate the effectiveness of their proposed methods through both quantitative and qualitative results.

Blaze3DM: Integrating Triplane Representation with Diffusion for Solving 3D Inverse Problems in Medical Imaging
