Abstract
Cardiac magnetic resonance (CMR) imaging is one of the most important imaging modalities for cardiac analysis. However, short-axis CMR imaging can only produce a sparse set of 2D images with an extremely low inter-slice resolution. Moreover, these 2D slices are usually misaligned due to the respiratory and cardiac motion of the patients, strongly affecting the diagnosis and intervention procedures for cardiac diseases. Deep learning-based approaches have been proposed to tackle these problems, but they mostly focus on voxel representations, yielding rough cardiac surfaces that are difficult to analyze. Therefore, we propose a deep learning-based method that performs CMR motion correction and super-resolution simultaneously to acquire high-fidelity left ventricular myocardial surfaces. Given a set of 2D misaligned, sparse segmentation masks of the left ventricular myocardium, our method first leverages an end-to-end convolutional neural network to correct and super-resolve the masks so that they approach the distribution of motion-free, high-resolution masks. Then, coarse signed distance grids are estimated from the super-resolved segmentation masks and guide a latent diffusion model to produce the corresponding high-fidelity myocardial surfaces. The superior performance of our approach is demonstrated through comprehensive experiments in both simulated and clinical settings.
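The abstract says that coarse signed distance grids are estimated from the super-resolved masks. The paper's own estimator works from Marching Cubes meshes (as the rebuttal explains) and is not reproduced on this page. As a hypothetical, voxel-level stand-in, the sketch below shows what a signed distance grid is: each voxel stores the distance to the nearest voxel of the opposite label, negative inside and positive outside. The sign convention and the function name `mask_to_sdg` are assumptions for illustration only.

```python
import numpy as np


def mask_to_sdg(mask: np.ndarray, spacing: float = 1.0) -> np.ndarray:
    """Brute-force signed distance grid (SDG) from a binary 3D mask.

    Sign convention (an assumption): negative inside, positive outside.
    Each voxel stores the Euclidean distance to the nearest voxel with the
    opposite label, so this is a voxel-level approximation, not a distance
    to a mesh. Pairwise distances are O(N^2) in memory: small grids only.
    """
    mask = np.asarray(mask, dtype=bool)
    # (N, 3) integer coordinates of every voxel, in the same order as mask.ravel().
    grid = np.meshgrid(*[np.arange(n) for n in mask.shape], indexing="ij")
    coords = np.stack(grid, axis=-1).reshape(-1, mask.ndim).astype(float)
    flat = mask.ravel()
    fg, bg = coords[flat], coords[~flat]
    if len(fg) == 0 or len(bg) == 0:
        raise ValueError("mask needs both foreground and background voxels")

    def nearest(points: np.ndarray) -> np.ndarray:
        # Distance from every voxel to its closest voxel in `points`.
        diff = coords[:, None, :] - points[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

    # Inside voxels get -(distance to background); outside get +(distance to foreground).
    sdg = np.where(flat, -nearest(bg), nearest(fg)) * spacing
    return sdg.reshape(mask.shape)


# Example: a ball of radius 3 voxels in a 10^3 grid.
zz, yy, xx = np.meshgrid(np.arange(10), np.arange(10), np.arange(10), indexing="ij")
ball = (zz - 5) ** 2 + (yy - 5) ** 2 + (xx - 5) ** 2 <= 9
sdg = mask_to_sdg(ball)
```

Thresholding the result at zero recovers the mask exactly, but the stored magnitudes carry the extra geometric information that a binary mask lacks.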
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1550_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{ZhaZic_Mask2Surface_MICCAI2025,
author = { Zhang, Zichen and Liu, Zhentao and Zhang, Zeng and Cui, Zhiming},
title = { { Mask2Surface: Motion Correction and Super-Resolution for Cardiac Surface Reconstruction Using Latent Diffusion } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15961},
month = {September},
pages = {334 -- 343}
}
Reviews
Review #1
- Please describe the contribution of the paper
The authors propose a neural network-based approach to correct slice misalignment in CMR and reconstruct cardiac surfaces from the segmentation masks. The main technical contribution appears to be the use of a latent diffusion model on signed distance grids (SDGs).
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Leveraging SDGs allows for an implicit shape representation, which can offer advantages in terms of smoothness and resolution. Also, integrating motion correction and surface generation into a single framework is practical.
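One concrete reason an SDG helps with resolution is that, unlike a label map, it can be evaluated between voxels. The minimal sketch below, which assumes plain trilinear interpolation (not necessarily anything the paper does), queries a grid at continuous coordinates; the function name `query_sdg` is hypothetical.

```python
import numpy as np


def query_sdg(sdg: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Trilinearly interpolate a signed distance grid at continuous points.

    sdg: (D, H, W) values at integer voxel coordinates (each size >= 2).
    pts: (N, 3) query points in voxel coordinates; clipped to the grid.
    """
    upper = np.array(sdg.shape) - 1
    p = np.clip(np.asarray(pts, dtype=float), 0, upper)
    # Lower corner of the enclosing cell; capped so that corner + 1 stays in range.
    i0 = np.minimum(np.floor(p).astype(int), upper - 1)
    f = p - i0  # fractional position inside the cell, in [0, 1]
    out = np.zeros(len(p))
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((f[:, 0] if dz else 1 - f[:, 0])
                     * (f[:, 1] if dy else 1 - f[:, 1])
                     * (f[:, 2] if dx else 1 - f[:, 2]))
                out += w * sdg[i0[:, 0] + dz, i0[:, 1] + dy, i0[:, 2] + dx]
    return out


# Example: the SDG of the plane "third coordinate = 3.5", queried off-grid.
a, b, c = np.meshgrid(np.arange(8), np.arange(8), np.arange(8), indexing="ij")
plane = c - 3.5
values = query_sdg(plane, np.array([[2.0, 2.0, 3.25], [0.0, 7.0, 7.0], [4.5, 1.2, 3.5]]))
```

Trilinear interpolation reproduces affine fields exactly, so a planar SDG is recovered without error at any query point; curved surfaces are approximated to within the grid spacing.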
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The paper applies latent diffusion models, but the innovation is more a recombination of existing techniques than a fundamentally new approach. Also, I fail to understand the point of deriving low-resolution masks from those obtained from the images. The scope is super-resolution, so a prior downsampling step seems unintuitive and unnecessary. How does this work in cases of myocardial infarction, when downsampling removes all small details? The authors should consider an approach using the native-resolution masks (instead of downsampling 5 times). The paper does not clearly demonstrate how its method outperforms existing deep learning-based motion correction and super-resolution approaches. In particular, there is a rather strong drop in performance when moving from the simulation dataset to the clinical one, which is not the case for the other approaches considered. This might indicate that the approach is not robust or that it overfits the data. Additionally, a comparison against more classical non-learning-based methods (e.g., model-based motion correction) would strengthen the claims. The ablation study is confusing and does not make much sense. Why would someone consider low-resolution masks?
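To make the downsampling concern concrete, here is a hypothetical sketch of how sparse, misaligned masks can be simulated from a high-resolution mask: keep every k-th slice and shift each kept slice in-plane. The exact simulation protocol is not given on this page; `k = 5` only echoes the "downsampling 5 times" remark, and `max_shift` and the function name are assumptions.

```python
import numpy as np


def simulate_sparse_misaligned(volume: np.ndarray, k: int = 5, max_shift: int = 2,
                               seed: int = 0) -> np.ndarray:
    """Hypothetical low-quality CMR mask simulation (not the paper's protocol).

    volume: (D, H, W) high-resolution binary mask, slices along axis 0.
    Keeps every k-th slice (k-fold inter-slice downsampling) and shifts each
    kept slice by a random integer in-plane offset in [-max_shift, max_shift]
    to mimic respiratory/cardiac motion. np.roll wraps around the border,
    which is harmless when the structure is away from the image edges.
    """
    rng = np.random.default_rng(seed)
    sparse = volume[::k].copy()
    for i in range(sparse.shape[0]):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        sparse[i] = np.roll(sparse[i], shift=(int(dy), int(dx)), axis=(0, 1))
    return sparse


# Example: a 50-slice box phantom becomes 10 sparse, jittered slices.
vol = np.zeros((50, 32, 32), dtype=bool)
vol[:, 10:22, 10:22] = True
low = simulate_sparse_misaligned(vol, k=5, max_shift=2)
```

Any structure thinner than k slices along the long axis can fall entirely between kept slices, which is the loss of small details the review asks about.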
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The novelty of the paper is limited (mainly residing in the use of diffusion models for the generation of the surfaces), and some aspects are not clear. While the topic is relevant per se, the work feels incremental with respect to the state of the art.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper introduces Mask2Surface, a method for reconstructing high-fidelity cardiac surfaces from sparse and misaligned CMR segmentation masks. The approach first applies a CNN for motion correction and super-resolution, then generates a mesh from the corrected masks to estimate coarse signed distance grids (C-SDGs). A template mesh is registered to the generated mesh to produce ground-truth signed distance grids (SDGs), which are compressed into a compact latent space using a VQ-VAE. The encoder of the VQ-VAE extracts latent features from both C-SDGs and SDGs, and a latent diffusion model is trained to reconstruct SDGs from the conditional C-SDG features. Experiments on a public cardiac dataset under both simulated and clinical settings demonstrate superior performance compared to previous methods.
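The VQ-VAE compression summarized above rests on a nearest-codebook lookup. The sketch below shows only that quantization step, assuming the standard VQ-VAE formulation; training details (straight-through gradients, codebook and commitment losses) are omitted, and the toy codebook is hypothetical.

```python
import numpy as np


def vector_quantize(z_e: np.ndarray, codebook: np.ndarray):
    """Replace each latent vector by its nearest codebook entry (VQ-VAE bottleneck).

    z_e: (N, D) encoder outputs, e.g. a (C, d, h, w) SDG latent reshaped to (d*h*w, C).
    codebook: (K, D) learned embedding vectors.
    Returns the code indices (N,) and the quantized latents (N, D).
    """
    # Squared Euclidean distance between every latent vector and every code.
    d2 = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]


codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z_e = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])
idx, z_q = vector_quantize(z_e, codebook)
print(idx)  # [0 1 2]
```

Snapping latents to a finite codebook is what gives the diffusion model a compact, regular space to work in, which is the stability argument the review credits.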
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Integrates motion correction, super-resolution, and surface reconstruction into a unified framework for cardiac MRI.
- Proposes a novel approach by extracting features from signed distance grids generated from meshes, achieving smoother surfaces than intensity-based methods.
- Compresses signed distance grids into a compact latent space using VQ-VAE, enhancing diffusion model stability and generation quality.
- Achieves higher reconstruction accuracy on both simulated and clinical data from a public cardiac dataset compared to previous methods.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The contributions of the paper are not clearly defined. It is recommended that the main contributions be explicitly listed in the Introduction section to better highlight the novelty and scope of the work.
- In Section 2.1, the SDG computation is not clearly defined; only C-SDG has an explicit equation (eq. 2), and the difference between SDG and C-SDG generation needs clarification.
- The method used to combine Z_c and Z_s before the denoiser is unclear; if it is concatenation, as implied in Figure 1, the authors should explicitly describe it and justify why concatenation is beneficial.
- It is not stated whether the conditional encoder for C-SDG is frozen during diffusion model training.
- The relationship between MC-Net and SR-Net is not clearly described; it is unclear whether they are trained separately or jointly, and how they are structurally connected.
- Figure 1 does not clearly convey the full pipeline. Specific issues include: (a) SDG and C-SDG seem to be generated from both the mask and the mesh; this should be shown clearly in the figure. (b) In the text, the registration operation aligns the template mesh to the generated mesh, but this step is not shown in the figure; it would be better to make it explicit. (c) The encoders for SDG and C-SDG are the same VQ-VAE; using the same color would make this clearer, and if the encoder is frozen, this should be indicated in the figure. (d) It is unclear whether the motion correction network (MC-Net) and super-resolution network (SR-Net) are trained separately; if they are trained together, it would be better to group them within a larger box to show that they form an end-to-end network.
- VQ-VAE is trained only on clean high-resolution data within the same dataset, which may introduce bias when handling noisy clinical inputs. The authors should further explain the rationale for this design choice, such as by assuming a perfect encoder or providing comparisons on outlier datasets to justify its effectiveness.
- Although the authors compare several methods and components, the ablation studies lack systematic organization (e.g., explaining the purpose of each ablation study and specifying which modules are used or modified in each experiment), making it difficult to isolate the effect of individual components.
- Although the authors provide detailed descriptions of some implementation aspects, there are still missing details, such as whether the conditional encoder is frozen and how MC-Net and SR-Net are connected. In addition, code and pretrained models are not released, limiting reproducibility.
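The questions about how Z_c and Z_s are combined and whether the conditional encoder is frozen can be pictured with a short sketch. It assumes channel-wise concatenation (as Figure 1 appears to suggest, per the review) and the standard DDPM forward process with a linear noise schedule; the schedule values, latent shapes, and function names are all hypothetical.

```python
import numpy as np


def linear_beta_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 2e-2) -> np.ndarray:
    # Linear noise schedule as in the original DDPM formulation (values assumed).
    return np.linspace(beta_start, beta_end, T)


def q_sample(z0: np.ndarray, t: int, alphas_bar: np.ndarray, rng: np.random.Generator):
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alphas_bar[t]) * z0 + np.sqrt(1.0 - alphas_bar[t]) * eps, eps


def denoiser_input(z_t: np.ndarray, z_c: np.ndarray) -> np.ndarray:
    """Channel-wise concatenation of the noisy SDG latent and the C-SDG latent.

    Both are (C, d, h, w); the result is (C_s + C_c, d, h, w).
    """
    if z_t.shape[1:] != z_c.shape[1:]:
        raise ValueError("spatial shapes of z_t and z_c must match")
    return np.concatenate([z_t, z_c], axis=0)


rng = np.random.default_rng(0)
T = 1000
alphas_bar = np.cumprod(1.0 - linear_beta_schedule(T))
z_s = rng.standard_normal((4, 8, 8, 8))  # hypothetical latent of a ground-truth SDG
# Hypothetical C-SDG latent; per the rebuttal it would come from the frozen encoder,
# so it is treated as a constant input that receives no gradient.
z_c = rng.standard_normal((4, 8, 8, 8))
t = 500
z_t, eps = q_sample(z_s, t, alphas_bar, rng)
x_in = denoiser_input(z_t, z_c)  # a denoiser would be trained to predict eps from x_in
```

Concatenation keeps the condition spatially aligned with the noisy latent, which suits a convolutional denoiser; cross-attention would be the usual alternative if the two latents had different layouts.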
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The use of SDGs as an intermediate representation and the application of VQ-VAE to stabilize diffusion model training are interesting and promising for cardiac surface reconstruction. Although the ablation studies are not systematically organized, the experimental results provide reasonable evidence supporting the effectiveness of SDG modeling, VQ-VAE compression, and the overall framework. If the authors can address the identified issues, including clarifying the pipeline and figures, explaining missing implementation details, revising the ablation study structure, and explicitly stating the purpose of each experiment, the paper could be acceptable for publication. Additionally, releasing the code would significantly enhance the paper’s reproducibility and impact.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The paper proposes a method for mitigating problems with low inter-slice resolution and motion distortion in CMR imaging. The method combines discretized signed distance fields, a VQ-VAE, and a latent diffusion model conditioned on the noisy, low-resolution initial segmentation.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper tackles an important and real-world clinical problem and proposes an appropriate solution.
- The method is thoroughly evaluated in both a simulated and a real clinical setting, demonstrating good performance.
- The method is compared to five SOTA methods, showing a consistent improvement in both the simulated and the clinical setting.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The choice of discretized SDFs is not really motivated. How much is gained compared to using standard binary label maps? Have the authors considered using continuous SDFs instead?
- The process of fitting a template mesh to generate the GT SDG is not described. Fitting the same template to the super-resolved masks would also seem a straightforward baseline for comparison.
- The method relies on high-resolution images for training. I suspect that most clinical sites will not have access to a training dataset with high-resolution CMR images. Could the authors elaborate on whether the model is expected to transfer to other scanners, sites, and MRI sequences, or whether this method relies on specific training data?
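A toy way to see what a discretized SDF adds over a binary label map: along a sampled profile, the label map can only place the boundary between two samples, while signed distances let the zero crossing be located by linear interpolation, in the same spirit as Marching Cubes, which interpolates linearly along grid edges. The circle, radius, and helper names below are hypothetical, and the example says nothing about the gains reported in the paper.

```python
import numpy as np


def first_zero_crossing(values: np.ndarray, coords: np.ndarray) -> float:
    """Linearly interpolate the first sign change (inside < 0 -> outside >= 0)."""
    idx = np.flatnonzero((values[:-1] < 0) & (values[1:] >= 0))
    if idx.size == 0:
        raise ValueError("no sign change along the profile")
    i = idx[0]
    v0, v1 = values[i], values[i + 1]
    return coords[i] + (coords[i + 1] - coords[i]) * (-v0) / (v1 - v0)


def binary_boundary(mask: np.ndarray, coords: np.ndarray) -> float:
    """Best a label map can do: midpoint between last inside and first outside sample."""
    i = np.flatnonzero(mask[:-1] & ~mask[1:])[0]
    return 0.5 * (coords[i] + coords[i + 1])


r = 5.3
yy, xx = np.mgrid[0:21, 0:21].astype(float)
sdg = np.hypot(yy - 10, xx - 10) - r  # analytic SDG of a circle (negative inside)
mask = sdg < 0                        # the corresponding binary label map

coords = np.arange(11, dtype=float)   # distance from the centre along row 10
est_sdg = first_zero_crossing(sdg[10, 10:], coords)
est_mask = binary_boundary(mask[10, 10:], coords)
```

Along a ray through the centre the circle's SDF is exactly linear, so the interpolated estimate is exact here; in general it is an approximation, but still sub-voxel, whereas the label-map estimate is quantized to the grid.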
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- p. 2: "Matching cubes" should read "marching cubes".
- For usability in a clinical setting, it could be interesting to add clinically relevant measures, such as wall thickness, to the evaluation.
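As one hypothetical way to add the suggested wall-thickness measure, rays can be cast from the centroid of a ring-shaped myocardium mask in a short-axis slice, and the length of each ray inside the mask counted. This is a simple radial approximation chosen for illustration, not a method from the paper; the function name, ray count, and step size are assumptions.

```python
from typing import Optional

import numpy as np


def radial_wall_thickness(myo: np.ndarray, n_angles: int = 36, step: float = 0.25,
                          max_r: Optional[float] = None) -> np.ndarray:
    """Approximate wall thickness (pixels) along rays cast from the mask centroid.

    myo: 2D boolean myocardium mask of one short-axis slice. Assumes a ring-like
    shape whose centroid lies in the blood pool, so every ray crosses the wall once.
    Thickness per ray = (number of ray samples inside the mask) * step.
    Multiply by the in-plane pixel spacing to obtain millimetres.
    """
    myo = np.asarray(myo, dtype=bool)
    cy, cx = np.argwhere(myo).mean(axis=0)
    if max_r is None:
        max_r = float(np.hypot(*myo.shape))
    radii = np.arange(0.0, max_r, step)
    thickness = np.empty(n_angles)
    for k, theta in enumerate(np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)):
        ys = np.rint(cy + radii * np.sin(theta)).astype(int)
        xs = np.rint(cx + radii * np.cos(theta)).astype(int)
        ok = (ys >= 0) & (ys < myo.shape[0]) & (xs >= 0) & (xs < myo.shape[1])
        thickness[k] = myo[ys[ok], xs[ok]].sum() * step
    return thickness


# Example: an annulus with inner radius 10 and outer radius 15 pixels.
yy, xx = np.mgrid[0:64, 0:64]
r = np.hypot(yy - 32, xx - 32)
ring = (r >= 10) & (r <= 15)
th = radial_wall_thickness(ring)
```

On a smooth reconstructed surface the same measure could be taken along surface normals instead, which avoids the pixel-rounding error this sketch inherits from the mask.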
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The method tackles an important problem, proposes a suitable solution, evaluates it thoroughly, and demonstrates an improvement over the current SOTA.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
Dear reviewers, We greatly appreciate your constructive feedback. As follows, we will address the raised concerns.
- Clarification of the background, scope, and data (Reviewers #1, #2, #4). Clinical short-axis CMR imaging only produces motion-corrupted and extremely sparse image slices, yielding low-resolution segmentation masks. We aim to tackle this hard problem, proposing the first method to reconstruct smooth cardiac surfaces from low-quality masks. The use of simulated masks stems from the scarcity of high-resolution CMR data. As mentioned above, clinical CMR imaging devices only produce low-resolution images, and only advanced research-grade CMR imaging devices can produce high-resolution images, making high-resolution 3D CMR data extremely rare. Paired low-resolution and high-resolution CMR data are even scarcer. Using simulated data can mitigate this problem, and we believe that this is an appropriate paradigm for many medical domains where data availability is a problem.
- Effectiveness and ablation study (Reviewers #1, #4). The ablation study mainly verifies the effectiveness of C-SDGs as conditionings, VQ-VAE, and SDG-based latent diffusion. As clinical CMR imaging only produces low-resolution data, it is an intuitive idea to directly feed the low-resolution masks, or the super-resolved masks obtained after applying MCSR-Net, as conditions into the diffusion model to yield the reconstructed cardiac surfaces. The ablation study demonstrates that these settings lead to inferior results. The study on VAE and conditional GAN aims to verify the contributions of VQ-VAE and latent diffusion. A detailed analysis is provided in Section 3.3.
- SDGs and C-SDGs (Reviewers #2, #4). SDGs, as a structured shape representation, generate smoother shapes via Marching Cubes than binary masks do. Continuous SDFs mostly rely on MLPs to enable coordinate-value querying, which is less convenient and compact than our approach, because SDGs allow CNNs to be used uniformly for both the VQ-VAE and the latent diffusion architectures. SDGs and C-SDGs store the signed distances from the grid points to the meshes. For SDGs, the mesh is the registered watertight template mesh, enabling direct computation of signed distances. For C-SDGs, the meshes are obtained from the segmentation masks using Marching Cubes. These meshes are not necessarily watertight and cannot be used to compute signed distances. Thus, we propose the method in the paper to estimate C-SDGs.
- Some implementation details (Reviewers #2, #4). The template mesh fitting process uses an MLP to deform the template to fit the target meshes. The inputs are the mesh vertex coordinates, and the outputs are the deformation vectors. The loss function is a combination of Chamfer distance, Laplacian smoothing, and an L2 norm of the vectors. The conditional encoder is frozen during diffusion training. MC-Net and SR-Net are sequentially connected and trained.
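The template-fitting objective described in the rebuttal (Chamfer distance, Laplacian smoothing, and an L2 norm of the deformation vectors) can be sketched as follows. The Chamfer variant (squared distances), the uniform Laplacian weights, the loss weights, and all function names are assumptions; the MLP that would predict the offsets from vertex coordinates is not shown.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance with squared nearest-neighbour distances (one common variant)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def laplacian_loss(verts: np.ndarray, edges: np.ndarray) -> float:
    """Uniform Laplacian smoothing: mean squared norm of (vertex - mean of neighbours)."""
    nbr_sum = np.zeros_like(verts)
    np.add.at(nbr_sum, edges[:, 0], verts[edges[:, 1]])
    np.add.at(nbr_sum, edges[:, 1], verts[edges[:, 0]])
    deg = np.bincount(edges.ravel(), minlength=len(verts))
    lap = verts - nbr_sum / np.maximum(deg, 1)[:, None]
    return float((lap ** 2).sum(axis=1).mean())


def fitting_loss(template, offsets, target_pts, edges, w_lap=1.0, w_reg=0.1) -> float:
    """Chamfer + Laplacian + L2 offset penalty; weights are hypothetical.

    In the rebuttal's description, `offsets` would be predicted by an MLP from
    the template vertex coordinates; here they are just an array.
    """
    deformed = template + offsets
    return (chamfer_distance(deformed, target_pts)
            + w_lap * laplacian_loss(deformed, edges)
            + w_reg * float((offsets ** 2).sum(axis=1).mean()))


# Toy check: a 3-vertex polyline stands in for the template mesh.
template = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
edges = np.array([[0, 1], [1, 2]])
target = template + np.array([0.0, 0.5, 0.0])
loss_before = fitting_loss(template, np.zeros_like(template), target, edges)
loss_after = fitting_loss(template, np.tile([0.0, 0.5, 0.0], (3, 1)), target, edges)
```

The Chamfer term pulls the template onto the target, the Laplacian term keeps the deformed mesh smooth, and the L2 term discourages large offsets, so the fitted template stays watertight and well-shaped, which is what makes direct signed-distance computation possible.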
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A