Abstract

Dynamic virtual populations are critical for realistic in-silico cardiovascular trials, yet current approaches primarily generate static anatomies, limiting their clinical and computational value. In this study, we present 4D CardioSynth, a generative framework for constructing dynamic 3D virtual populations of cardiovascular structures that change over time (3D+t). To model the complex interplay between cardiac structure and motion, we develop a factorised variational approach that disentangles spatial and temporal information in latent space, enabling independent control over anatomical variations and motion patterns. We demonstrate 4D CardioSynth’s performance using a diverse dataset of bi-ventricle shapes acquired from 6,500 patients across complete cardiac cycles. Our results illustrate the superiority of 4D CardioSynth over state-of-the-art methods with respect to anatomical specificity, diversity, and generalisability, as well as motion plausibility. This approach enables more accurate virtual trials for cardiovascular interventions.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2701_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{DouHao_4D_MICCAI2025,
        author = { Dou, Haoran and Huang, Jinghan and Zakeri, Arezoo and Zhou, Zherui and Mu, Tingting and Duan, Jinming and Frangi, Alejandro F.},
        title = { { 4D CardioSynth: Synthesising Dynamic Virtual Heart Populations through Spatiotemporal Disentanglement } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {2--11}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a 4D cardiac mesh synthesis approach for simulating virtual patient populations. The key idea is to disentangle the latent representation into spatial and temporal components.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Simulating a 4D virtual cardiac population holds significant clinical potential, enabling in-silico trials that may improve treatment planning and patient outcomes.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The proposed method does not outperform the classical VAE baseline. As shown in Table 1, it performs worse than VAE in 6 out of 9 metrics. This challenges the claim that it is “comparable” to VAE. Moreover, the experiments primarily emphasize reconstruction accuracy rather than the diversity or novelty of the generated virtual heart models, which is central to the paper’s stated goal. Additionally, no ablation study is provided to justify the value of disentangling spatial and temporal latents, making it difficult to assess the effectiveness of this design choice.

    2. Based on the methodological descriptions, the approach appears to be a standard autoencoder rather than a VAE, as there is no mention of a sampling step. Additionally, the factorization of the joint distribution in Eq. 1 appears empirical and lacks theoretical justification. The model considers only two time points instead of the full temporal sequence, raising concerns about its ability to enforce temporal smoothness.

    3. The proposed loss function relies on per-frame segmentations to compute motion supervision, as described in Eq. 5, which could be prohibitively expensive or impractical in real-world applications where dense annotations are rare.

    4. The manuscript’s writing requires substantial improvement. Key technical details are missing. For example, Eq. 6 defines a KL divergence term, but the priors for $z^s$ and $z_0^t$ are undefined, and the KL derivation is omitted. The paper also lacks a discussion on hyperparameter sensitivity. Furthermore, no dedicated related works section is provided, and the introduction references only a limited number of prior studies.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (1) Strong Reject — must be rejected due to major flaws

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See main weakness.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a novel 3D+t generative model for synthesizing ventricular motion called 4D CardioSynth. The key innovation of 4D CardioSynth is the disentangling of a spatial latent variable and a temporal variable which deforms the initial shape to match the observed sequence. The evaluation is carried out on bi-ventricular models acquired from 6,500 patients, with 50 time points of data throughout the full cardiac cycle. Comparison to CHeart demonstrated improvements as measured by the specificity, diversity, and generalisability of the generated 3D+t sequences.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed work addresses an interesting and relevant problem. The methodology proposed is well rationalized and clearly described. The decomposition into spatial/static and dynamic latents, with the latter modeled by a temporal function, is intuitive and interesting. The modeling of the sequence as the deformation of the initial shape (eq 3) is interesting.

    The paper is overall well written and clear to follow. The evaluation was relatively thorough, considering both initial frame reconstruction and dynamic trajectory generation. The clinical relevance aspect is a particular strength.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The observed improvements are overall on the same order as the reported standard deviations. The statistical significance of the reported improvements should be tested.

    The “generalisability” metric, defined as “the reconstruction error between the reconstructed and unseen shapes”, was unclear: does it mean that, for an observed sequence, the model predicts the trajectory given only the initial frame, and the generated sequence is compared to the observed sequence?

    If the above interpretation is correct — It seems intuitive to assume that the motion sequence is dependent on factors beyond the initial shape, i.e., with the same initial shape, a different sequence may be observed due to difference in the underlying conditions of the ventricles (material property, etc). How do you think the presented model may be modeling such differences?

    Since the deformation \phi_k is learned to deform x_0 over time, it seems that one could let the temporal component z^t model the deformation only, such that z^t_0 can be 0 and evolve over time, and \phi_k can be a function of z^t_k only. In this way, there would be no need to separate z^s and z^t_0 from the same x_0, which seems difficult, and it is not clear what differences they should have; in the current formulation, it seems that z^s and z^t_0 should be the same anyway. Please comment.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper addresses an interesting problem with an interesting and novel approach. The methodology is well rationalized and clearly described. The evaluation was thorough, although some clarification of the evaluation metrics is needed and the significance of the improvements needs to be tested.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I have read through the rebuttal and fellow reviewers’ comments. I remain positive about the novelty of the method presented, especially with the clarifications on the statistical significance of results.



Review #3

  • Please describe the contribution of the paper

    VAE-like architecture (operating on cardiac meshes, therefore with graph convolutions) that explicitly disentangles spatial and temporal components of 3D+t data, with a single encoder but dedicated decoders for shape and motion reconstruction. The KL loss considers both components separately, and an additional loss quantifies the consistency between the time-0 mesh propagated to time k and the learnt transformation between these times. Application is on a large database of biventricular meshes from the UK Biobank, with specific examination of the generated meshes on both spatial and temporal aspects.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Use of modern techniques for representation learning on dynamic mesh data (graph NN, VAE)
    • Interesting proposal for spatiotemporal disentangling within a VAE
    • Application to a large database of biventricular meshes
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Missing comparisons and literature review regarding previous attempts for disentangling (with VAE) and spatiotemporal models (not necessarily with VAE) / beyond CHeart and 4DCardioSynth.
    • Limited assessment of the clinical value of such spatiotemporal disentangling
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Major comments:

    • see weaknesses listed above
    • The literature review is missing existing attempts for spatiotemporal analysis / atlases in cardiac imaging. I’m thinking in particular of works attempting to better structure the latent space (from VAE) for temporal sequences, targeting better temporal consistency, works much before VAE using bilinear statistical shape models for spatiotemporal aspects of 3D+t cardiac atlases, and to a broader extent VAE architectures explicitly targeting temporal/longitudinal aspects. Still, the proposed modeling in Sec.2.1 is interesting.
    • The literature review and methods comparisons could also be enriched beyond CHeart and 4DCardioSynth: regarding disentangling strategies, focusing on those that may be relevant for this application (e.g. a given continuous scalar (here, time) along a given dimension).

    Other comments:

    • p.5: how were the hyperparameters \lambda_1,2,3 set?
    • p.5: I appreciate the look for relevant evaluation metrics. Could these be enriched with ones specific to temporal aspects? (not necessarily existing ones)
    • p.6: Are VAEs actually needed for such mesh data, which is very smooth? On similar data, it is already hard to justify the need for complex shape analysis methods that consider the non-linear data space, compared to simple methods such as PCA.
    • I’m not sure I understand how the validation and testing sets were used in the experiments in Sec. 3.4.
    • Fig.3 is very qualitative and with very subtle things to assess. Please consider enriching the description of these results, or supporting them by quantitative values.

    Writing and presentation:

    • p.7: please better justify the claim that the start/end of the cycle are instants “where anatomical precision is clinically essential”.
    • p.8: although I’m aware of the page limitations, the current Discussion is very brief and could be enriched, in particular regarding the potential limitations.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I like the proposed method, which is timely and relevant. However, I also estimate that manuscript updates are needed, in particular to better situate the proposed methods vs. specific literature, and this might involve some additional comparisons. I would therefore like to see the authors’ rebuttal on this.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have provided satisfactory answers to the main concerns from the reviewers (not only mine, which mostly concerned better situating this work in the literature). My decision is also motivated by the originality of the proposed work (whose experimental setting could still be improved, but maybe not in this MICCAI paper).




Author Feedback

We thank the reviewers for their valuable feedback. Our responses are included below:

Q: Evaluation (R1, R3) A: Student’s t-tests were conducted and confirm that all improvements are statistically significant (p < 10^-3). The generalisability metric evaluates our model’s ability to reconstruct unseen shapes at the ED time point only, not trajectory prediction, assessing how well it generalises to new anatomies.
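The paired significance test mentioned here can be sketched in pure Python. The per-subject error values below are purely illustrative placeholders, not the paper's results; only the t-statistic computation itself is standard.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Paired Student's t statistic for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample std of the per-subject differences
    return mean_d / (sd_d / math.sqrt(n))

# Illustrative per-subject errors (lower is better): baseline vs proposed.
baseline = [2.31, 2.28, 2.40, 2.35, 2.29, 2.33, 2.38, 2.30]
proposed = [2.10, 2.05, 2.12, 2.08, 2.11, 2.06, 2.09, 2.07]

t = paired_t_statistic(baseline, proposed)
# With n - 1 = 7 degrees of freedom, |t| > 5.408 corresponds to a
# two-sided p < 10^-3, i.e. the significance level claimed in the rebuttal.
print(round(t, 2))
```

On real data one would typically use `scipy.stats.ttest_rel`, which also returns the exact p-value; the hand-rolled version above only shows what that test computes.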

Our experiments primarily evaluate generated virtual populations: only generalisability involves reconstruction, while specificity and diversity directly assess the quality of the virtual heart population. Figure 2 demonstrates the physiological plausibility of the generated virtual population compared with the real one across the cardiac cycle.

Q: Literature review and comparisons (R2, R3) A: We will enrich our background discussion covering bilinear statistical shape models (BSMs), VAEs structuring latent spaces for spatiotemporal data, and disentanglement strategies. We implemented three additional comparison methods: BSM, Variational Recurrent Neural Network (VRNN), and conditional VAE (cVAE) using time as a condition. Results show that BSM achieves the best specificity (1.72 vs 2.27) but fails in coverage (2.2 vs 48.3), indicating substantial overfitting. VRNN shows marginally higher coverage (+1.4%) at substantial cost to specificity (+20.3% error) compared to 4D CardioSynth. The cVAE performs similarly to CHeart (specificity: 3.12, coverage: 42.1, generalisability: 1.68), confirming that simple time conditioning without proper disentanglement is insufficient. Our approach achieves an optimal balance between accuracy and diversity.

Q: Formulation (R1) A: Our design deliberately separates z^s and z^t_0 to model different attributes of cardiac shape and motion. While z^s captures patient-specific anatomical features that remain constant throughout the cardiac cycle, z^t_0 encodes the initial motion state and temporal dynamics. This separation is physiologically motivated, as it reflects how cardiac anatomy and function can vary independently. Setting z^t_0=0 as suggested would eliminate an essential source of randomness, restricting the model’s ability to capture diverse cardiac behaviours. By keeping z^s fixed while sampling different z^t_0 values, our approach can generate virtual instances with identical anatomical structures but different motion patterns.
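The generation scheme described in this response (fix z^s, resample z^t_0) can be illustrated with a minimal sketch. The `decode` function here is a hypothetical stand-in for the paper's shape/motion decoders, and the latent dimensionality is arbitrary; only the sampling pattern reflects the rebuttal's description.

```python
import random

random.seed(0)
DIM = 4  # toy latent dimensionality, chosen for illustration only

def sample_latent(dim):
    """Draw a latent vector from the standard normal prior."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

def decode(z_s, z_t0):
    """Hypothetical decoder: stands in for the paper's mesh decoders."""
    return [a + 0.1 * b for a, b in zip(z_s, z_t0)]

# Fix one anatomy latent z^s, then sample several motion latents z^t_0:
# each pair yields the same anatomy animated with a different motion pattern.
z_s = sample_latent(DIM)
virtual_instances = [decode(z_s, sample_latent(DIM)) for _ in range(3)]

# All instances share the anatomical component but differ in motion.
assert len({tuple(v) for v in virtual_instances}) == 3
```

Setting z^t_0 = 0, as the reviewer suggests, would collapse this second sampling axis: every generated subject with the same z^s would then follow the same trajectory.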

Q: Clinical Value (R2) A: While our current evaluation demonstrates that 4D CardioSynth captures realistic LVV dynamics throughout the cardiac cycle (Fig. 2), we plan to expand this in the future by (1) calculating additional metrics (e.g., ejection fraction); (2) demonstrating independent manipulation of anatomy/motion for disease progression simulation.

Q: Comparison to VAE (R3) A: VAE serves as a reference benchmark for showing the performance ceiling of spatial-only modelling (without any temporal capabilities). Unlike other joint modelling methods (CHeart) that typically sacrifice spatial accuracy, our disentanglement strategy maintains spatial accuracy while handling the significantly more complex task of modelling the complete cardiac cycle. We respectfully clarify that 4D CardioSynth outperforms the VAE on 5/9 metrics (RV, BiV diversity, all generalisability).

Q: Methodology Clarification (R3) A: Our model is a proper VAE with sampling. While we simplified notation for clarity, our implementation includes standard VAE sampling during training and inference, and we optimise the ELBO objective with KL divergence shown in Eqs. 6-7. The factorisation in Eq. 1 follows established probabilistic graphical model principles and assumptions of independence between variables. Our method models the conditional distribution between the start point and an arbitrary time point k through TRM, capturing temporal dynamics via the first-order Markov property in Eq. 2. Figure 2 validates this approach with smooth LVV changes and consistent motion patterns.
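The sampling step and KL terms described in this response follow the standard VAE recipe. Below is a minimal sketch of the reparameterisation trick and the closed-form KL divergence against a standard normal prior; assuming N(0, I) priors for both z^s and z^t_0 is conventional but not stated explicitly in the paper, and the encoder outputs here are made-up values.

```python
import math
import random

random.seed(0)

def reparameterise(mu, log_var):
    """Sample z = mu + sigma * eps, eps ~ N(0,1); differentiable in mu, log_var."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

# Illustrative encoder outputs for the two latents.
mu_s, log_var_s = [0.5, -0.2], [-1.0, -1.2]   # spatial latent z^s
mu_t, log_var_t = [0.1, 0.3], [-0.8, -1.5]    # initial temporal latent z^t_0

z_s = reparameterise(mu_s, log_var_s)
z_t0 = reparameterise(mu_t, log_var_t)

# The KL part of the ELBO treats the two latents separately, as in the
# factorised objective the rebuttal refers to (Eqs. 6-7).
kl_total = kl_standard_normal(mu_s, log_var_s) + kl_standard_normal(mu_t, log_var_t)
assert kl_total > 0.0
```

A perfectly matched posterior (mu = 0, log_var = 0) gives zero KL, which is why the term acts as a regulariser pulling each latent toward its prior while the reconstruction losses pull the other way.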




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


