Abstract

Echocardiography is routine for cardiac examination. However, 2D ultrasound (US) struggles with accurate metric calculation and direct observation of 3D cardiac structures. Moreover, 3D US is limited by low resolution, small field of view and scarce availability in practice. Constructing the cardiac anatomical twin from 2D images is promising to provide precise treatment planning and clinical quantification. However, it remains challenging due to the rare paired data, complex structures, and US noise. In this study, we introduce a novel generative framework, UltraTwin, to obtain a cardiac anatomical twin from sparse multi-view 2D US. Our contribution is three-fold. First, we pioneered the construction of a real-world and high-quality dataset containing strictly paired multi-view 2D US and CT, and pseudo-paired data. Second, we propose a coarse-to-fine scheme to achieve hierarchical reconstruction optimization. Last, we introduce an implicit autoencoder for topology-aware constraints. Extensive experiments show that UltraTwin reconstructs high-quality anatomical twins versus strong competitors. We believe it advances anatomical twin modeling for potential applications in personalized cardiac care.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0875_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/JacksonYu-321/UltraTwin

Link to the Dataset(s)

N/A

BibTex

@InProceedings{YuJun_UltraTwin_MICCAI2025,
        author = { Yu, Junxuan and Duan, Yaofei and Huang, Yuhao and Wang, Yu and Ling, Rongbo and Luo, Weihao and Zhang, Ang and Xu, Jingxian and Ni, Qiongying and Zhou, Yongsong and Li, Binghan and Dou, Haoran and Liu, Liping and Chu, Yanfen and Geng, Feng and Sheng, Zhe and Ding, Zhifeng and Zhang, Dingxin and Huang, Rui and Zhang, Yuhang and Xu, Xiaowei and Tan, Tao and Ni, Dong and Gou, Zhongshan and Yang, Xin},
        title = { { UltraTwin: Towards Cardiac Anatomical Twin Generation from Multi-view 2D Ultrasound } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {616 -- 625}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a novel generative framework that reconstructs high-fidelity 3D cardiac shapes from sparse multi-view 2D US images. The main contributions are:

    1. The authors construct a real-world dataset containing rigorously paired multi-view 2D US and ECG-gated CT volumes, as well as a pseudo-paired dataset for robust pretraining.

    2. The proposed method employs a conditional diffusion transformer architecture with a hierarchical denoising process, enhanced by anisotropic refinement and template guidance.

    3. The integration of a trained implicit autoencoder imposes anatomical and topological priors, refining the outputs during inference without re-training.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The construction of a rigorously paired real clinical dataset is a strength. Such data is rare in the cardiac US community and crucial for training deep learning models for 3D reconstruction. The pseudo-paired data strategy is also effective for pretraining.

    2. The coarse-to-fine conditional diffusion transformer is a good choice for the reconstruction task, similar to what is commonly used in point cloud-related tasks. The use of a template selector and multi-directional refinement via anisotropic patch partitioning is both innovative and computationally efficient. This hierarchical design ensures both global structural integrity and local detail capture.

    3. The implicit autoencoder ensures topologically consistent outputs by learning a structured latent space of plausible cardiac shapes. This is especially valuable in medical reconstruction tasks where anatomical correctness is non-negotiable.

    4. The authors performed a thorough evaluation against other models and conducted relevant ablation studies. The results show that the proposed method outperforms other techniques and that each proposed component is effective.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. I don’t think this paper should be framed as a digital twin paper because it focuses only on 3D anatomical reconstruction. It doesn’t create a dynamic cardiac model over the whole cardiac cycle according to the ECG, only the ED and/or ES phases. Moreover, there is no cardiac functional twinning. Cardiac digital twins usually involve two stages, i.e., anatomical and functional twinning. The paper contains no introduction to cardiac digital twins. Also, the related work on 3D cardiac shape reconstruction cited in this paper is very sparse (there is in fact a lot).

    2. I am not clear about the implicit autoencoder. During inference (for test data), what is the input to the decoder? That is, what are the latent features? Are they generated from the 3D segmented CT cardiac shape?

    3. I was wondering whether the authors found that residual motion remains even after ECG-based alignment. How do the authors handle this?

    4. Data issues
      • I am concerned about insufficient test data (only 24). Although the authors propose pretraining the model on a relatively large pseudo-paired dataset, the test set is still too small (only 24 cases), which makes me doubt the method’s feasibility in real scenarios (generalisability).
      • With or without pretraining, overfitting is a concern because there are very few training cases (only 96). The authors decrease the learning rate compared to the pretraining stage, but the number of training epochs is twice that of the pretraining stage. Moreover, the authors directly choose the best performance on the validation set (only 10 cases, even fewer than the 24 test cases). I don’t think this is good practice, as one should also examine the training performance curve to detect overfitting.
      • The data come from multi-center hospitals, so there should be heterogeneity. How do you handle this issue? Does your dataset split consider the different data sources?
    5. The p-value in Table 1 is not clear. Which metric does it refer to?

    6. The comparison is not sufficient. The authors mention other related methods (references 4 and 9 in this paper) but do not compare their results with them. Instead, of the 4 models used for comparison, only 1 is related to 3D cardiac reconstruction.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper proposes a novel method to reconstruct 3D cardiac shape from multiple but sparse 2D US images and the results show its effectiveness.

    However, I don’t think this is a digital twins-related paper (See weakness) and I am also concerned about the dataset issues such as the limited test data and overfitting (See weakness). Also the comparison to related work is not enough (See weakness).

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes UltraTwin, a conditional generative model for constructing cardiac digital twins from sparse multi-view 2D ultrasound (US) images. The work introduces two key innovations: (1) the creation of a rigorously paired real-world dataset combining multi-view 2D US images, ECG-gated CT scans, and pseudo-paired data for pre-training, and (2) a novel multi-view 3D reconstruction framework leveraging a Diffusion Transformer (DiT) architecture. The authors demonstrate superior performance over baseline methods in both quantitative metrics and qualitative evaluations, suggesting potential clinical utility for cardiac chamber volume measurements.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The authors construct a real-world and high-quality dataset containing strictly paired multi-view US images and CT volumes, and pseudo-paired data for pre-training. Cardiac imaging datasets with precise spatial alignment between multi-view 2D US and 3D CT are scarce due to the technical challenges of cross-modal registration. (2) UltraTwin is the first to apply DiT to multi-view US-to-3D cardiac reconstruction, which is innovative. The claimed advantages in handling global anatomical consistency and local anisotropic features align well with the inherent challenges of sparse US data. (3) The comprehensive evaluation across multiple metrics on a large-scale dataset demonstrates that the proposed UltraTwin outperforms all competing methods. This suggests that UltraTwin makes meaningful progress in accuracy for cardiac chamber volume estimation, which is critical for conditions like heart failure monitoring.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) The paper lacks detailed descriptions of the dataset construction process.

    • Firstly, it fails to clearly explain why CT is chosen as the reference for 3D modeling of 2D ultrasound images. There are no references to prior works that have adopted a similar approach, making it difficult for readers to understand the rationality of this choice.
    • Secondly, the spatial registration process between multi-view 2D US images and 3D CT, which is a key step in constructing the paired dataset, is not elaborated.
    • Thirdly, the implementation steps of the Pseudo-paired Real Data Generation Strategy are not clearly presented. Without these details, readers cannot fully understand how the strictly paired dataset is obtained, which limits the reproducibility of the research.

    (2) The paper does not clearly define what the Baseline model is. From the experimental results, it is not clear how the two main modules contribute to the improvement of the experimental results separately. For example, the authors claim that “A coarse-to-fine reconstruction pipeline ensuring both global anatomical consistency and local anisotropic information extraction” and “Implicit Autoencoder reduce noise in 2D US projections and enhance the anatomical plausibility of 3D reconstructions”, but there is a lack of sufficient evidence to support these claims.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has significant strengths, including innovative dataset construction, a novel model framework, and comprehensive experimental validation, which show great potential in the field of cardiac digital twin construction. However, the existing weaknesses, such as insufficient dataset construction details and unclear demonstration of the model modules, affect the clarity and reproducibility of the research.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes UltraTwin, a generative framework for 3D reconstruction of ultrasound (US) cardiac structure from multi-view 2D US images.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper addresses an important problem: 3D reconstruction of cardiac structure from ultrasound images. The coarse-to-fine approach using a conditional diffusion transformer is novel, as are the anisotropic patch partitioning and the implicit autoencoder. Experiments seem extensive and include comparisons with a number of baselines, and the proposed approach shows good performance. The paper is clearly written and technically solid. The dataset contribution also seems to be useful to the community.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The figures for anisotropic patching are slightly confusing. In Fig. 2, for the partitioning with respect to the x axis (top figure in “Fine Stage”), the patch size is denoted as (p/2, p, p), but the patch in the actual figure seems to have size (p, p/2, p/2) (the orange block). I wonder what the figure means.
    • The authors claim anisotropic patching saves computational overhead (it seems so). Can you provide a comparison of computational overhead with and without anisotropic patching?
    • Can you be more specific about 3) Dynamic Fusion Mechanism in 2.2? Equations (1) and (2) seem to represent variances of voxel values in the x, y, z directions and the associated weights, but it is unclear how they are actually used for “fusion” of features.
    • It would be good to provide the meaning of the acronyms Myo, LV, LA, etc. in Table 1 for readers unfamiliar with this topic.
    • Ablation study: this paper makes a number of architectural contributions for 3D US reconstruction, and the model components seem reasonable. However, the ablation study on page 6 seems a bit insufficient. Please provide a more detailed study of the model components.
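To make the patch-size notation in the first point concrete, the following minimal sketch counts how many non-overlapping patches tile a volume when the patch is halved along one axis. The 64-voxel grid and base patch size p = 8 are illustrative assumptions, not values taken from the paper, and the helper names are hypothetical.

```python
def patch_counts(volume_shape, patch_shape):
    """Number of non-overlapping patches along each axis."""
    counts = []
    for dim, patch in zip(volume_shape, patch_shape):
        if dim % patch != 0:
            raise ValueError(f"axis of size {dim} is not divisible by patch size {patch}")
        counts.append(dim // patch)
    return tuple(counts)


def total_patches(volume_shape, patch_shape):
    """Total number of patches that tile the whole volume."""
    nx, ny, nz = patch_counts(volume_shape, patch_shape)
    return nx * ny * nz


volume = (64, 64, 64)  # hypothetical voxel grid
p = 8                  # hypothetical base patch size

print(patch_counts(volume, (p, p, p)))       # (8, 8, 8)
print(patch_counts(volume, (p // 2, p, p)))  # (16, 8, 8)
```

Under these assumptions, a patch of size (p/2, p, p) yields twice as many patches along x and the same number along y and z, so a diagram that draws patch counts can look inverted relative to one that draws patch sizes.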
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The paper is well written with a solid technical contribution. Some of the presentation is a bit unclear, perhaps because the paper has many points to address and is dense to read. It would be good if the authors could clarify some of the technical details. A more detailed ablation study is also needed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written with a solid technical contribution.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed my concerns.



Review #4

  • Please describe the contribution of the paper

    The purpose of this paper is to achieve accurate 3D cardiac reconstruction from sparse multi-view 2D ultrasound (US) images to build a cardiac digital twin. This is accomplished using a 3D imaging technique known as DiT-3D, which is based on Transformers. DiT-3D enables the conversion of a 3D point cloud into the desired 3D image. The original technique has been modified to incorporate conditional injection (CDiT), which adapts the generation of the 3D image from the input images, specifically US sequences. This modification results in a more controlled process compared to the original DiT-3D design. The proposed method includes successive refinements using the new CDiT technique. Additionally, an artifact removal mechanism is integrated, which is crucial for handling artifactual images such as US images. The model was trained using US sequences and CT volumes collected by the authors. Quantitative evaluation was performed by comparing the results with other published techniques using standard metrics like the Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD). The results indicate that the proposed method outperforms the others in the comparison.
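For readers unfamiliar with the two metrics named above, here is a minimal sketch of the Dice Similarity Coefficient and the symmetric Hausdorff Distance on sets of voxel coordinates. It is an illustration under simplifying assumptions: the paper's exact variant (for example surface-based or percentile Hausdorff, or physical units) is not stated in this review, and the toy masks are invented.

```python
import math


def dice(a, b):
    """Dice Similarity Coefficient between two sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff Distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy example: two 2x2x1 blocks shifted by one voxel along x.
pred = {(x, y, 0) for x in range(0, 2) for y in range(0, 2)}
truth = {(x, y, 0) for x in range(1, 3) for y in range(0, 2)}

print(dice(pred, truth))       # 0.5
print(hausdorff(pred, truth))  # 1.0
```

DSC rewards volumetric overlap, while HD penalizes the single worst boundary error, which is why the two are usually reported together.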

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is based on a previously developed and proven technique (DiT-3D) for generating 3D images, but good results were obtained in this specific application. The method underwent both qualitative and quantitative evaluation and an ablation study was also included.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The comparison with other methods was conducted using a specific set of images from the presented work (from 9 hospitals), which may affect the robustness of the results. The distinction between results with pre-training (w) and without pre-training (w/o) is not clearly explained beyond the number of epochs used.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a comprehensive work with excellent preliminary results. However, the results were not obtained from shared databases. The method is accessible via GitHub for further comparisons.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed the main weaknesses identified by the reviewers, responding point by point and including a revision of the paper’s title to better reflect its actual content. In this regard, I find the responses to my review comments to be valid and satisfactory.




Author Feedback

We thank all the reviewers for their valuable comments. We will address the reviewers’ main concerns and improve the writing as suggested.

  1. Code & Dataset Open-source (#R1, #R2, #R5) Code will be released for reproducibility. We thank reviewers for recognizing the significance of our multi-center and largest cardiac US-CT paired dataset—the first open dataset resource in this field. 50 testing cases are publicly available on GitHub. In practice, qualified and paired US-CT data are scarce, as only patients with specific cardiac abnormalities undergo both US and ECG-gated CT. Constructing a large-scale dataset is hence challenging, costly and time-consuming. CT was chosen over MRI owing to its faster acquisition, lower cost, wider clinical availability and more accessible foundation segmentation models.

  2. Limited Samples/Overfitting (#R2, #R5) We believe this is the largest real-world US-CT paired cardiac dataset. We are continuously expanding the dataset towards 500 cases in the journal version. The 24 samples in the test set ensure disease diversity, validating our method’s statistical significance. Pretraining on 795 pseudo-paired cases mitigates overfitting by learning cross-modality mapping priors. Fine-tuning with a lower learning rate and more epochs refines diffusion and avoids overwriting the learned priors.

  3. Digital Twin (#R2) Thanks for the guidance. We fully realize that a “Cardiac digital twin” should encompass both anatomical and functional twinning. We have reframed our task as building an Anatomical Twin, the basis for a Digital Twin. Building from multi-view US facilitates broader Digital Twin applications. Future work will include dynamic 4D modeling and functional blood flow simulations. Title revised: “UltrA-Twin: Towards Effective Cardiac Anatomical Twin Generation from Multi-view 2D Ultrasound”.

  4. Dataset Preprocessing (#R1, #R2) Explicit US-CT registration is not required. Our diffusion model can model the implicit US-CT modality alignment via conditional generative learning.
      • Pseudo Data Generation. Leveraging US-CT consistency in cardiac parameter estimation, we devised a pseudo-paired data strategy. Details are in the demo video on the code repository.
      • Motion Artifacts. Motion artifacts were minimized. Data were acquired via GE Revolution CT (SSF2 algorithm) and Siemens SOMATOM Force. These CT devices minimized artifacts via high temporal resolution and intra-cycle correction.
      • Data Heterogeneity/Splitting. Multi-center data were split proportionally by site to address heterogeneity.
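One way to read "split proportionally by site" is a per-site stratified split, sketched below. This is an assumption about the procedure, not the authors' released code; the site labels, case counts, test fraction, and the `split_by_site` helper are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_site(cases, test_fraction, seed=0):
    """Split (case_id, site) pairs so each site sends about test_fraction to test."""
    by_site = defaultdict(list)
    for case_id, site in cases:
        by_site[site].append(case_id)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for site in sorted(by_site):
        ids = sorted(by_site[site])
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


# Hypothetical cohort: three sites with 10, 20 and 30 cases.
cases = [
    (f"{site}-{i:02d}", site)
    for site, n in [("A", 10), ("B", 20), ("C", 30)]
    for i in range(n)
]
train, test = split_by_site(cases, test_fraction=0.2)
print(len(train), len(test))  # 48 12
```

Splitting within each site keeps every center represented in both partitions, so a test score is less likely to reflect one scanner or one hospital's protocol.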

  5. Method (#R2, #R4)
      • Baseline. The baseline uses only coarse-stage training with condition injection. Coarse-to-fine training reduces volume error significantly. The Implicit AutoEncoder (A_E) improves anatomical plausibility qualitatively.
      • Implicit A_E Decoder Input. During training, the decoder input is latent features from the CT segmentation; during inference, the input is latent features from the diffusion model reconstruction.
      • Anisotropic Patching (AP). Fig. 2’s orange blocks show patch counts (not sizes). Halving the patch size on one axis doubles the patch count along that axis for spatial coverage. Figures will be revised for clarity.
      • Efficiency. For a single forward inference: w/o AP: 1.57s, 1217.6 GFLOPs; w/ AP: 0.54s (-65%), 892.3 GFLOPs (-27%).
      • Dynamic Fusion. We use the inverse of variance as weights to fuse features from 3 orthogonal directions, as higher variance in features indicates larger uncertainty.
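The inverse-variance weighting described under Dynamic Fusion can be sketched as follows. This is a simplified illustration, not the paper's Equations (1) and (2): it assumes one scalar variance per direction and normalized weights, whereas the actual granularity (per voxel, per channel, or per volume) is not stated in the rebuttal. The feature vectors and function names are hypothetical.

```python
from statistics import pvariance


def inverse_variance_weights(variances, eps=1e-8):
    """Normalized weights proportional to 1 / variance."""
    inverse = [1.0 / (v + eps) for v in variances]  # eps guards against zero variance
    total = sum(inverse)
    return [w / total for w in inverse]


def fuse(features_per_direction, eps=1e-8):
    """Fuse equally sized feature vectors from several directions.

    Each direction is weighted by the normalized inverse of the variance
    of its own feature values, so noisier directions contribute less.
    """
    variances = [pvariance(f) for f in features_per_direction]
    weights = inverse_variance_weights(variances, eps)
    length = len(features_per_direction[0])
    fused = [
        sum(w * f[i] for w, f in zip(weights, features_per_direction))
        for i in range(length)
    ]
    return fused, weights


# Hypothetical feature vectors from the x, y and z refinement branches.
fx = [1.0, 1.1, 0.9, 1.0]   # low variance
fy = [1.0, 2.0, 0.0, 1.0]   # higher variance
fz = [1.0, 3.0, -1.0, 1.0]  # highest variance

fused, weights = fuse([fx, fy, fz])
print(weights)
```

In this toy case the x branch receives almost all of the weight because its values vary least, which matches the stated rationale that higher variance signals larger uncertainty.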

  6. Experiments (#R2, #R4)
      • Reference 4 Comparison. Direct comparison is infeasible. Reference 4’s approach critically depends on accurate US segmentation, which is not accessible in our workflow. Our method uses US images to reconstruct five cardiac structures instead of only one.
      • Ablation Study. Due to limited space, only key ablation studies were reported. Full results will be included in the journal version.

  7. Unclear Definition (#R2, #R4, #R5) We will clarify more details in the final version. P-values in Tab. 1 denote statistical tests on DICE. Definitions for acronyms will be added in figure captions.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


