Abstract

Real-time fusion of intraoperative 2D ultrasound frames with a preoperative 3D ultrasound volume via frame-to-volume registration can provide a comprehensive guidance view for cardiac interventional surgery. However, cardiac ultrasound images have a low signal-to-noise ratio and small differences between adjacent frames, and the 2D frames and 3D volumes to be registered differ significantly in dimension, making real-time, accurate cardiac ultrasound frame-to-volume registration highly challenging. This paper introduces a lightweight end-to-end Cardiac Ultrasound frame-to-volume Registration network, termed CU-Reg. Specifically, the proposed model leverages epicardium prompt-guided anatomical clues to reinforce the interaction of 2D sparse and 3D dense features, followed by voxel-wise local-global aggregation of the enhanced features, thereby boosting the cross-dimensional matching effectiveness of low-quality ultrasound modalities. We further embed an inter-frame discriminative regularization term in the hybrid supervised learning to increase the distinction between adjacent slices of the same ultrasound volume and thus ensure registration stability. Experimental results on the reprocessed CAMUS dataset demonstrate that CU-Reg surpasses existing methods in both registration accuracy and efficiency, meeting the guidance requirements of clinical cardiac interventional surgery. Our code is available at https://github.com/LLEIHIT/CU-Reg.
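The registration task described above reduces to predicting a rigid 6-DoF pose (three translations and three rotations) that places a 2D frame inside the 3D volume. As an illustration only (the function name, pose layout, and Euler-angle convention are our assumptions, not the authors' implementation), such a pose vector can be converted to a 4x4 rigid transform:

```python
import numpy as np

def pose_to_matrix(pose):
    """Convert a 6-DoF pose [tx, ty, tz, rx, ry, rz] (rotations in
    radians, composed as Rz @ Ry @ Rx) into a 4x4 rigid transform.
    Hypothetical sketch; conventions are assumptions, not CU-Reg's."""
    tx, ty, tz, rx, ry, rz = pose
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx   # rotation block
    T[:3, 3] = [tx, ty, tz]    # translation column
    return T
```

Applying this transform to the 2D frame's pixel grid would locate the frame's plane within the volume's coordinate system.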

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0300_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0300_supp.pdf

Link to the Code Repository

https://github.com/LLEIHIT/CU-Reg

Link to the Dataset(s)

https://drive.google.com/drive/folders/1dln50wujTLGQW0tTM5HibSdF7ZYrWRsM?usp=drive_link

BibTex

@InProceedings{Lei_Epicardium_MICCAI2024,
        author = { Lei, Long and Zhou, Jun and Pei, Jialun and Zhao, Baoliang and Jin, Yueming and Teoh, Yuen-Chun Jeremy and Qin, Jing and Heng, Pheng-Ann},
        title = { { Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    A real-time 2D-3D deep-learning registration algorithm for ultrasound.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The algorithm seems to achieve real-time speed and to outperform existing algorithms.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The manuscript is not written very clearly. For example, the authors talk about “epicardium prompt-guided” which only a few pages later appears to mean that the network also calculates the segmentation and then it uses the segmentation as one of the input of their “prompt-guided gated cross-dimensional attention” block.

    The algorithm is critically dependent on a dataset containing corresponding 2D and 3D images and segmentations, and it is not really clear how the training data were obtained. From the CAMUS publication and web page, CAMUS appears to be a 2D dataset, while here the authors talk about 3D. There is a danger of the algorithm overfitting to the particular way the training dataset was prepared. The evaluation was done only on simulated data.

    It seems unjustified to use four adjacent planes during training (two on one side and one on the other) and only one frame (in 4 copies) during inference.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    not clear why convolutions fail to provide critical anatomical cues (page 2)

    Fig 2 is too small given that it is the main description of the method.

    Is the error of 3.9mm sufficiently low for clinical purposes (page 3)?

    How are the channel dimensions “normalized” (page 4)?

    It is not clear how the volume feature is associated with its “spatial corresponding slide feature” (page 5). I understood that the spatial correspondence is not used during training except to evaluate the loss.

    Hybird->Hybrid

    “inter-frame distance” should be defined

    did you use an existing implementation of the state-of-the-art methods?

    in the references, abbreviations (e.g. CT in ref 13) should be capitalized.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Algorithm appears to be working but the description is partly unclear and the evaluation is only on simulated data.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper addresses the challenge of fusing 2D US frames to a 3D US volume accurately and in real time. The authors propose CU-Reg, a novel end-to-end network that utilizes epicardium prompt-guided anatomical clues to enhance the correlation between 2D and 3D US features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The epicardium mask, serving as an anatomical landmark, helps the model concentrate on the 2D/3D features needed to discover the optimal alignment between image pairs.
    2. The proposed CU-Reg integrates a novel attention and aggregation module, which can improve registration quality by comprehending spatial and temporal features across the ultrasound data.
    3. The authors carried out extensive tests on the reprocessed CAMUS dataset. The model is evaluated on several metrics, including distance error, normalized cross-correlation, and SSIM. They also compared against one baseline method and conducted several ablation studies.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The loss function is composed of five terms: translation loss, rotation loss, prompt loss, a regularization term, and similarity loss. The authors provided the hyper-parameter values used in their experiments but did not explain why those values were chosen. It would be good if the authors could provide more insight into which of the terms dominate during training.
    2. Slice-to-volume registration has been a challenging task, especially considering the low signal-to-noise ratio of ultrasound images, and attention modules usually require sufficient data to train without overfitting. Considering the relatively small size of CAMUS (~500 cases augmented to 4000 training samples), is there a way to demonstrate that the model is not overfitting?
    3. What if the epicardium mask is not available for a patient at inference time? Will the quality of this mask significantly impact the registration result?
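For context on the five-term objective raised in weakness 1, such objectives are typically combined as a weighted sum. The sketch below is purely illustrative: the weights, the concrete term definitions (MSE pose losses, soft Dice for the prompt, a hinge inter-frame regularizer, NCC-based similarity), and all names are our assumptions, not the paper's actual losses:

```python
import numpy as np

def hybrid_loss(pred_t, gt_t, pred_r, gt_r,
                pred_mask, gt_mask,
                feat_anchor, feat_neighbour,
                slice_pred, slice_gt,
                weights=(1.0, 1.0, 1.0, 0.1, 0.1)):  # hypothetical weights
    """Illustrative weighted sum of five terms: translation, rotation,
    epicardium prompt, inter-frame discriminative regularization, and
    similarity. Term definitions are assumptions, not the paper's."""
    w_t, w_r, w_p, w_reg, w_s = weights
    l_t = np.mean((pred_t - gt_t) ** 2)   # translation (MSE)
    l_r = np.mean((pred_r - gt_r) ** 2)   # rotation (MSE)
    # prompt loss: soft Dice on the predicted epicardium mask
    inter = np.sum(pred_mask * gt_mask)
    l_p = 1.0 - 2.0 * inter / (np.sum(pred_mask) + np.sum(gt_mask) + 1e-8)
    # inter-frame regularization: hinge pushing adjacent-frame features
    # apart by at least a margin
    margin = 1.0
    dist = np.linalg.norm(feat_anchor - feat_neighbour)
    l_reg = max(0.0, margin - dist)
    # similarity loss: 1 - normalized cross-correlation of the slices
    a = slice_pred - slice_pred.mean()
    b = slice_gt - slice_gt.mean()
    ncc = np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    l_s = 1.0 - ncc
    return w_t * l_t + w_r * l_r + w_p * l_p + w_reg * l_reg + w_s * l_s
```

All five terms are non-negative here, so the total is zero only when poses, mask, and slices match exactly and adjacent-frame features are already separated by the margin.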
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The loss functions are relatively straightforward to reproduce given the explanation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors can consider rephrasing the abstract, using more active voice and less passive voice.
    2. Consider adding some 3D visualization of the 2D frame to 3D volume registration results in the experimental section. This could better demonstrate the predicted position of different baselines.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    2D/3D image registration has been a challenging but highly demanded task in clinical scenarios. The authors present the topic and methodology clearly, with solid evaluation.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my concern regarding potential overfitting and clarified the necessity of the epicardium mask supervision. I have no further concerns at this stage.



Review #3

  • Please describe the contribution of the paper

    This paper proposes an end-to-end cardiac ultrasound frame-to-volume registration network, aiming to accomplish real-time registration. The results show that the network achieves state-of-the-art results at significantly higher speed than previous methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well motivated, presenting a useful clinical application.
    • The results include a comparison to alternative methods and an ablation study.
    • The performance is slightly better than previous methods at a significantly higher registration speed.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limitations of the proposed work have not been discussed.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • In Fig 3, it would be useful to have some information on the quantitative metrics for the chosen cases.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written. The approach is novel. The experimental set-up seems appropriate. The results are promising.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely appreciate all valuable comments from reviewers (R1&R3&R8) and AC. Overall, they found our method well motivated (R8) and novel (R3) with extensive tests (R3&R8), addressing a clinical challenge (R3), and achieving real-time speed and superior performance (R1&R3&R8). Below, we address specific comments.

  1. Data leakage from training to evaluation (AC) and overfitting (R1&R3): As described on Pages 6 and 7, we split the data into training and test sets at the patient level, i.e., the training and test sets come from different patients, so there is no data leakage. In CAMUS, there are two 2D echocardiographic sequences for each patient; they are expressed as 3D volumes in Cartesian coordinates with a unique grid resolution using the same interpolation procedure, so CAMUS is used as a 3D volume dataset. Additionally, slices are randomly sampled from volumes as in [10,18,19], which simulates the randomness of probe posture during actual 2D ultrasound imaging, so there is no hidden danger of overfitting. During training, the loss curve shows no overfitting, and our model performs well on the test set (Table 1). We have promised to make our data and code publicly available.
  2. The simulated data (R1), and reliability of registration between 3D volumes and 2D slices from the same volume (AC): Although simulated data are not a perfect replacement for real data, they are sufficient to demonstrate the validity of our proposed method, for two reasons: 1) our volume data come from actual human bodies, and the slice sampling method simulates the randomness of probe posture during actual 2D ultrasound imaging; 2) related works [6,10,18,19] have also evaluated their models on simulated data, performing registration between a 3D volume and 2D slices sampled from the same volume. The results of [Xu et al., MICCAI 2022] also show the consistency of model performance on simulated and real data.
  3. Four adjacent planes during training and one frame (in 4 copies) during inference (R1): In the training phase, to cope with the small differences between adjacent frames, we used four frames (a current anchor frame and three adjacent frames) as inputs, combined with an inter-frame discriminative regularization term to improve the ability to distinguish adjacent frames. For inference, only the current frame (in 4 copies) is used as input to infer its relative position in the preoperative volume, which is consistent with the actual clinical scenario. The ablation study (w/o PGCA&VLGA vs w/o PGCA&VLGA&Lreg) in Table 1 demonstrates the effectiveness of using four adjacent frames as training input.
  4. Implementation details and additional information (R1&R3): We will provide sufficient implementation details and additional information in the final version, including: for many cardiac catheterizations, the accuracy requirement is 5 mm [King et al., MIA2010] (R1); in the loss function, the translation and rotation losses make the network converge quickly, the prompt loss supervises the epicardium prompt, the inter-frame discriminative regularization loss distinguishes between adjacent frames and improves registration accuracy, and the similarity loss makes the training process more stable and avoids overfitting (R3); as shown in the last two rows of Table 1, removing the epicardium mask supervision causes a significant performance drop, indicating the importance of the epicardium mask for registration (R3).
  5. Corrections (R1&R3) and Limitations (R8): We will revise the manuscript according to the comments. In addition, we will discuss the limitations in the journal version.
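The training/inference asymmetry discussed in rebuttal point 3 can be illustrated with a small hypothetical input builder (names and shapes are illustrative, not from the released code): training stacks the anchor frame with its three neighbours, while inference tiles the single current frame four times so the network input shape is identical in both phases.

```python
import numpy as np

def build_input(anchor, neighbours=None):
    """Stack four 2D frames along a new leading axis.

    Training: pass the anchor plus its three adjacent frames.
    Inference: pass only the anchor; it is tiled into four copies,
    keeping the network input shape unchanged between phases.
    Illustrative sketch only, not the authors' implementation.
    """
    if neighbours:                       # training: anchor + 3 neighbours
        frames = [anchor] + list(neighbours)
        assert len(frames) == 4, "expect one anchor and three neighbours"
    else:                                # inference: 4 copies of the anchor
        frames = [anchor] * 4
    return np.stack(frames, axis=0)      # shape (4, H, W)
```

Under this scheme the inter-frame regularization term only has distinct neighbours to act on during training; at inference the four identical copies simply satisfy the fixed input shape.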




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors provided a clear and well-reasoned explanation for their choice of validation method, addressing the reviewers’ concerns. They also detailed the frame selection process, distinguishing between how adjacent frames are chosen during the training phase compared to the inference phase. This clarification helps to understand the consistency and rationale behind their methodological approach.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


