Abstract

Echocardiography (ECHO) is commonly used to assist in the diagnosis of cardiovascular diseases (CVDs). However, manually conducting standardized ECHO view acquisition by manipulating the probe demands significant experience and training for sonographers. In this work, we propose a visual navigation system for cardiac ultrasound view planning, designed to assist novice sonographers in accurately obtaining the views required for CVD diagnosis. The system introduces a view-agnostic feature extractor to explore the spatial relationships between source frame views, learning the relative rotations among different frames for network regression and thereby facilitating transfer learning to improve the accuracy and robustness of identifying specific target planes. Additionally, we present a target consistency loss to ensure that frames within the same scan regress to the same target plane. The experimental results demonstrate that the average error on the apical four-chamber (A4C) view can be reduced to 7.055 degrees. Moreover, results from practical clinical validation indicate that, with the guidance of the visual navigation system, the average time for acquiring the A4C view can be reduced by a factor of at least 3.86, which is instructive for the clinical practice of novice sonographers.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0730_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0730_supp.zip

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Bao_Realworld_MICCAI2024,
        author = { Bao, Mingkun and Wang, Yan and Wei, Xinlong and Jia, Bosen and Fan, Xiaolin and Lu, Dong and Gu, Yifan and Cheng, Jian and Zhang, Yingying and Wang, Chuanyu and Zhu, Haogang},
        title = { { Real-world Visual Navigation for Cardiac Ultrasound View Planning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    • This paper introduces a probe guidance system for easier acquisition of the required views in cardiac ultrasound imaging by guiding the rotations of the probe at a fixed imaging location.
    • A two-stage training method is proposed: (1) first train the image feature extractor and rotation predictor on arbitrary image pairs within a scan, (2) then train a guidance module that uses the extracted image feature vector to predict the rotation from the current view to the clinically required target view.
    • The proposed method is trained and evaluated on real-world volunteer scan data and is shown to have accuracy comparable to that of a human sonographer. A real-world navigation experiment guiding novice sonographers is also performed, showing that the required scanning time is reduced.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The image feature extractor is trained in a view-agnostic way instead of only comparing views to the target view. This allows more efficient use of the available dataset and helps the network learn dense transformation information between views.
    • The consistency loss is novel. When training the guidance module, the target viewing-plane location estimated from any two views within the same scan should be identical. This extra supervision signal can help push the network to learn the anatomical structure of the heart to some extent.
    • The data collection method is novel: starting from the standard view, moving away, and returning to the standard view. This allows quantification of expert-level standard-view retrieval and a straightforward assessment of the algorithm's performance: the algorithm is “achieving a difference of only about 1 degree compared to expert cardiac sonographers.”
    • A real-world navigation experiment guiding novice sonographers was performed. It demonstrates that the average time taken to reach the target pose is significantly reduced, with no failure case observed.
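The target consistency constraint described above can be illustrated with a short NumPy sketch. This is a hypothetical rendering, not the authors' implementation: assuming frames i and j of one scan have tracked poses `p_i`, `p_j` and per-frame predicted rotations-to-target `R_it`, `R_jt`, composing each prediction with its frame pose gives two estimates of the same target plane, and the loss penalizes their geodesic disagreement.

```python
import numpy as np

def geodesic_angle(Ra: np.ndarray, Rb: np.ndarray) -> float:
    """Angle (radians) between two 3x3 rotation matrices."""
    cos_t = (np.trace(Ra @ Rb.T) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))

def target_consistency_loss(R_it, p_i, R_jt, p_j) -> float:
    """Both frames of the same scan should map to the SAME target plane:
    R_it @ p_i and R_jt @ p_j are two estimates of the target pose."""
    return geodesic_angle(R_it @ p_i, R_jt @ p_j)

def rot_z(deg: float) -> np.ndarray:
    t = np.radians(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])

# Perfectly consistent predictions give (near) zero loss:
T = rot_z(40.0)                       # (unknown) target pose
p_i, p_j = rot_z(0.0), rot_z(15.0)    # tracked poses of two frames
R_it, R_jt = T @ p_i.T, T @ p_j.T     # ideal per-frame predictions
loss = target_consistency_loss(R_it, p_i, R_jt, p_j)   # ~0
```

Perturbing one prediction (e.g. pre-multiplying `R_it` by `rot_z(5.0)`) makes the two target estimates disagree by 5 degrees, which the loss then penalizes.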

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • Not considering the translational guidance? The first step in routine cardiac imaging is usually the placement of the probe so that it is imaging through an appropriate acoustic window without bone occlusion. If the paper assumes the probe starts at the correct imaging window placement and only provides rotational guidance information, then it may not be a “fully automated navigation system” as the paper has claimed.
    • Although both the two-stage training with the view-agnostic feature extractor and the target consistency loss improve the results, as shown in Table 1 and Table 2, the improvements are minimal: “improvements of 0.242 degree on the A4C view and 0.622 degree on the A2C view” for the former and “improvements of 0.153 degree on the A4C view and 0.046 degree on the A2C view” for the latter. It is unknown whether the improvements are statistically significant.
    • Clarity of the writing: some terms are confusing, some passages are not well organized, and Figure 1 is not very clear; please see the detailed comments section. Examples: (1) The terms “visual navigation”, “view planning”, “human-based”, etc., are confusing or misleading. (2) In the related works section, dividing the methods in the literature by the data type used (real clinical data or phantom data) may not help in understanding the state of the art, as methods developed on phantom data could easily be trained and validated on a clinical dataset. (3) Figure 1 is confusing, lacks the necessary correspondence to the text, and some parts are misleading.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    • Please comment on the translational aspect of guidance: how do the authors plan to achieve the initial placement?
    • Please comment on the effectiveness of the proposed VFE and consistency loss, given that the improvements reported in the results section appear to be limited.
    • “We selected 10% of the data as the validation set”. How is the validation set selected? Does it consist purely of data from unseen volunteer subjects? This is important for testing how well the algorithm generalizes to new patients.
    • Here are some clarity issues that may need to be addressed:
      o The term “human-based visual navigation system” in the introduction is very confusing, though it can be understood from the related works section: “human-based” refers to using actual clinical data for training and validation. A different term, such as “a visual navigation system developed with clinical data”, would be clearer. It also sounds redundant to say “visual” navigation: in “Therefore, automatic visual navigation for cardiac ultrasound view planning is high demand”, it is unclear why “visual” is emphasized, and “view planning” is misleading since guiding is not equivalent to planning. Simply saying “Therefore, automatic probe navigation for cardiac ultrasound standard view acquisition is in high demand” may be clearer. Using “view planning” in the title can also be misleading.
      o Please make it clearer what p_{vi} represents in the first paragraph of Section 2.1: is this 6 degrees of freedom or only rotation? Saying “p_{vi} signifies the absolute rotation matrix relative to the standard position” does not make clear that it contains only rotation. A better way is to say “p_{vi} is the rotation matrix relative to the standard position of the tracking system”.
      o For Fig. 1: (1) Please label the figure with the symbols E, P, and G for easier correspondence with the text: “We train a view-agnostic feature extractor E and a rotation predictor P to compute the relative rotation matrix…a guidance module G to predict…” (2) The Stage 2 figure is confusing: why does the feature extractor now take two images (x_{vi}, x_{vj})? On closer examination it simply represents two data-processing pipelines (the green one and the yellow one), which is not clear on first reading.
      o For Table 1, please make clear what VFE is; readers have to guess that it means “view-agnostic feature extractor”. In the ablation study section: “As shown in Table 1, the view-agnostic feature extractor consistently enhances visual navigation accuracy across different backbone architectures.” How are the “w/o VFE” results obtained? Please specify clearly what the alternative method used for comparison is.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    • The paper shows a new way of training an ultrasound image-based probe navigation algorithm for cardiac image acquisition. Several novel techniques were implemented that improved the navigation accuracy; however, the improvements seem limited, so it is hard to evaluate the quantitative contribution of the proposed techniques.
    • The proposed method was properly evaluated with real-world datasets and novice sonographer navigation experiments. The guidance system is shown to significantly reduce the scanning time required for novice users to reach the target cardiac view in field tests.
    • The writing, organization, and presentation of some aspects of the paper need refinement; the paper can be confusing and sometimes misleading on first reading.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    1.) To address the inherent challenges for sonographers in acquiring standardized ultrasound views during echocardiography for the diagnosis of cardiovascular diseases, the authors propose a visual navigation system aimed at assisting novice clinicians in manipulating an ultrasound probe toward the correct anatomical location so that an accurate US view required for diagnosis can be obtained. According to the authors, the proposed visual navigation system for cardiac US view planning is the first that can be used in actual clinical setups involving human patients rather than phantoms.
    2.) The feature extractor of the proposed neural network is view-agnostic, which makes the prediction of the relative rotation of paired ultrasound planes more robust.
    3.) The network is trained on real human cardiac ultrasound scans, taking into account the challenges of accurate ultrasound scan acquisition caused by movement of the heart.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1.) Overall, the paper is well structured, and its main contribution lies in presenting a system that can be used for cardiac view planning in actual clinical settings and not just phantom studies, taking into account the movement of the heart during ultrasound image acquisition.
    2.) A solid technical description of the proposed neural network architecture and experimental settings, comparing three different backbone networks for the view-agnostic feature extractor.
    3.) A real-world verification of the usefulness of the proposed visual navigation method, confirming a certain degree of clinical applicability.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some aspects could have been described more accurately in terms of the actual clinical setup, as described in my comments for the authors.

    Clinical validation is only done by novice sonographers and not by expert sonographers. A comparison between these two groups would have been beneficial.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    There are no indications of publication of the underlying source code. However, the proposed deep learning-based feature extractor is well described using Figure 1 and the corresponding descriptions of the available network architectures and coding frameworks used. The experiments are also well described, including data collection and evaluation methods. Publishing the underlying source code would certainly increase reproducibility, but it appears that the authors have described their methods well, so some level of reproducibility appears to be assured.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It seems the actual deployment of the visual navigation system in a clinical setup is not described: is the guidance information displayed on an additional monitor that streams the live US feed? What does the concrete setup look like?

    A figure showing the complete system in use, including the user, the US system, and the navigation system, would certainly be helpful; you actually provided two such figures in the supplementary material, but it would make sense to have one figure in the paper itself. I imagine adding this figure will exceed the 8-page limit, but perhaps you could save space by combining the 6 bar plots of Fig. 2 into a single one.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, a solid article with a sufficient degree of novelty, a plausible clinical motivation, and a good technical description and experimental design. Providing the underlying source code would have been beneficial to support reproducibility. The description of the actual system setup should be improved, because it is not 100% clear how the visual navigation system is used in clinical practice compared to conventional cardiac ultrasound view planning.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces a method for visual guidance within cardiac ultrasound by predicting the rotation matrix required to rotate the probe from a given image to a target plane (e.g. apical four chamber view). The method involves direct regression of rotation matrices using convolutional neural networks, with an additional pre-training step.

    Experiments demonstrate the effectiveness of the method in improving the time taken for novice ultrasonographers to acquire two standard viewing planes.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper is a novel application of visual guidance to a challenging task: that of cardiac ultrasound plane finding.

    The validation wherein it is demonstrated that novice users are able to find viewing planes significantly faster when using the guidance method represents very compelling evidence of its utility.

    The method is simple but well-justified and appropriate for the task.

    Overall, this is a strong paper.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are no major weaknesses of the paper, but some individual statements are written a little confusingly, and the justification for some parts of the method is not clear, see detailed comments below.

    There is no comparison to other methods, though it is not exactly clear to me what an appropriate comparison would be. There is an ablation study of different components of the proposed method.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • This sentence is rather confusing to me “First, significant individual differences make it difficult to establish a unified coordinate system for humans, leading to difficulties in obtaining positions”. The difficulty of obtaining positions seems like an unrelated problem to anatomical differences between humans.
    • What do the authors mean by “human-based”? I find this terminology confusing and would recommend that the authors be more specific. Presumably they mean that the human is the one operating the probe, but “human-based” could be interpreted in many other ways.
    • What is meant by the subscript “t” in R_{vit} and p_{vt}? I would assume time, except that i is the frame index. I guess, given what happens later in the manuscript, that the intention is “target”, but this is not clear at the point where it is introduced. Further, this confusingly mixes indices and plain letters as subscripts.
    • The authors appear to be implicitly assuming that navigation of the probe to each view plane can be achieved purely through rotation of the probe, without translation. This assumption should at least be stated explicitly and justified, and preferably there would be some discussion of its implications, both in terms of simplification of the modelling and its effect on the real-world usability of the tool.
    • In what sense does retraining make a model “less suitable for medical applications”?
    • In section 2.2 the authors state that “the direct regression approach also fails to address the acquisition errors present in clinical scans collected by different physicians and the standard plane selections since the standard planes in ultrasound should be captured within a range of angles rather than a single specific position”. This is true, but I don’t see how the feature extraction pre-training addresses the problem. Ultimately, a single ground truth must be assumed to train the guidance module so the problem is just kicked further downstream.
    • How is the ground truth rotation matrix of the ground truth plane determined from the series? Does the expert choose a single frame of the acquired video to serve as the “best” view, and the rotation matrix of this frame is used? This should be clarified.
    • I am struggling to understand the motivation for the consistency loss. If the rotation matrix is already directly supervised, why would consistency between multiple predictions give any further information?
    • During the real-world verification, what definition of “locating” the standard planes was used? I.e., how was it determined exactly when the user had achieved a good localisation of the viewing plane?

    Minor comments:

    • Notation like “1e-3” is non-standard programming shorthand and should not be used in a manuscript (use “1 x 10^-3”)
    • “without no” -> “with no” in the introduction
    • “heartsand” -> “hearts and” in the introduction
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is technically sound, addressed an important and challenging problem, and has very compelling results.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Dear editor and reviewers, thank you very much for taking the time to review this manuscript. We will clarify confusing expressions in our paper and fix syntax problems.

Response to Reviewers’ Comments

1. Clinical validation is only done by novice sonographers and not by expert sonographers. A comparison between these two groups would have been beneficial. (#5)
Reply: Thank you for your suggestion; we will try to include this experiment in future studies. Positioning the cardiac views recommended by the guidelines is a basic skill for expert sonographers, who can complete this task quickly. Considering the additional time required when using navigation systems, the efficiency gains for expert sonographers may be relatively small. However, expert-level sonographers are a scarce resource. Our approach focuses on assisting novice sonographers, especially in regions with limited medical resources, enabling more sonographers to complete the acquisition of cardiac ultrasound views for diagnosis by senior physicians.

2. It seems the actual representation of the visual navigation system in a clinical setup is not described: Is the guidance information being displayed on an additional monitor that streams the live US feed? How does the concrete setup look? (#5)
Reply: Yes, we have installed an extra monitor next to the existing ultrasound equipment, where the real-time ultrasound video and the navigation information are displayed together. The navigation information includes the current position of the probe and the predicted probe position corresponding to the standard cardiac views; these positions are represented on the user interface by rotating prisms of different colors. We will include overall diagrams of the user interface, ultrasound machine, and navigation system in the final manuscript or supplementary materials.

3. Please comment on the translational aspect of guidance: how do the authors plan to achieve the initial placement? The authors appear to be implicitly assuming that navigation of the probe to each view plane can be achieved purely through rotation of the probe, without translation… (#3, #4)
Reply: For positioning the standard cardiac ultrasound views, including the parasternal and apical sets, both sets require identifying the correct initial placement point. Then, simply rotating the probe gives access to the different standard views. Translational movement is used in our operations only for positioning the initial points of these two series of views. Here, we will employ simpler and more effective methods for clinical use, such as bodily landmarks and simple touch training. We will also develop new methods to automate the positioning of the initial points of the two sets of views.

4. I am struggling to understand the motivation for the consistency loss. If the rotation matrix is already directly supervised, why would consistency between multiple predictions give any further information?
(#3) Reply: The consistency loss can utilize contextual information from the ultrasound video, since these frames originate from the same video; essentially, it can be seen as a basic video-based method. In future studies, we will explore incorporating more frames from the same video or employing video-based techniques to improve the accuracy of view planning.

5. The authors state that “the direct regression approach also fails to address the acquisition errors present in …”. This is true, but I don’t see how the feature extraction pre-training addresses the problem. Ultimately, a single ground truth must be assumed to train the guidance module, so the problem is just kicked further downstream. (#3)
Reply: We agree that acquisition noise is involved in training the guidance module. The view-agnostic feature extraction pre-training, however, does not require any specific view localization, so the pre-training process is free from acquisition noise. We will introduce new methods to tackle this problem in the future.




Meta-Review

Meta-review not available, early accepted paper.


