Abstract

An image-guided cerebral artery navigation (CAN) system can provide precise guidance for intracranial artery examination and surgery by aligning 3D medical data with the patient’s head as observed by a depth sensor. Existing CAN systems generally suffer from either susceptibility to location-marker offset or poor efficiency. This paper presents a real-time marker-less method that tracks the patient’s head pose based on MRI data for CAN. Briefly, a 3D facial model is constructed from the patient’s MRI data in the pre-operative stage. A 3D local description is then proposed to encode the local geometry of the facial model via thin plate spline functions. Subsequently, using this local description, the patient’s head observed by an RGBD camera is registered with the facial model by maximum weight matching. Finally, the head pose is accurately tracked in real time via a square-root cubature Kalman filter (SCKF) and the iterative closest point (ICP) algorithm during navigation. With each estimated head pose, the patient’s vessels in the MRI data are visualized onto the RGB image of the patient’s head for CAN. The proposed method is evaluated in comprehensive experiments and outperforms all comparison methods on the core performance metrics. The average rotational and translational errors of our method are 2.6° and 1.9 mm respectively on the BIWI dataset, and the average tracking time is 0.06 s per frame.
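For intuition on the thin-plate-spline local description mentioned above, the following is a minimal sketch, not the paper’s implementation: it fits a thin plate spline height function z = f(x, y) over a keypoint’s neighborhood expressed in a local reference frame; sampling the fitted surface on a fixed grid then yields a descriptor that two neighborhoods can be compared by. All function names here are hypothetical.

```python
import numpy as np

def tps_kernel(r):
    # Thin plate spline radial basis U(r) = r^2 log r, with U(0) = 0.
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

def fit_tps(xy, z, reg=1e-8):
    """Fit z = f(x, y) as a thin plate spline over scattered 2D points.

    Hypothetical helper for illustration; solves the standard TPS
    linear system [[K, P], [P^T, 0]] [w; a] = [z; 0].
    """
    n = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    K = tps_kernel(d) + reg * np.eye(n)        # regularized radial part
    P = np.hstack([np.ones((n, 1)), xy])       # affine part [1, x, y]
    A = np.block([[K, P], [P.T, np.zeros((3, 3))]])
    b = np.concatenate([z, np.zeros(3)])
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]                    # radial weights w, affine coeffs a

def eval_tps(xy_train, w, a, xy_query):
    # Evaluate the fitted spline at query locations (e.g. a fixed descriptor grid).
    d = np.linalg.norm(xy_query[:, None, :] - xy_train[None, :, :], axis=-1)
    return tps_kernel(d) @ w + a[0] + xy_query @ a[1:]
```

A descriptor built this way can be compared to another by the mean squared difference of the sampled heights, which is the shape an MSE-style discrepancy such as the paper’s Eq. (3) could take.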


Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1763_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: https://papers.miccai.org/miccai-2025/supp/1763_supp.zip

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WanQiu_Markerless_MICCAI2025,
        author = { Wang, Qiuying and Zhang, Pandeng and Chen, Dewei and Tang, Hao and Liu, Chang and Liu, Jia},
        title = { { Marker-less Head Pose Tracking for Image-guided Cerebral Artery Navigation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15968},
        month = {September},
        pages = {352 -- 362}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The article proposes a new method for cerebral artery navigation that continuously overlays a 3D model of the face onto the patient anatomy using RGB-D video. The approach comprises (i) a pre-operative segmentation step to obtain the facial model, extract model keypoints, and describe those keypoints using thin plate spline functions (TPSFs); (ii) a pose-initialization step that uses a pre-trained U-Net to detect 2D biomarkers in the RGB image (e.g. nose tip, eye corners), lifts those biomarkers to 3D using the depth channel, matches the 3D biomarkers with the keypoints of the 3D facial model, and obtains a 3D registration solution for the first frame (k=0); and (iii) a real-time tracking step that updates the 3D registration at every frame by using a Kalman filter to predict the head pose, followed by ICP to align the facial model with the current depth image. Navigation is then accomplished by augmented reality, with the vasculature projected into the RGB images. The experimental results show improvements in registration and tracking accuracy over competing methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper tackles a relevant problem for MICCAI

    • The paper reports improvements with respect to baseline methods both for initial registration and tracking

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Pipeline is not well motivated: There are multiple methods in the literature for 3D registration, object tracking, and/or head pose estimation (HPE). It is not clear to the reader what specific challenges the targeted application poses, why overcoming these challenges is beyond the state of the art, and to what extent the proposed pipeline addresses them. In summary, the overall impression is that it is a “me too” solution with many design options lacking a clear justification (e.g. why estimate an initial pose and then track, as opposed to estimating the pose at every frame with a fast global 3D registration method? Why use maximum weight graph matching instead of PnP? etc.)

    • Lack of clarity and detail: The method is difficult to follow with some steps being hard to understand (see unclear steps below) and some parts lacking enough detail for easy reproducibility

    • Unconvincing experimental results: The experimental section shows a dramatic improvement in accuracy, with the pose-tracking error being almost zero (last line of Table 2). I am not fully convinced by these results because (i) it is not clear how the proposed method is evaluated on the BIWI dataset, which lacks a pre-op facial model (see point 4 below), and (ii) the methods are not directly comparable: e.g. the methods of Table 2 are head pose estimation methods that provide estimates from a single frame without prior information, while the described pipeline tracks the head pose in continuous RGB-D data by aligning a pre-op model whose pose has been initialized by a 3D registration step

    • No discussion of limitations: The paper does not discuss the limitations of the method, namely issues like non-rigid deformation of the face surface, which hinders alignment with the initial 3D facial model, and/or tracking under fast face motion.

    • The companion video is more intriguing than illustrative, showing augmented reality in a scenario where the head of the subject has a marker and barely moves during the entire experiment

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    UNCLEAR STEPS / LACK OF CLARITY:

    1. Section 2.2 says that “keypoints are extracted by voxel grid downsampling”. This process is unclear, and I do not see how it ensures that the set of detected points overlaps in part with the facial biomarkers extracted with the pre-trained Res-Net (Section 2.3)

    2. The role of the TPSF is not clear. Is it to match keypoints with 3D biomarkers by minimizing Equation 3? If so, how is this minimization carried out exactly? And why not use a more standard scheme for detecting and matching 3D points (e.g. Point Feature Histograms)?

    3. How do Equations 3 and 4 play together? My current understanding is that 3 is minimized for matching and 4 is minimized to determine the relative pose. If this is the case, why use 4 instead of applying PnP within a hypothesize-and-test framework like RANSAC?

    4. Since the pipeline assumes a pre-op facial 3D model obtained from MRI that is not available in the BIWI dataset, how are the results of Table 1 and Table 2 exactly obtained?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses a relevant MICCAI problem, but the solution is not well motivated, in the sense that it is not clear to the reader why the proposed pipeline is better suited to address the challenges posed by the application than other possible competing methods. In addition, there are important clarity issues that hinder reproducibility, and the experimental results are not sufficiently convincing. Thus, I do not think the work in its current form is sufficiently mature to be accepted at MICCAI

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    My initial recommendation was justified by several listed major weaknesses, of which the chief one was the lack of a clear motivation for the proposed pipeline. RGB-D cameras have been around for many years, there are several available methods for accurate 3D registration and/or tracking from RGB-D images, and the targeted problem of registering and/or tracking a human head does not seem to pose challenges beyond the SOTA. Thus, I could not find a clear motivation for the work

    I have carefully read the rebuttal and, although it provides some clarifications on the functioning of the solution and the possible superiority of TPSF with respect to PFH, it does not explain why the targeted application is challenging, nor why the proposed pipeline is the solution to those challenges.

    It happens that in the meanwhile I also ran into the VISIE system [a], a commercial medical-grade scanner that is capable (among other things) of real-time, markerless head pose estimation. This finding has further strengthened my impression that the tackled problem of HPE is largely solved

    Based on the above facts I stick with my original recommendation of not accepting the submission in its current form

    VISIE: https://visievision.com/technology/



Review #2

  • Please describe the contribution of the paper

    The main contribution of the paper is a novel registration approach that aligns MRI with the patient’s head to enable cerebral artery navigation without relying on physical markers or tracking devices. Compared to other methods, the proposed method uses a regression-based approach that covers the full 6 DoF of head pose, as opposed to a limited 3 DoF, for enhanced accuracy, and is more efficient than optimization-based approaches. The achieved accuracy and efficiency suggest potential for real-time clinical applications.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well organized and written. The authors provide detailed mathematical formulations and clearly describe any existing architectures that are leveraged in the study.

    The evaluation of the method is well-designed and thorough, and the results are clearly presented. The authors assessed the proposed approach using both a public database and additional real data acquired independently. Relevant hyper-parameters are discussed. Quantitative error metrics are provided and results show that the proposed method outperforms other approaches. Additionally, the robustness was evaluated with different levels of noise for each method. The results show that the proposed method maintains high accuracy across various realistic noise levels, which is critical for clinical applications where the quality of input data is subject to artifacts. The runtime performance is also evaluated against other approaches. Both accuracy and efficiency results suggest that the proposed method is sufficient for real time tracking of head pose.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The authors are encouraged to include limitations of the study.

    Maximum intensity perspective projection (MIPP) is included as a component of the system, but its result was not quantitatively evaluated, so the final navigation accuracy remains difficult to assess quantitatively.

    Though the method was evaluated in real applications (Section 3.4), quantitative metrics were not available for that data set.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Minor comments that can be incorporated to increase the readability of the paper are listed below:

    1. When comparing to existing methods (e.g., Lu’s and others), it would be helpful to include whether they are regression based or optimization based to improve the self-containment of the paper.
    2. The abbreviation MDV is not defined in the paper.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My score of the paper is mainly attributed to its technical contributions, rigorous evaluation, promising results, and clinical relevance. The achieved accuracy and efficiency are compelling, especially when compared to other existing approaches, supporting the potential impact of this study.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Rebuttal is satisfactory.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a real-time, markerless registration method for aligning preoperative MR imaging to an RGBD image of a subject’s head for the purpose of image-guided cerebral artery navigation (CAN). The paper outlines an end-to-end pipeline for this system by detailing the methods for (1) facial model representation using thin-plate spline functions (TPSFs), (2) efficiently calculating the initial registration alignment (pose initialization) using a similarity metric that leverages the thin-plate spline functions’ local reference frames and the blossom algorithm, and (3) calculating the subsequent registrations of continuous RGBD frames using ICP and square-root cubature Kalman filter. The authors provide quantitative evaluation of their initial registration method and continuous registration method against previous work. They also provide a qualitative demonstration of their method on volunteers in mock intervention scenarios.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper creatively combines multiple methods together to tackle the specific challenges with markerless CAN application. Establishing correspondences between MR and physical space is challenging in a markerless environment. While a TPSF representation of the MR surface itself is not novel, I am not familiar with prior work that uses a TPSF representation in the similarity metric or the blossom algorithm to compute initial correspondences and an initial pose estimate.

    The chosen comparators in Table 1 are especially compelling. Line 3 (Our metric+1-point RANSAC) shows that the similarity metric based on TPSFs is useful for achieving high accuracy, while Line 4 (Our initialization method) shows that the blossom algorithm has faster computation than the 1-point RANSAC.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    While I do not have any major weaknesses to note, I can provide a few minor comments:

    One interesting avenue that could have been tested was performance variability with a varying number of biomarkers. Since the pose is first coarsely estimated with the proposed method and then refined with ICP, I would be interested to see the degree of performance degradation, or potential performance improvement, with a decreasing or increasing number of biomarkers.

    While the supplemental video was appreciated, large head movements to show the algorithm’s performance for continuous pose estimation (rather than a video with little head movement) may have been more compelling.

    Although the methodology is detailed, code-availability would make these methods more accessible to readers.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper creatively combines multiple techniques in a (to my knowledge) novel methodology for computing accurate and efficient initial and continuous pose estimation for CAN. Reliably, accurately, and efficiently computing correspondences in a markerless environment is an ongoing and challenging problem for lots of image-guided applications, and I think the community would benefit from seeing this method and extending it to other clinical application areas.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    No change to my recommendation: the authors’ rebuttal was sufficient to recommend accept.




Author Feedback

We thank all reviewers for their thoughtful and valuable comments, and are encouraged by the positive remarks such as R1: well-designed evaluation, R2: improvements for registration and tracking, and R3: creative methodology. Below, we address the raised concerns.

Work motivation (R2): Existing markerless CAN systems construct local descriptors with point feature histograms (PFH) and perform per-frame global 3D registration [5][6]. PFH has relatively weak descriptiveness on flat facial surfaces, and per-frame global registration lacks real-time performance. SOTA head pose estimation is based on neural networks, which generally suffer from high computational cost, limited generalization, and 3-DoF pose output, and are thus unsuitable for CAN. Our method outperforms the existing CAN systems (Tables 1, 2) owing to the proposed techniques described in detail below.

Methodology details (R2): Following [8], voxel grid downsampling builds a 3D voxel grid on the model and selects the point nearest to each voxel center, yielding a uniform keypoint distribution. For each keypoint, a TPSF approximates its neighborhood surface on the facial model, which extracts more local geometric information than PFH (Fig. 2b–k). If a keypoint q should match a biomarker B, their neighborhoods should have similar geometric shapes. Eq. (3) models the geometric discrepancy between the neighborhoods of q and B with MSE. We construct a bipartite graph whose two disjoint vertex sets consist of keypoints and biomarkers, respectively; the weight of each edge connecting a keypoint and a biomarker is computed by Eq. (3). From this graph, we find the keypoint-biomarker matching with minimum weight sum by the Blossom algorithm. Eq. (4) calculates the confidence score of each biomarker-keypoint match. The initial head pose is then computed from all keypoint-biomarker matches via least squares weighted by the confidence scores, and refined against the facial point cloud and model by ICP. Subsequent poses are tracked by SCKF and ICP.
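The two pre-processing steps described in the rebuttal (keypoint extraction by voxel grid downsampling, and minimum-weight bipartite matching of keypoints to biomarkers) can be sketched as follows. This is an illustrative approximation, not the paper’s code: the rebuttal uses the Blossom algorithm, but on a complete bipartite graph the Hungarian algorithm (SciPy’s `linear_sum_assignment`) returns the same minimum-weight matching. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def voxel_grid_downsample(points, voxel_size):
    """Keep, per occupied voxel, the input point closest to that voxel's center."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    centers = (idx + 0.5) * voxel_size            # each point's own voxel center
    dist = np.linalg.norm(points - centers, axis=1)
    keep = {}                                     # voxel key -> index of closest point
    for i, key in enumerate(map(tuple, idx)):
        if key not in keep or dist[i] < dist[keep[key]]:
            keep[key] = i
    return points[sorted(keep.values())]

def match_keypoints(cost):
    """Minimum-cost matching on a complete bipartite graph.

    cost[i, j] would hold the Eq.(3)-style geometric discrepancy between
    biomarker i and keypoint j (hypothetical; standard assignment problem).
    """
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```

With a rectangular cost matrix, `linear_sum_assignment` matches each biomarker to a distinct keypoint while minimizing the total discrepancy, which is the role the rebuttal assigns to the Blossom matching step.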
Our method (0.06 s/frame, Table 2) is much faster than per-frame fast global registration [5] (3 s/frame, Table 1).

Limitation discussion (R1, 2, 3): We agree that the impact of facial deformation and rapid motion should be discussed, although most CAN studies assume rigid faces and minimal head motion during neurosurgical or TCD procedures [1][2][5]. The impact of varying the biomarker count also deserves attention. We will focus on such limitations in future work.

Experiment evaluation (R2, 3): The BIWI dataset offers RGBD data and 3D facial models, which serve as the pre-op facial model in our method. In Table 2, both [9] and [10] are trained partially on BIWI, and [11] is also facial-model-based. Therefore, all three methods exploit the prior information of BIWI and can thus be compared with ours. We appreciate R2 for questioning the results in Table 2, which were indeed affected by a coding mistake. The corrected numbers are 0.8°, 1.2°, and 0.4°. We sincerely apologize for the error and have carefully verified the correctness of all other results. Our conclusion remains valid in outperforming the others, mainly owing to robust initialization and effective SCKF. Notably, BIWI’s GT pose was also obtained via ICP with a facial model [24]; hence, our estimate will be quite close to the GT if the SCKF gives a good initialization. To support reproducibility, we will release the code.

The supplementary video aims to show the practicality of our method in TCD examination. An AprilTag is attached to the TCD probe, not the head, to track the probe motion and visualize the ultrasound beam (green line). This process requires minimal head motion. Fig. 9 (top two rows) shows robustness to large head motion.

Misc: @R1: MIPP is for visualization only and beyond this paper’s scope. Due to space limits, it will be evaluated in future work. The final navigation assessment will involve collaboration with clinicians. MDV will be corrected to MVD [8].
@R2: Maximum weight matching provides the keypoint-biomarker matching.
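For intuition on the SCKF + ICP predict-correct tracking discussed in the rebuttal, here is a minimal sketch using a plain linear Kalman filter with a constant-velocity state over the pose parameters. This simplifies the paper’s method: an SCKF additionally propagates cubature points through the (possibly nonlinear) dynamics and maintains a square-root covariance factor for numerical stability, and the ICP-refined pose plays the role of the measurement z. All names are hypothetical.

```python
import numpy as np

def make_cv_model(n_pose=6, dt=1.0, q=1e-3):
    """Constant-velocity model over pose parameters: state = [pose, velocity]."""
    I = np.eye(n_pose)
    Z = np.zeros((n_pose, n_pose))
    F = np.block([[I, dt * I], [Z, I]])           # state transition
    H = np.hstack([I, Z])                          # ICP "measures" the pose only
    Q = q * np.eye(2 * n_pose)                     # process noise
    return F, H, Q

def predict(x, P, F, Q):
    # Predict the next head pose (and velocity) before the frame arrives.
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    # Correct the prediction with the ICP-refined pose measurement z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

In a tracking loop, `predict` would seed ICP with a good initial alignment for the current depth frame, and ICP’s result would be fed back through `update`, which is how a filter can keep per-frame alignment fast compared with per-frame global registration.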




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    This manuscript received polarizing reviews. However, R2’s comments are in fact valid. Given that there will be discrepancy between the surfaces generated from the (static) MRI and (dynamic) RGBD camera, it is somewhat surprising to observe the results reported in Table 2.

    Reproducibility of this manuscript is also in question.

    Nonetheless, the combination of ICP and SCKF is technically interesting.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


