Abstract

Body bradykinesia, a prominent clinical manifestation of Parkinson’s disease (PD), is characterized by generalized slowness and diminished movement across the entire body. The assessment of body bradykinesia in the widely employed PD rating scale (MDS-UPDRS) is inherently subjective, relying on the examiner’s overall judgment rather than specific motor tasks. Therefore, we propose a graph convolutional network (GCN) scheme for automated video-based assessment of parkinsonian body bradykinesia. This scheme incorporates a causality-informed fusion network to enhance the fusion of causal components within gait and leg-agility motion features, achieving stable multi-class assessment of body bradykinesia. Specifically, an adaptive causal feature selection module is developed to extract pertinent features for body bradykinesia assessment, effectively mitigating the influence of non-causal features. Simultaneously, a causality-informed optimization strategy is designed to refine the causal feature selection module, improving its capacity to capture causal features. Our method achieves 61.07% accuracy for three-class assessment on a dataset of 876 clinical cases. Notably, our proposed scheme, utilizing only consumer-level cameras, holds significant promise for remote PD bradykinesia assessment.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/4133_paper.pdf

SharedIt Link: https://rdcu.be/dV5vY

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72089-5_8

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{Qua_CausalityInformed_MICCAI2024,
        author = {Quan, Yuyang and Zhang, Chencheng and Guo, Rui and Qian, Xiaohua},
        title = {{Causality-Informed Fusion Network for Automated Assessment of Parkinsonian Body Bradykinesia}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        pages = {78 -- 88}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a graph convolutional network (GCN) scheme for automated video-based assessment of parkinsonian body bradykinesia.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A causality-informed fusion network is proposed to enhance the fusion of causal components within gait and leg-agility motion videos. The model achieves 61.07% accuracy for three-class assessment on a dataset of 876 clinical cases.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper made up the concept of PD body bradykinesia assessment and claimed to be the first attempt at it, but eventually refers to a simplified UPDRS grading. UPDRS assessment has been well studied with the same video modalities (gait/leg agility) introduced in the paper. Although multimodal methods are generally expected to achieve better performance than a single modality, the state-of-the-art methods have demonstrated much better performance, even with a single modality or less data. It is unclear why the authors do not refer to the UPDRS for evaluation and why other state-of-the-art methods, even those with a very similar structure of pose+GCN+feature engineering, were not mentioned or compared.
    2. The introduced challenge/motivation of “non-causal features” was not clearly defined, but only vaguely described as “irrelevant to body bradykinesia assessment”. To achieve this, the authors proposed an “Adaptive Causal Feature Selection Module”, but to the reviewer’s understanding, the component is working as a feature selection step that extracts higher contribution/weighted feature channels, which is not necessarily causal. There was not an identified confounder or backdoor adjustment process involved and the claim of “causal” is not sound.
    3. The overall paper is loosely developed with a significant lack of clarity. There are independent classifiers $\hat{h}_t$ in the framework that are trained separately, but the training steps and performance were not reported. The design of the causal/non-causal classifiers $\hat{h}_c$ and $\hat{h}_{nc}$ is also not described, and how their outputs differ from each other was not accounted for. The dataset has 293 patients but there are 876 labeled sets, meaning each patient has differently labeled video sets and the labels could include subjectivity of the grader. The chosen backbone, EfficientGCN, showed relatively low performance in the ablation study, and other choices were not experimented with.
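
The point about 293 patients contributing 876 labeled sets is commonly handled by splitting cross-validation folds by patient rather than by video. The following is a minimal sketch of such a split, not the authors' code; the function name `grouped_folds`, the greedy balancing rule, and the toy patient IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def grouped_folds(patient_ids, n_folds=5):
    """Return n_folds lists of sample indices; all samples of a patient share one fold."""
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place patients with the most samples first,
    # each into the fold that currently holds the fewest samples.
    ordered = sorted(by_patient, key=lambda p: (-len(by_patient[p]), str(p)))
    for pid in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[pid])
    return folds


# Hypothetical toy data: one patient ID per labeled video set.
toy_ids = ["p1", "p1", "p1", "p2", "p2", "p3", "p4", "p4", "p5"]
toy_folds = grouped_folds(toy_ids, n_folds=3)
for k, fold in enumerate(toy_folds):
    print(k, sorted({toy_ids[i] for i in fold}))
```

With a split of this kind, a model never sees videos of a test-fold patient during training, regardless of how many medication or stimulation states that patient was recorded in.
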
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors should carefully revise the claims of this paper as discussed above
    2. Add a comparison with single-modality SOTA methods such as 10.1109/TMM.2021.3068609 and 10.1109/TNSRE.2020.3039297. These two papers share high similarity with this work but demonstrated much better performance.
    3. Add comparisons with other video motion recognition frameworks like X3D, TimeFormer, etc. and consider commonly-adopted transfer learning strategy
    4. Report AUC and consider reporting a holdout testing result for demonstrating generalizability
    5. Address the clarity issues and provide reproducibility information
    6. Table 1 does not have std information as in 2 & 3. It has to be consistent if repetition is performed
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the paper presents a network design for multimodal video assessment of PD that can be of practical value, the claimed contribution is not justified and the model performance was neither outstanding nor thoroughly evaluated against other SOTA methods. It requires a significant amount of improvement to meet the expectations of the MICCAI conference.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    More clarity has been seen through rebuttal but still suffers some issues.

    In the paper, the patients are asked to “perform gait and leg-agility tasks for both left and right legs”, but in bullet point 3 of the rebuttal, the authors stated “this work tackles a more challenging task due to the absence of specific examination action”. This is inconsistent.

    My specific question regarding h_t and y_t was whether the leg agility/gait tasks have separately graded labels. From the rebuttal it sounds like the overall pipeline used the final label, without the intermediate task labels. However, in the paper, “…classifier and label for the corresponding task” was presented as available. This is still unclear and makes a huge difference.

    The common concerns regarding performance are not well addressed. Both reviewers mentioned two papers (leg agility & gait) that used 4-class scoring and achieved much better performance. A 3-class task is substantially easier than a 4-class one, and the reviewer is not optimistic about the performance of this pipeline on a 4-class task.

    The authors fail to understand key concepts of causal inference. “features relevant to bradykinesia symptoms in each modality are considered causal, while the rest are non-causal (i.e., the confounding factors), as they fail to contribute to body bradykinesia assessment.” Confounders are hidden states and deconfounding requires the removal of these confounders. However, the features all participate in the final classification with only the CLS loss as the objective. The overall pipeline is deep-learning based, and by only measuring the contribution of each channel, there is no control (nor demonstration, as mentioned by other reviewers) of what is learned through these modules. The design, training, and evaluation of this aspect of the model are ineffective.

    I’d like to keep my rating, considering that some issues have been addressed but more critical ones remain to be clarified.



Review #2

  • Please describe the contribution of the paper

    The manuscript presents a novel approach to automated assessment of body bradykinesia in Parkinson’s Disease (PD) patients using a causality-informed fusion network integrated with a graph convolutional network (GCN). This system processes video-based motion data with an adaptive causal feature selection module aimed at enhancing the accuracy of PD severity classification by effectively mitigating non-causal features. Notably, the study leverages consumer-level camera technology, thus broadening the potential for remote clinical applications and accessible PD monitoring.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed causality-informed fusion approach for feature selection is a clear technical innovation. It potentially increases the relevance and effectiveness of features used for bradykinesia assessment, which could lead to more precise diagnostics.

    2. The method’s use of consumer-level camera equipment for capturing video data could significantly increase the accessibility and ease of use of this diagnostic tool, especially in low-resource settings or in remote patient monitoring scenarios.

    3. The paper is generally well-written and organized. The use of figures and tables effectively aids in understanding the complex methodologies and the results of the implementation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The manuscript claims that “To the best of our knowledge, there is currently no existing research on automated assessment for body bradykinesia” (Section 1). However, it seems overstated to claim that there are no existing automatic assessments for body bradykinesia. The literature [1,2] does include methods that could be considered similar or indirectly related to the work at hand, necessitating a more precise definition of the novelty in relation to existing studies.

    2. The reported classification accuracy of 61.07% might be too low for clinical application, suggesting a need for further model refinement and validation against higher benchmarks to ensure clinical relevance and reliability.

    3. The comparison with existing methods [1,2] seems insufficient. Enhancing this analysis could better position the paper within the current research landscape, providing a clearer indication of its contributions and limitations.

    [1] Lu, M. et al., Vision-based estimation of MDS-UPDRS gait scores for assessing Parkinson’s disease motor severity, MICCAI 2020

    [2] Guo et al., Sparse adaptive graph convolutional network for leg agility assessment in Parkinson’s disease, TNNLS 2020

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The literature review should be expanded to accurately reflect the current state of research in automated Parkinson’s disease assessments. This includes acknowledging related works and clearly stating the unique contributions of the proposed causality-informed approach.

    2. Broaden the validation of your model by including more diverse datasets and comparisons with multiple expert assessments. This could help establish the reliability and accuracy of the proposed system across different populations and clinical settings.

    3. Include a more comprehensive comparative analysis with existing methods. This analysis should not only compare the performance metrics but also discuss the differences in methodology and the potential advantages of the causality-informed approach regarding model interpretability and clinical applicability.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a novel causality-informed fusion network for assessing Parkinsonian bradykinesia using video data, a promising innovation in medical diagnostics. The use of consumer-grade cameras enhances the accessibility of this technology, potentially democratizing advanced healthcare solutions. However, the reported accuracy (61.07%) requires improvement to meet clinical standards, suggesting further model optimization.

    The paper’s claim of being the first in automated bradykinesia assessments contradicts existing literature, indicating a need for a more accurate review of related works. This overstatement of novelty is a critical issue that needs addressing to ensure the paper’s integrity and scholarly value.

    Given these considerations, I recommend a “weak accept” with substantial revisions focused on expanding comparative analyses, and correctly situating the work within existing research.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The dataset is good. However, my main concern is still on the effectiveness of the proposed method since even a random guess can achieve ~33% for a 3-class classification problem. I keep my original recommendation.
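
To make the chance-level comparison concrete, the sketch below computes the uniform-guess accuracy for a k-class problem and a chance-corrected accuracy for the reported 61.07%. The helper names are assumptions, the class counts passed to `majority_baseline` are purely illustrative (the class distribution of the dataset is not stated here), and the chance correction is a generic normalization rather than a metric used in the paper.

```python
def uniform_chance(n_classes):
    """Expected accuracy of guessing uniformly at random among n_classes."""
    return 1.0 / n_classes


def majority_baseline(class_counts):
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts) / sum(class_counts)


def chance_corrected(accuracy, chance):
    """Rescale accuracy so that chance maps to 0 and a perfect score maps to 1."""
    return (accuracy - chance) / (1.0 - chance)


reported_accuracy = 0.6107           # three-class accuracy stated in the abstract
chance = uniform_chance(3)           # the ~33% mentioned in the review
print(round(chance, 4))
print(round(chance_corrected(reported_accuracy, chance), 4))

# Illustrative counts only; an imbalanced dataset raises the trivial baseline.
print(majority_baseline([50, 30, 20]))
```

If the three classes are imbalanced, the majority-class baseline is the more demanding reference point, since it can exceed the uniform-guess rate.
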



Review #3

  • Please describe the contribution of the paper

    The paper develops an automated assessment of bradykinesia, which is a critical aspect of Parkinson’s disease diagnosis and treatment. The authors utilize a video-based approach and develop a causal model to perform 3-class classification of body bradykinesia in a group of 876 subjects.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of the paper are:

    1. The paper develops a comprehensive causal model to automate the classification of bradykinesia, which is important for Parkinson’s disease treatment. The developed method is comprehensive and well motivated.

    2. The developed dataset is well compiled with various tasks related to bradykinesia being carefully recorded. The dataset alone is a strong contribution of this study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses of the paper are:

    1. The authors develop a causal model to identify causal and non-causal features of body bradykinesia and perform classification. It is unclear what the causal features look like and how they can translate to the actual features seen in the person. In other words, improving the explainability of the model would strengthen the paper.

    2. The model developed utilizes a causal fusion network. It would also be beneficial to understand how the networks described in the literature review perform with the collected dataset.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The paper is well written with a detailed mathematical description of the entire algorithm. However, an explanation regarding the various features/outputs seen from different parts of the network would be helpful to further understand the model.

    2. A quick comparison using vanilla CNNs would also serve as a validation of the importance of utilizing graph neural networks for classification of bradykinesia.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main factors leading me to my score are:

    1. The paper is well written, with a focus on the overall method and approach. A detailed mathematical description is a strong contribution.

    2. The developed dataset has the potential to be very useful and convinces me of the rigorous nature of this study.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my concerns well. I recommend acceptance.




Author Feedback

We thank all reviewers for their insightful comments and are especially grateful to R1 and R4 for recognizing our work. The source code will be released in our final version.

1: Explanation of “the first in body bradykinesia assessment” (R1) To the best of our knowledge, we are the first team to explore video-based assessment of PD body bradykinesia (i.e., global spontaneity of movement), delineated in item 3.14 of the MDS-UPDRS. Existing studies focus on the assessment of other tasks, such as gait (item 3.10) and leg agility (item 3.8). Following your advice, we will revise the term “first” to avoid misunderstanding.

2: Multimodal methods instead of single modality (R3) The key challenge in automated assessment of body bradykinesia is the “absence of specific examination tasks” (1st paragraph of Introduction). According to the MDS-UPDRS, this scoring is based on the rater’s “global impression after observing for spontaneous gestures while sitting, and the nature of arising and walking”. Thus, multimodal schemes outperform a single modality because they provide more comprehensive information. We empirically and experimentally determined leg agility (sitting) and gait (walking) as the inputs.

3: Comparison with SOTA and low performance (R1, R3) Compared to recent studies such as TNSRE.2020.3039297 (67.59% for leg agility) and TMM.2021.3068609 (65.66% for gait), this work tackles a more challenging task due to the absence of specific examination actions. As a pioneering attempt to utilize multimodal video information for indirect assessment of body bradykinesia, the performance achieved is acceptable.

4: Training steps and performance of $\hat{h}_t$ (R3) The training of $\hat{h}_t$ is guided by $L_{cls}$ (Eq. 2) and conducted in Eq. 10, which should be more clearly written as $\min_{E, \hat{h}_c, \hat{h}_{nc}, \hat{h}_t}$. $\hat{h}_t$ is set to assist E in capturing modality-related motion features. The performance of $\hat{h}_t$ is therefore not the main focus of this paper and was not reported.

5: Why use EfficientGCN as the backbone (R3) Additional experiments (Table 3) show the performance improvement when introducing the proposed modules (i.e., our innovations) into various backbones. EfficientGCN is selected as the backbone due to its advantages in deployability (1/9 of the FLOPs of 2s-AGCN and 1/12 of the FLOPs of MS-G3D).

6: The details of $\hat{h}_c$ & $\hat{h}_{nc}$ (R3) They are both constructed as 3-layer MLPs with independent parameters. The outputs of $\hat{h}_c$ and $\hat{h}_{nc}$ are prediction logits for the causal and non-causal parts, respectively.

7: Each patient has differently labeled videos (R3) Each patient was in distinct states (ON&OFF medication and/or ON&OFF deep brain stimulation) at different follow-up times, and scores were given independently by experienced raters. In cross-validation, videos from the same patient are assigned to a single fold to avoid data leakage.

8: Subjectivity of the grader (R3) The videos were captured in “real clinical settings” and scored by board-certified movement disorders neurologists with over 8 years of clinical experience in PD. Therefore, the label of each video is reliable.

9: The claim of “causal” is not sound (R3) Following your advice, it is better to term the features separated by M as causality-driven features. Here, features relevant to bradykinesia symptoms in each modality are considered causal, while the rest are non-causal (i.e., the confounding factors), as they fail to contribute to body bradykinesia assessment. For example, channels in $F_{GA}$ representing the continuity of arm swing are causal, while channels representing the walking action itself are considered non-causal. The proposed scheme separates causal features by measuring the contribution of each channel, which is an effective approach to causal intervention at the channel level.

10: The explainability of our model (R4) The explainability is reflected in the definition of “causal/non-causal features” (response to Q9). We will demonstrate the explainability through visualization in the future.
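
The channel-level separation described in point 9 can be pictured with a small sketch. This is not the authors' module: the rebuttal does not say how channel contributions are measured or thresholded, so the per-channel scores are taken as given, and the top-k rule, the `keep_ratio` parameter, and the function name `split_channels` are assumptions introduced for illustration.

```python
def split_channels(features, scores, keep_ratio=0.5):
    """Split a feature vector into two complementary masked copies.

    Channels with the highest scores are kept in the first copy (zeroed in the
    second); the remaining channels are kept in the second copy.
    """
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    n_keep = max(1, int(len(scores) * keep_ratio))
    # Rank channel indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    selected = set(ranked[:n_keep])
    high = [x if c in selected else 0.0 for c, x in enumerate(features)]
    low = [0.0 if c in selected else x for c, x in enumerate(features)]
    return high, low


# Hypothetical 4-channel feature vector and contribution scores.
features = [1.0, 2.0, 3.0, 4.0]
scores = [0.9, 0.1, 0.8, 0.2]
high_part, low_part = split_channels(features, scores, keep_ratio=0.5)
print(high_part)  # channels 0 and 2 kept
print(low_part)   # channels 1 and 3 kept
```

In a setup like the one described in point 6, the two masked copies would each feed a separate classifier head; the sketch stops at the split itself.
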




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper received mixed reviews. But after careful consideration, it appears that the authors have addressed most of the concerns in the rebuttal! They are advised to address the reviewers’ concerns in the final paper and also compare conceptually with the relevant prior work suggested by the reviewers.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers generally agree that the paper is clearly written, the causality-informed fusion network is novel, and the application to video-based assessment of body bradykinesia is clinically relevant. However, the authors should further clarify experimental details (R3), acknowledge the limitation in handling confounders (R3), properly position the work within existing literature (R3), and generally avoid over-claiming contribution.



