Abstract

Scoliosis poses significant diagnostic challenges, particularly in adolescents, where early detection is crucial for effective treatment. Traditional diagnostic and follow-up methods, which rely on physical examinations and radiography, face limitations due to the need for clinical expertise and the risk of radiation exposure, thus restricting their use for widespread early screening. In response, we introduce a novel, video-based, non-invasive method for scoliosis classification using gait analysis, which circumvents these limitations. This study presents Scoliosis1K, the first large-scale dataset tailored for video-based scoliosis classification, encompassing over one thousand adolescents. Leveraging this dataset, we developed ScoNet, an initial model that encountered challenges in dealing with the complexities of real-world data. This led to the creation of ScoNet-MT, an enhanced model incorporating multi-task learning, which exhibits promising diagnostic accuracy for application purposes. Our findings demonstrate that gait can be a non-invasive biomarker for scoliosis, revolutionizing screening practices with deep learning and setting a precedent for non-invasive diagnostic methodologies. The dataset and code are publicly available at https://zhouzi180.github.io/Scoliosis1K/.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1763_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

https://zhouzi180.github.io/Scoliosis1K/

BibTex

@InProceedings{Zho_Gait_MICCAI2024,
        author = { Zhou, Zirui and Liang, Junhao and Peng, Zizhao and Fan, Chao and An, Fengwei and Yu, Shiqi},
        title = { { Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a new database for video-based scoliosis classification using gait analysis, and builds a baseline for this database using a simple CNN.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea makes sense for scoliosis, and once the database is released it will benefit the community.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The dataset is too small, and the accuracy is very high, so I am concerned about the risk of overfitting.

    The definitions of ‘positive,’ ‘neutral,’ and ‘negative’ are not clear. We need a medical explanation to understand these terms properly in the context of the study.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Using not just silhouettes but also skeletons would be more interesting for this research topic.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The model design is not innovative enough. Temporal information appears to be crucial for this task, yet they are only using temporal pooling, which could lead to loss of information.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces “Scoliosis1K” a new dataset of gait silhouettes extracted from a monocular video for adolescent scoliosis screening. It designs and evaluates a classification pipeline consisting of person tracking, segmentation and then classification of silhouette sequences into Scoliosis positive, negative and neutral classes. The authors propose to release the code and data upon acceptance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Video based Scoliosis prescreening is an important area, which might mitigate the limitations of traditional diagnostic and follow up methods of the necessity for clinical expertise and radiation exposure, with this non-invasive method. The introduced dataset is relatively large consisting of 1000 individuals with a total of 1493 sequences, which the authors propose to release upon acceptance. Although the tracking and segmentation methods are not the most recent, they seem sufficient for the described setup (walking towards the camera on a corridor) to extract the silhouettes. The designed classification method performs well on the dataset in terms of accuracy, sensitivity and specificity. The utilized approach can be quite straightforwardly implemented in the future for clinical pre-screening.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Data: The authors neither describe nor mention whether the study was approved by an ethics committee; as a clinical human study involving children, it should have been approved by such a body. The paper also does not state whether the study participants, or in the case of children their guardians, signed an informed consent to participate in the study. The paper should give more detail on the source of the ground truth diagnosis, what methods were used for it, etc. The only reference for the ground truth class definition is “To address the low incidence of scoliosis in the general population and ensure a balanced dataset, individuals diagnosed with scoliosis were encouraged to contribute multiple sequences.” This does not describe whether each individual was examined or whether there might be undiagnosed cases in the negative class.

    Methods: Figure 5, “heatmap visualizations for ScoNet and ScoNet-MT”, is presented without context. The paper should describe what kind of heatmap it is, i.e., whether it is Grad-CAM based or which other method was used for this visualization.

    The ablation studies are presented without enough detail to reproduce them and should be described in significantly more detail. For example, for the “Baseline CNN”, neither the model nor even the input (a single image, a sequence, etc.) is described. This is true for all baseline models.

    Section 4.3, “Class-imbalanced Distribution”, should describe the actual training data in more detail. Did the authors use any methods to address the class imbalance, e.g., a weighted loss?
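For readers unfamiliar with the weighted loss mentioned in this comment, the following is a minimal sketch of inverse-frequency class weighting applied to cross-entropy. It is not the authors' training code. The class counts are taken from the 1:1:8 setting reported in the author feedback; the ordering of the classes and the helper names `inverse_frequency_weights` and `weighted_cross_entropy` are assumptions made for illustration.

```python
import math

def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * count), so rare classes weigh more."""
    total = sum(counts)
    num_classes = len(counts)
    return [total / (num_classes * c) for c in counts]

def weighted_cross_entropy(logits, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_prob = logits[label] - log_sum_exp
    return -weights[label] * log_prob

# Counts for the 1:1:8 setting from the author feedback.
# Assumption: the two minority classes come first, the majority class last.
counts = [74, 74, 596]
weights = inverse_frequency_weights(counts)

# The same prediction error costs more on a minority-class sample.
logits = [0.0, 0.0, 0.0]
loss_minority = weighted_cross_entropy(logits, 0, weights)
loss_majority = weighted_cross_entropy(logits, 2, weights)
```

With this scheme each class contributes the same total weight to the loss, which is one common way to counteract imbalance; focal loss, which the authors report trying, is a different technique.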

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Without the code release, the architecture is only vaguely described by “ScoNet utilizes a ResNet-inspired E architecture to transform participant silhouettes into 3D feature maps f…”. The actual architecture and the weights used are not detailed in the paper. The ablation studies have to be described in much more detail.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper has to include the ethical approval and informed consents. The dataset description has to detail and explain the acquisition of ground truth labels more thoroughly. Although the dataset is relatively extensive, the following is a significant overclaim: “Diversity and Generalizability: The dataset’s demographic variety ensures that developed models are robust, adaptable, and broadly applicable, facilitating generalization across different populations.” Because the dataset is limited to teenagers from a middle school in China, it is not ensured that models would generalize well across different populations (e.g., adults from North America), so this claim should be adjusted accordingly.

    For the evaluation, the paper should add the F1 score, a standard metric in classification. Sampling 30 random frames from the 300-frame sequences does not ensure temporal consistency of the gait patterns (in a corner case, the sample could cover only the first 2 seconds).
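As an illustration of the F1 metric requested here, the sketch below computes a macro-averaged F1 score for a three-class problem in plain Python. The labels are toy values invented for the example, not results from the paper, and the choice of macro averaging (rather than micro or weighted) is an assumption.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes

# Toy, hypothetical labels: classes 0 and 1 are rare, class 2 is the majority.
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 2, 1, 2, 2, 2, 2, 2]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, num_classes=3)
```

Because macro F1 gives each class equal influence, missed minority-class samples lower it more than they lower accuracy, which is why it is often reported alongside accuracy on imbalanced data.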

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has to include ethics, informed consents, and a more detailed description of ground truth labels. Figure 5 is presented without any context or description of how it was obtained. Section 4.3, Ablation Studies, has to include significantly more detail to be reproducible.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    If the authors release the code, data and include all the details they promised in the rebuttal then the paper can be accepted. Although the methods for classification could be further argued if it is the best approach, in my opinion the main contribution of this paper is the first large scale public dataset on this sub-field, which the authors promise to release.



Review #3

  • Please describe the contribution of the paper

    The paper contributes significantly to the field by introducing the first large-scale public dataset Scoliosis1K for scoliosis classification, establishing a new benchmark for the community. It introduces the novel model ScoNet for scoliosis classification through gait analysis and evolves it into ScoNet-MT to better handle real-world data complexities. The ScoNet-MT model demonstrates superior diagnostic performance compared to experienced clinicians, highlighting the potential of gait as a reliable biomarker for scoliosis and showcasing the transformative power of deep learning in healthcare diagnostics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Novel Dataset and Benchmark: The paper introduces Scoliosis1K, the first large-scale public dataset tailored for scoliosis classification, containing over 1,000 individuals. This dataset provides a new benchmark for the community and enables advanced model training and validation.

    2) Innovative Gait Analysis Approach: The proposed method uses gait analysis as a biomarker for scoliosis classification, offering a novel non-invasive approach compared to traditional methods. The use of gait as a diagnostic tool represents an original way to utilize data.

    3) Robust Model Development: The paper presents ScoNet and its enhanced version ScoNet-MT, which incorporate multi-task learning with gait recognition. This approach demonstrates superior diagnostic accuracy compared to conventional methods and experienced clinicians, highlighting the transformative potential of deep learning in healthcare diagnostics.

    4) Strong Evaluation and Clinical Feasibility: The evaluation is comprehensive, including accuracy, sensitivity, and specificity metrics. The model outperforms seasoned clinicians, indicating clinical feasibility. Additionally, the performance of the model is tested across various class distribution ratios, demonstrating robustness in imbalanced conditions.

    5) Scalable and Privacy-Preserving Tool: The proposed method offers a scalable and privacy-preserving diagnostic tool, particularly beneficial for widespread screening in resource-limited regions. This aspect addresses an important need in the field.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Limited Dataset Diversity: Although Scoliosis1K is a large-scale dataset, it may lack diversity in terms of age, ethnicity, and geographic location, potentially limiting the model’s generalizability to diverse populations.

    2) Reliance on Radiography for Ground Truth: The ground truth labels for the dataset rely on radiography measurements, which are still necessary for initial diagnosis. This dependency may limit the model’s ability to replace traditional methods completely.

    3) No Comparison with Other Deep Learning Methods: The paper does not compare the proposed approach with other deep learning-based methods. Direct comparisons would strengthen the evaluation.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors are advised to provide more detailed descriptions of the clinical and demographic information of the dataset, as well as to include comparisons with other existing deep learning methods.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces the first large-scale dataset (Scoliosis1K) tailored for scoliosis classification, which represents a significant contribution in the field.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have basically addressed my concerns.




Author Feedback

  1. Dataset Size and Diversity (R1, R3, R4): Scoliosis1K, the first large-scale dataset for scoliosis classification, includes over 1,000 participants and nearly 0.5 million frames, focusing on adolescent scoliosis screening. While it has limitations in representing adults and ethnic diversity, Scoliosis1K remains a pioneering endeavor. It offers a video-based solution for scoliosis screening during the crucial window for effective intervention in children. We will revise our claim to emphasize its scope and release the dataset upon acceptance.
  2. Temporal Information (R1, R3): Our design is based on two observations: a) The original iteration of ScoNet-MT using 3D convolution slightly decreased accuracy (0.2%) but tripled model size and computation cost. b) Studies like GaitSet (TPAMI 2021) and PointNet (CVPR 2017) show max-pooling effectively models dependencies between frames and points, aligning with our findings. Thus, we consider that ScoNet can capture the necessary temporal characteristics. In this context, the random sampling strategy can be viewed as a form of temporal augmentation, facilitating the acquisition of robust features along the temporal dimension. We acknowledge max-pooling is a straightforward solution and will discuss the need for better-designed modules in the conclusion.
  3. Definitions of ‘Positive,’ ‘Neutral,’ and ‘Negative’ (R1): Definitions based on the Cobb angle in Section 2.1 will be more prominently stated.
  4. Overfitting Concerns (R1): Thank you for your comments. We found that the high accuracy was due to specific class proportions (1:1:2) initially chosen to isolate the impact of class imbalance. With a realistic ratio (1:1:8), accuracy dropped to 82% (Table 4), indicating an underfitting issue. We agree and will use this ratio for evaluation and revise the manuscript accordingly.
  5. Incorporation of Skeleton Data (R1): Our past experiments with a SOTA skeleton-based method, SkeletonGait (AAAI2024), resulted in a 13.8% accuracy drop. We believe that body shape is crucial for scoliosis classification and is not captured in skeleton data. We will discuss this limitation in the revised manuscript.
  6. Ethics, Informed Consents, and Ground Truth Labels (R3): The study was approved by our Institutional Review Board, and informed consent was obtained from all participants and guardians. Ground truth labels were provided by professional doctors through thorough examinations. These details will be included in the revised manuscript.
  7. Heatmap Visualizations (R3): Figure 5’s heatmaps, using Zagoruyko et al.’s technique (ICLR 2017), highlight our model’s focus regions. While the manuscript has explained these focal regions, some technical details are missing as you pointed out. We will include them for clarity.
  8. Ablation Studies and Model Details (R3): Compared to ScoNet-MT, the Baseline CNN does not employ horizontal pooling or BNNeck and uses cross-entropy loss. Other baselines gradually add triplet loss and multi-task learning (Section 4.3). We will ensure detailed descriptions for reproducibility and release all source code upon acceptance.
  9. Class-Imbalanced Distribution (R3): Class proportions are 186:186:373 for 1:1:2, 124:124:497 for 1:1:4, and 74:74:596 for 1:1:8. We tried various methods to address the class imbalance problem, such as focal loss, but none gave satisfactory results; accuracy decreased by 1.6% or more. These details will be included in the revised manuscript.
  10. Reliance on Radiography (R4): Radiography remains the gold standard for scoliosis diagnosis. Our method provides efficient, large-scale early adolescent screening to identify cases needing further radiographic investigation. We will clarify this in the revision.
  11. Comparison with Other Deep Learning Methods (R4): No prior research exists on video-based scoliosis classification. In Table 3, we compare ScoNet-MT with several baselines fairly, highlighting its innovation and superior performance.
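The argument in item 2, that max-pooling over frames combined with random frame sampling acts as a form of temporal augmentation, can be illustrated with a small sketch. This is not the ScoNet implementation: the feature values are random placeholders, the helper names `sample_frames` and `temporal_max_pool` are hypothetical, and keeping the sampled indices in temporal order is an assumption. Only the 30-of-300 frame setting comes from the reviews.

```python
import random

def sample_frames(num_frames, k, rng):
    """Draw k distinct frame indices uniformly at random, kept in temporal order."""
    return sorted(rng.sample(range(num_frames), k))

def temporal_max_pool(frame_features):
    """Element-wise maximum over the frame axis: (T, D) -> (D,)."""
    return [max(column) for column in zip(*frame_features)]

rng = random.Random(0)

# Placeholder per-frame features: 300 frames, 4 channels each (hypothetical values).
features = [[rng.random() for _ in range(4)] for _ in range(300)]

# Sample 30 of the 300 frames, as in the setting discussed by Review #2.
indices = sample_frames(len(features), 30, rng)
clip = [features[i] for i in indices]
pooled = temporal_max_pool(clip)

# Max-pooling treats the clip as a set: reordering the frames leaves the result unchanged.
pooled_reversed = temporal_max_pool(list(reversed(clip)))
```

The same property cuts both ways: because the pooled descriptor does not depend on frame order, any ordering information has to be captured before pooling, which is the concern about lost temporal information raised by Review #1.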




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


