Abstract

Bone age assessment (BAA) is crucial for evaluating the skeletal maturity of children in pediatric clinics. The decline in assessment accuracy is attributed to the existence of inter-gender disparity. Current automatic methods bridge this gap by relying on bone regions of interest and gender, resulting in high annotation costs. Meanwhile, the models still grapple with an efficiency bottleneck for lightweight deployment. To address these challenges, this study presents the Gender-adaptive Graph Vision Mamba (GGVMamba) framework, which uses only raw X-ray images. Concretely, a region augmentation process, called the directed scan module, is proposed to integrate local context from various directions of bone X-ray images. Then we construct a novel graph Mamba encoder with linear complexity, fostering robust modelling of both within-region and among-region features. Moreover, a gender adaptive strategy is proposed to improve gender consistency by dynamically selecting gender-specific graph structures. Experiments demonstrate that GGVMamba obtains state-of-the-art results with MAE of 3.82, 4.91, and 4.14 on RSNA, RHPE, and DHA, respectively. Notably, GGVMamba shows exceptional gender consistency and optimal efficiency with minimal GPU load. The code is available at https://github.com/SCU-zly/GGVMamba.
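
For readers unfamiliar with the linear-complexity claim, the standard linear state space model that Mamba-style encoders build on may help. The block below is the generic textbook form, not a transcription of the paper's equations, so the symbols A, B, C and the discretized parameters \bar{A}, \bar{B} are assumptions about notation. Here x is the input, h the intermediate latent state, and y the output.

```latex
% Continuous-time state space model (generic form)
\begin{aligned}
h'(t) &= A\,h(t) + B\,x(t), \\
y(t)  &= C\,h(t).
\end{aligned}

% Discretized recurrence over a sequence x_1, \dots, x_L
\begin{aligned}
h_k &= \bar{A}\,h_{k-1} + \bar{B}\,x_k, \\
y_k &= C\,h_k .
\end{aligned}
```

Because each step of the recurrence updates a fixed-size state, the cost of processing a sequence of L patches grows linearly in L, in contrast to the quadratic cost of full self-attention.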

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2831_paper.pdf

SharedIt Link: https://rdcu.be/dV17z

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72086-4_22

Supplementary Material: N/A

Link to the Code Repository

https://github.com/SCU-zly/GGVMamba

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zho_Efficient_MICCAI2024,
        author = { Zhou, Lingyu and Yi, Zhang and Zhou, Kai and Xu, Xiuyuan},
        title = { { Efficient and Gender-adaptive Graph Vision Mamba for Pediatric Bone Age Assessment } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {230--239}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a Gender-adaptive Graph Vision Mamba (GGVMamba) for bone age assessment based only on raw X-ray images. It proposes a graph Mamba encoder that enhances performance by utilizing Mamba’s strength in linear long-range attention. A gender adaptive strategy is formulated to enhance gender consistency by balancing intra-graph and inter-graph consistency.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A novel Mamba-based architecture that improves the long-range dependency learning and prediction performance.
    2. An interesting gender adaptive strategy and DGL are proposed to effectively tackle the gender error and improve the performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My main concern is with the Mamba part; the overall justifications in this part are not very convincing. For example:

    1. In the contribution bullet point one: “This module transforms a nondirected sequence into four directed sequences, enhances region features, and improves the generalization ability across various datasets.” The paper has not clearly introduced and justified what the ‘nondirected sequence’ of an X-ray image after patch embedding is. In my opinion, it is already directed.

    2. The justifications for the design of the directed scan are not strong enough. Could you justify whether the proposed directed scan module achieves optimal performance with the current directional design?

    3. The paper claims that the four specific directions of Mamba scan can “guarantee that each bone element combines causal information by linear projection”. However, I fail to understand where this causal information comes from and why the current manuscript proves this statement correct.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Could you justify whether the proposed directed scan module achieves optimal performance with the current directional design?

    2. I suggest carefully revising each statement to make sure it is precise, carefully justified, and not overclaimed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My main concern is with the justifications in the Mamba part, referring to 6.

    I am happy to change my score if the authors can significantly improve the precision of each justification and resolve my concerns.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors included a gender adaptive strategy with low GPU utilization and two technically novel solutions for bone age assessment. The first is a new adaptive scan, and the other is the inclusion of Mamba.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    They included the new scan module, Mamba, and a gender adaptive strategy.

    The authors did not include clinical translation or a visual evaluation of the network output. Clinical evaluation by a pediatric radiologist and a pediatric endocrinologist should be mentioned.

    The GGVMamba network seems useful.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Even though the authors address gender disparity adaptively, there are still fluctuations. I could not find baseline fairness gap calculations, and there is no comparison with the “roi segmented” models and their evaluation metrics.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    They should have mentioned their code availability and link.

    They should also have defined the datasets and their differences.

    Gender is not the only factor contributing to consistency. The authors should have mentioned race bias and the availability of the datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Dear Authors,

    Your new algorithm, the solution to GPU utilization, and the solution to the annotation need are helpful. You should share the code and also mention the differences among the current datasets. In addition to gender, race is another contributing factor for bias. You should have mentioned the race factor. Please explain why you recommend zero-shot learning for this medical problem. The authors should ask medical experts for a visual evaluation.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    They implement two new models with a gender adaptive strategy.

    The evaluation metrics are not comprehensively discussed, and other contributing factors are not mentioned. This should be emphasized as a human-in-the-loop system. The authors did not describe their contribution to the field, which is required for publication.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors had novel findings but were not clinically oriented, especially regarding gender, which is not the only bias factor in this decision making. Even though they claimed to have mentioned the comparisons, it is not clear how accurate their provided segmentations are compared to the hand-crafted segmentations. The authors also did not include pediatric radiologists or pediatric endocrinologists. It lacks “trustability”.



Review #3

  • Please describe the contribution of the paper

    This study presents Gender-adaptive Graph Vision Mamba (GGVMamba) framework with only raw X-ray images. The target problem this study tries to tackle is to estimate the bone age by reducing inter-gender disparity automatically with a low computational cost. This method can help to avoid heavy manual annotation cost for the training, while achieving a high accuracy with robustness and low computational cost.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The purpose of this study is very clear: to identify bone age in pediatric procedures with a lightweight network and no prior annotation cost.
    2. As claimed by the authors, inter-gender disparity is traditionally a main factor hindering AI performance on BAA; therefore, GGVMamba is proposed to address it by leveraging the Mamba technique from NLP research.
    3. The ablation analysis shows the methodology can achieve the goal by combining the 3 modules.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In general, most technical parts are well written, but they appeared before in the literature [10]. I am sorry if the authors are the same as those of this reference. However, it seems that the two articles are not of the same writing style.
    2. The logic of applying a methodology that is successful in NLP (Mamba) to medical image processing is not clearly stated. Of course, language, audio, and genomics are fields with strong contextual implications, and this is the main reason that Mamba could work well in place of a transformer-like technique. However, BAA with X-ray images is not quite adapted to these causal reasons. Anyway, this does not mean the method cannot work or is without ground.
    3. While the model is efficient, the complexity of the graph structures and adaptive strategies might affect interpretability, which is important in a clinical context.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    If it is possible to release the code with a GitHub link after acceptance of this paper, including the original Mamba code etc., it will be much more convenient for readers to reproduce the entire set of experiments, because this does not seem to be a traditional DNN that readers can reproduce in one shot.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors introduce the pioneering Graph Vision Mamba network for BAA, achieving robust high accuracy with a one-stage, low-annotation, and computationally efficient approach. GGVMamba effectively integrates the highly heterogeneous epiphyseal regions, and addresses gender consistency in bone age X-ray images. Furthermore, GGVMamba illustrates the method’s power on three benchmark datasets by employing patch-level data augmentation.

    1. The intermediate state h(t) is not well defined, which makes the paper less reader-friendly.
    2. In an X-ray exam, if my understanding is correct, everyone’s gender is certainly recorded as an automatic label. So I am not quite convinced why inter-gender disparity is a major issue in BAA tasks with deep learning.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The topic is important and interesting, and the technique from the NLP field is useful. But there is no strong ground for claiming that the Mamba technique is very suitable for BAA in pediatric tasks. The details of the formulas, network structures, and training procedure in this paper need further revision for clarity before acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I learned some stuff from one reviewer, but it is clear that the author’s rebuttal is correct with “four complementary sequence arrangements containing four types of causal information”. Therefore, I still suggest acceptance of this paper although some statements can be revised to be more precise in the near future.




Author Feedback

First and foremost, we would like to extend our sincere gratitude for the time and effort you have dedicated to reviewing our manuscript. We will address your concerns point by point. In response to reviewers’ common concerns, the code will be made publicly available upon acceptance.

To R1:

3.1: We appreciate your comment and need to clarify that most technical parts, such as the directed scan module, dynamic graph layer, and gender adaptive strategy, are orthogonal to the content in literature [10].

3.2: Based on literature [13], interconnected regions of interest in the BAA task are related to bone age. We understand your concern, but we propose that BAA with X-ray bone images is indeed adapted to causal reasons. Thus, Mamba is appropriate for the BAA task.

3.3: The gender-specific graphs M and F in GGVMamba offer partial interpretability. Our future research will prioritize improving interpretability.

7.1: Thank you for your suggestion. The h(t) signifies the intermediate latent state in the transition from x(t) to y(t).

7.2: The reason for the importance of inter-gender disparity is that, based on clinical experience [9], the local information of hand X-rays for different genders varies. While existing studies use gender as an explicit input [2,5,6,8,24], they lead to model degradation. Making inter-gender disparity a learnable factor improves robustness, as validated by our experiments.

To R3:

3: (1) The fluctuations are due to domain biases among the three datasets. Nevertheless, GGVMamba outperforms the SOTA models. (2) We appreciate your reminder. PEAR-Net [16] is the baseline. Table 1 shows GGVMamba’s performance gains. (3) We believe that the results for “roi segmented” models [2,5,8] are listed in Table 1.

6 and 7: (1) We have stated the public availability of the datasets at the end of the introduction, “we gathered data from three public datasets…”. (2) The race bias and details describing these datasets are documented in their respective references [5,7,12]. (3) We recommend zero-shot learning because GGVMamba demonstrates competitive generalization performance. In future work, GGVMamba could be applied to other unseen radiology tasks, and we will invite medical experts to conduct visual evaluations.

9: This is a valid assessment, but we claim that we have discussed the evaluation metrics in sections 3.2 and 3.3 and detailed our contributions in the introduction.

To R4:

3.1: The concept of “directed” is narrowly defined, indicating the order of semantic relevance information among patches. It is pertinent to mention that natural language possesses an inherent semantic sequence order, whereas visual data lacks this order [17]. Consequently, from this standpoint, an X-ray image after patch embedding is nondirected.

3.2: Our design includes four directed sequences that retain complementary spatial information. Unlike the Cross Scan [17], we guarantee spatial adjacency for each patch. Unlike the BiDirectional Scan [25], the directed scan module utilizes four complementary traversal paths to address the limited contextual awareness of S6 in linear complexity. The ablation experiments confirmed that this design enhances performance.

3.3: (1) (What is causal information?) We characterize semantic relevance information as causal information, which relies on the assumption that sequential dependencies exist within the patch sequence. According to [13], X-ray patches exhibit causal information due to the interrelated structure of bone elements. (2) (Why is this statement correct?) The directed scan module represents a linear mapping of bone elements in different sequential orders, with four complementary sequence arrangements containing four types of causal information. Unfolding image patches into sequences along four specific traversal paths enables the vision Mamba block to effectively integrate causal information from different directions, helping to establish a global receptive field [17] for each bone element.
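
The idea of unfolding a patch grid along four traversal paths while keeping consecutive patches spatially adjacent can be illustrated with a minimal sketch. The exact paths used by GGVMamba are not given in this rebuttal, so the choice below (a row-wise snake path, a column-wise snake path, and the reverse of each) is an assumption made purely for illustration, and the function names `snake`, `scan_orders`, `is_adjacent_path`, and `directed_scan` are hypothetical.

```python
import numpy as np


def snake(grid: np.ndarray) -> np.ndarray:
    """Flatten a 2-D index grid along a snake path (every other row reversed)."""
    g = grid.copy()
    g[1::2] = grid[1::2, ::-1]  # read from the untouched input to avoid aliasing
    return g.reshape(-1)


def scan_orders(rows: int, cols: int) -> list:
    """Return four patch orderings (assumed paths, not the paper's definition)."""
    idx = np.arange(rows * cols).reshape(rows, cols)  # row-major patch ids
    row_snake = snake(idx)      # left-right, then right-left, ...
    col_snake = snake(idx.T)    # top-down, then bottom-up, ...
    return [row_snake, row_snake[::-1], col_snake, col_snake[::-1]]


def is_adjacent_path(order: np.ndarray, cols: int) -> bool:
    """True if every consecutive pair of patches is a 4-neighbour on the grid."""
    r, c = np.divmod(order, cols)
    step = np.abs(np.diff(r)) + np.abs(np.diff(c))
    return bool(np.all(step == 1))


def directed_scan(patches: np.ndarray, rows: int, cols: int) -> list:
    """Reorder (rows*cols, dim) patch embeddings into four directed sequences."""
    return [patches[o] for o in scan_orders(rows, cols)]


if __name__ == "__main__":
    rows, cols, dim = 4, 4, 8
    patches = np.random.rand(rows * cols, dim)  # stand-in for patch embeddings
    sequences = directed_scan(patches, rows, cols)
    print(len(sequences), sequences[0].shape)
    print(all(is_adjacent_path(o, cols) for o in scan_orders(rows, cols)))
```

A plain row-major flattening jumps from the end of one row to the start of the next, so the snake pattern is one simple way to satisfy the adjacency property the authors mention. Each of the four sequences would then be passed to a sequence model separately and the outputs merged, a step this sketch leaves out.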




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper introduces GGVMamba, a novel model for Bone Age Assessment using X-ray images, featuring a directed scan module, dynamic graph layer, and gender adaptive strategy to enhance performance and interoperability. Also I agree with meta reviewer 4




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduced a Graph Vision Mamba module and a Gender-adaptive strategy to reduce the computational cost and gender bias. There are some minor concerns about experiment settings, motivation, and analysis of results. Those concerns are addressed in the author’s rebuttal. Although the major performance gain comes from the inclusion of Mamba, an existing method, the whole paper is well-written and the experiments are solid.



