Abstract

Phantom-less volumetric bone mineral density (vBMD) measurement using computed tomography (CT) presents a cost-effective alternative to conventional phantom-based approaches, yet faces accuracy challenges across varying tube voltages. Current deep learning-based phantom-less solutions frequently overlook the critical role of frequency variance—a crucial factor for precise BMD measurement and cross-voltage generalization. We present a lightweight CT-based phantom-free vBMD measurement framework that addresses critical limitations in cross-voltage generalization. Core innovations include: (1) Frequency-balancing feature modulation with multi-band fusion, preserving spectral measurement cues; (2) A dual-branch architecture combining domain-specific convolutions with cross-frequency interaction; and (3) Asymmetric channel attention, which allocates attention weights based on frequency characteristics, enabling adaptive emphasis on critical low- and high-frequency components. Comprehensive evaluations across 80, 1

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1213_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZhaMen_MultiTubeVoltage_MICCAI2025,
        author = { Zhang, Mengze and Li, Yali and Yuan, HuiShu and Qian, Zhen},
        title = { { Multi-Tube-Voltage vBMD Measurement via Dual-Branch Frequency Balancing and Asymmetric Channel Attention } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        page = {446 -- 456}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    To calculate the variation in BMD values according to changes in kVp, a deep learning pipeline was constructed by separating the low-frequency and high-frequency components of the images. The evaluation was conducted using the most commonly applied kVp levels—80, 100, and 120 kVp—and a deep learning-based solution was proposed to correct BMD values without the use of phantoms. The performance evaluation demonstrated that the proposed method achieved excellent generalization capability.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The approach of correcting vBMD acquired under different kVp conditions through the frequency domain can be considered novel. In this study, low-frequency and high-frequency components were extracted and processed using a dual-branch architecture. Instead of employing complex frequency processing techniques, the study adopted a method based on average pooling, which improved computational efficiency.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    [1] It is necessary to verify how the frequency decoupling method used in this study performs compared to other commonly adopted approaches, such as wavelet transform. Furthermore, the study should provide a clear explanation as to why frequency decoupling using only pooling operations was sufficient.

    [2] The ratio of 80, 100, and 120 kVp images in the evaluation dataset should be disclosed. Since this study conducted comparative evaluations using images acquired at different kVp levels to match the target BMD value, it is essential to specify the number of images per kVp condition and provide details on the acquisition parameters of the dataset.

    [3] The Discussion section should include an analysis of the study’s limitations. The following points may be considered as potential limitations and should be addressed in the manuscript: (1) The model’s performance under 80–100–120 kVp conditions may vary depending on the CT scanner used. (2) The study results may differ depending on patient factors such as age or sex. (3) The performance under 80 kVp appears to be the most unstable, which should be acknowledged as a limitation.

    [4] In Fig. 2, each subfigure (a–c) should be explicitly referenced and described in the text to enhance clarity.

    [5] Although ResNet-10 and OctResNet-10 were selected as baseline networks for comparison, the rationale behind choosing these specific architectures was not clearly explained and should be provided.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This study presents a meaningful contribution in that it improved the accuracy of bone mineral density estimation by utilizing deep learning in the frequency domain to process images acquired under different kVp conditions. However, the justification for the method used to extract frequency components, as well as its evaluation and analysis, requires further clarification. In addition, the study lacks a discussion on the limitations that may arise from these aspects. Therefore, the overall recommendation is a weak reject.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The manuscript “Multi-Tube-Voltage vBMD Measurement via Dual-Branch Frequency Balancing and Asymmetric Channel Attention” describes an AI method to compute volumetric BMD from CT scans at 80, 100, and 120 kVp without using a calibration phantom (phantom-less). The authors report an improvement over several other approaches and achieve an error of 6 to 7 mg/cc, which is only slightly above the commonly used 5 mg/cc threshold for phantom-based calibration; these are good values for a phantom-less approach.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors designed a new neural network for the desired task. They treat low- and high-frequency features differently. They have a good database of patient data. They trained not only for 120 kVp, but also for 100 and 80 kVp. They got good results.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    • The authors did not mention the resolution, noise (mAs), or reconstruction kernels. All of these factors drive the high frequencies, and it is therefore not clear to what extent the method applies.
    • It seems that the authors compute only a single vBMD value, so the method is not applicable to quantitative microstructural analysis, such as computation of Tb.Sp or TMD. Please add a sentence to make this clear.
    • The authors map scans from non-120 kVp to 120 kVp linearly. However, the relationship is not linear, which is precisely why examining at different energies, as in dual- or multi-energy techniques (DECT, PCCT, DXA, etc.), is informative. Please state this issue and provide an analysis of the error introduced, and/or improve the mapping, or remove 100 and 80 kVp from the analysis for now, or simply mention this as a possible source of imperfection at 100 and 80 kVp and improve it in a later publication.
    • Figure 2 mentions four models, but I find only (a) to (c); it is also not entirely clear what (b) and (c) are.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach is sound, writing is clear, results are good.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The study contributes to osteoporosis diagnostics and other fields benefiting from accurate bone mineral density (BMD) estimation. It focuses on estimating volumetric BMD (vBMD) with phantom-less calibration from CT images. vBMD estimated from CT can be used directly to predict, e.g., hip fracture risk, as in (Li et al 2025: 10.1016/j.bone.2025.117431), or to estimate the bone stiffness used in finite element simulations of bones, as in (Yosibash et al., 2020: 10.1016/j.camwa.2020.03.012), but these and most other phantom-less calibration techniques focus on CT images acquired at a single specific tube voltage, most often 120 kVp. Therefore, the authors’ contribution of a technique that can estimate vBMD from CT images acquired at different tube voltages ranging from 80 to 120 kVp is very welcome.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The architecture the authors propose has two branches: one handles the higher-frequency components, which according to the authors mostly reflect trabecular bone texture, and the other handles the low-frequency component, which reflects global average intensities. This design is adequate and interesting considering the anatomical structure of bone. Indeed, simple averaging over the trabecular structure may not be enough to capture the vBMD of trabecular bone in detail, and the structure itself may play a nonlinear role across tube voltages with respect to the real BMD, e.g. ash density.

    The authors use a large external test set from another hospital for validation and for calculation of the final results.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The approach seems to generate one estimate of vBMD for the input image, e.g. for a vertebra and does not, as I understood, directly work on mapping HU values voxel-by-voxel to BMD values. Therefore, it does not directly support e.g. phantom-less automated finite element simulations. This is something that remains for future studies to address. However, getting BMD values for a vertebra in the same fashion as DXA-scanner provides is itself significant, e.g., for opportunistic screening of bone densities.

    The authors did not address a 140 kVp tube voltage, even though this tube voltage is used for the most obese patients, e.g. in body CT scans. Body CT is among the most common CT examinations in which vertebrae are visible and therefore offers potential for opportunistic screening of osteoporosis. With the current approach, the most obese patients would be left out of opportunistic screening.

    For the tube voltage that differs most from the gold-standard tube voltage (i.e. 80 kVp), the traditional linear-correlation-based method provides the best results. Therefore, the method may not be fully applicable to clinical practice in its current form; its contribution lies more in giving new insight into how phantom-less calibration could be solved for multiple tube voltages.

    Authors focus their validation only on vertebrae images, whereas the most significant contribution for bone densitometry may lie in proximal femur, since those osteoporotic fractures are the most devastating for the patients and it has been shown in numerous studies that fractures on specific anatomical location are best predicted with BMD measured from the same anatomical site.

    The authors state the network is lightweight, but they do not provide any quantitative metrics or comparison with the other relevant DNNs to prove that.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Abstract: please give (PL) notation where phantom-less is used first time if you want to use an abbreviated version later in the abstract. Abstract: Authors could mention already in the abstract, that the method validation focused on vBMD measurements from the vertebra.

    Fig 2: caption could also describe global average pooling (GAP) and global max pooling (GMP).

    Section 2.3: authors might explain small sigma alongside other operators and functions.

    Section 3.1: Could the number of images at each tube voltage be provided for the datasets?

    Table 1: please give the unit of MAE.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Phantom-less multi-tube-voltage calibration of vBMD from CT is a clinically significant task, e.g. for opportunistic screening of low bone mineral density, and it has not yet been solved satisfactorily. The authors propose an interesting architecture designed to account for the anatomical characteristics of bone in addressing this issue. Therefore, the study can make a significant contribution to the field even though the highest tube voltages were not addressed and the results for the lowest tube voltage did not improve compared to the traditional approach.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Response to Reviewers

Data Acquisition Details: Due to page constraints, we initially omitted specific information regarding the data acquisition process, including scanning protocols and kVp dataset ratios. These details will be included in the revised manuscript.

Limitation in Microstructural Analysis: We will clarify that this study focuses primarily on volumetric bone mineral density (vBMD) rather than microstructural metrics such as trabecular spacing (Tb.Sp).

Rationale for Linear Interpolation: Regarding the use of linear regression for kVp mapping, we acknowledge the inherent non-linear relationship between kVp and HU values, which is a recognized challenge in multi-kVp CT research. The decision to use linear interpolation was based on two key factors:

Baseline Benchmarking: Initial benchmarking using conventional HU-to-BMD conversion via internal references revealed a high mean absolute error (MAE). Using linear regression to map non-120 kVp HU to pseudo-120 kVp HU provided a more reliable baseline, which performed well in cross-validation.

Literature Precedent: Previous studies, such as Nakaura et al. (2014), have validated the use of linear regression for HU normalization in low-dose CT [1].

These points will be explicitly discussed in the Methods section, with the limitations of linear modeling acknowledged in the Discussion.

Performance at 80 kVp: Thank you for your comment regarding CNN performance at 80 kVp. It is true that traditional linear methods often perform better in this scenario, as they prioritize low-frequency trends and suppress high-frequency noise. Our method, designed for joint low/high-frequency optimization, is more sensitive to noise at extreme voltages. We recognize this limitation and plan to explore noise-robust architectures in future work to enhance performance at these voltage settings.
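
The linear HU mapping described under "Rationale for Linear Interpolation" above can be illustrated with a minimal sketch. It assumes paired HU measurements of the same regions at a source tube voltage and at 120 kVp; the function names `fit_hu_mapping` and `to_pseudo_120` and all numbers are hypothetical, chosen only for illustration, and do not come from the paper or its dataset.

```python
import numpy as np

def fit_hu_mapping(hu_src, hu_120):
    """Least-squares line hu_120 ~ slope * hu_src + intercept."""
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(hu_src, hu_120, deg=1)
    return slope, intercept

def to_pseudo_120(hu, slope, intercept):
    """Map HU values from the source voltage to pseudo-120 kVp HU."""
    return slope * np.asarray(hu, dtype=float) + intercept

# Synthetic paired measurements (made-up values, exactly linear by design).
hu_80 = np.array([100.0, 200.0, 300.0, 400.0])
hu_120 = 0.8 * hu_80 + 10.0

slope, intercept = fit_hu_mapping(hu_80, hu_120)
pseudo = to_pseudo_120(hu_80, slope, intercept)
```

Because the synthetic data are exactly linear, the fit recovers them without error. On real scans the residuals of such a fit would reflect the non-linearity that Review #2 raises.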
Frequency Decoupling via Pooling Operations: We apologize for the insufficient explanation regarding frequency decoupling through pooling operations. To clarify, average pooling acts as a low-pass filter, capturing low-frequency components, while residual connections (x - Pool(x)) serve as high-pass filters, emphasizing fine details. This approach is inspired by frequency-domain feature separation techniques [2, 3]. We will expand on this explanation in the Methods section, providing both theoretical justification and empirical evidence.

Figure Clarity and Baseline Selection: We will improve the clarity of Figure 2 by adding explicit subfigure descriptions for panels (a–d). Regarding baseline selection, ResNet-10 was chosen for its simplicity and robustness in feature extraction. On the other hand, OctResNet-10 was used to benchmark frequency-domain strategies, which align with our dual-branch design. We will include a more detailed rationale for these choices in the Experimental Design section.

Study Limitations: Due to data constraints and the scope of our study, certain limitations include the lack of 140 kVp validation, limited anatomical coverage, restricted patient demographics, and the absence of alternative frequency methods. Despite these limitations, our approach introduces a novel phantom-less multi-kVp calibration framework. We will explicitly acknowledge these limitations in the Discussion, and outline future directions, including multi-center validation at 140 kVp, femoral BMD modeling, and the development of adaptive demographic architectures.

References:

[1] Nakaura, T., et al. “Low-dose abdominal CT protocols with 100 kVp or 80 kVp.” Clinical Radiology, 2014, 69(8):804–811.

[2] Chen, Y., et al. “Drop an octave: Octave convolution for CNNs.” ICCV, 2019:3435–3444.

[3] Yi, Z., et al. “Contextual residual aggregation for image inpainting.” CVPR, 2020:7508–7517.
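
The pooling-based decoupling described under "Frequency Decoupling via Pooling Operations" above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes non-overlapping k x k average pooling followed by nearest-neighbour upsampling so that Pool(x) has the same shape as x, and the function name `split_frequencies` is hypothetical. The rebuttal does not specify the kernel size, stride, or upsampling scheme.

```python
import numpy as np

def split_frequencies(x: np.ndarray, k: int = 2):
    """Split a 2-D map into low- and high-frequency parts.

    low  = k x k average pooling, upsampled back to the input size
    high = x - low (the residual left after removing local means)
    """
    h, w = x.shape
    if h % k or w % k:
        raise ValueError("height and width must be divisible by k")
    # Average pooling: mean over non-overlapping k x k blocks.
    pooled = x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    # Nearest-neighbour upsampling so that low matches x in shape.
    low = np.repeat(np.repeat(pooled, k, axis=0), k, axis=1)
    high = x - low
    return low, high

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))          # stand-in for a feature map
low, high = split_frequencies(x, k=2)

# A constant map has no fine detail, so its residual is zero.
flat_low, flat_high = split_frequencies(np.full((4, 4), 3.0))
```

By construction `low + high` reproduces `x`, and the high-frequency part has zero mean within every pooling block, which is the sense in which the residual behaves like a high-pass filter here.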




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


