Abstract

Cardiovascular Diseases (CVDs) are the leading cause of mortality worldwide, and Abdominal Aortic Calcification (AAC) is a stable marker of these conditions. AAC can be observed in lateral-view Vertebral Fracture Assessment (VFA) scans acquired with Dual Energy X-ray Absorptiometry (DXA), which are usually performed to detect vertebral fractures. Early detection of AAC can help reduce the risk of developing clinical CVD by encouraging preventive measures. Recent efforts to automate DXA VFA image analysis for AAC detection are either restricted to predicting an overall AAC score or perform poorly in granular AAC score prediction. Granular prediction is important in helping clinicians predict CVD associated with the diminished Windkessel effect in the aorta. To this end, we propose a hybrid CNN-Transformer architecture based on a Feature Pyramid Network (FPN), Hybrid-FPN-AACNet, that employs a novel Dual Resolution Self-Attention (DRSA) mechanism to enhance the context available to self-attention by operating on two different resolutions of the input feature map. The architecture also employs a novel Efficient Feature Fusion Module (EFFM) that efficiently combines features from different hierarchies of Hybrid-FPN-AACNet for regression tasks. The proposed architecture achieves State-Of-The-Art (SOTA) performance at a granular level compared to previous work.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1214_paper.pdf

SharedIt Link: https://rdcu.be/dV57X

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72120-5_2

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1214_supp.pdf

Link to the Code Repository

https://github.com/zaidilyas89/Hybrid-FPN-AACNet

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Ily_AHybrid_MICCAI2024,
        author = { Ilyas, Zaid and Saleem, Afsah and Suter, David and Schousboe, John T. and Leslie, William D. and Lewis, Joshua R. and Gilani, Syed Zulqarnain},
        title = { { A Hybrid CNN-Transformer Feature Pyramid Network for Granular Abdominal Aortic Calcification Detection from DXA Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {14 -- 25}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a method for regressing the clinical scoring metrics of Abdominal Aortic Calcification (AAC), which is a key marker of cardiovascular disease. Such scores are typically assigned manually by clinicians through DXA VFA image analysis, but have also been automated by modern CNN approaches. Previous methods have regressed the final score, but the authors stress that regressing the intermediate scores is also of high clinical importance. For this task they propose a CNN-Transformer model with a number of upgrades designed specifically for this task: (1) a dual resolution self-attention block to integrate both high-level and low-level features; (2) a feature fusion module to combine features from multiple resolutions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is clear, concise and generally well written.

    The authors base their dual resolution self-attention module (DRSA) on an existing method called HiLo attention. They make a minor adjustment to the number of patches, but this is well justified by their goal of expanding the spatial context. The ablation study shows that this minor adjustment yields a good improvement in performance.

    The authors show clear improvements in performance over Gilani et al. and Saleem et al., which are two notable works on this dataset.

    Ablation results are well presented and clearly show improvements in performance when DRSA and EFFM are used.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks any comparison with other transformer-CNN hybrids commonly used in medical image analysis, such as TransUNet [1], UTNet [2], UNETR [3], and CoTr [4]. As the three listed contributions of the paper are improvements in the design of the transformer-CNN architecture, this comparison is essential. Without it, it remains an open question whether this method offers any advantage over the others.

    Performance metrics are not defined or described in the experiments section.

    There is little information about the datasets, including their source (which may have been withheld to preserve anonymity), preprocessing, and training splits.

    [1] Chen et al., 2021, "TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation", arXiv.
    [2] Gao et al., 2021, "UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation", MICCAI.
    [3] Hatamizadeh et al., 2022, "UNETR: Transformers for 3D Medical Image Segmentation", WACV.
    [4] Xie et al., 2021, "CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation", MICCAI.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Details on the source of the dataset and training splits are not disclosed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall, a good paper which I enjoyed reading. The problem statement and solution are technically interesting and sound; however, the paper is let down by limited comparisons to general medical transformer-CNN hybrids.

    Other remarks:

    • Spelling mistake in section 3: ‘sore’ should be ‘score’.
    • A description of the metrics used should be included.
    • Table 3, which metric is being compared?
    • I assume that the source of the dataset has been withheld to preserve anonymity, but please make sure that you include this in the camera ready version if accepted.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The problem is well defined, and the solution is theoretically solid and well motivated. However, I am concerned by the absence of any comparison to other medical transformer-CNN hybrids. For a paper that lists its contributions as technical changes to the architecture, these comparisons are essential.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    My main concern is that the contributions listed are improvements in the CNN-Transformer architecture, yet there is no comparison to other CNN-Transformers in the literature. The authors argued that their model is regression-based, whereas the suggested comparisons are segmentation-based. While that is true, I am not convinced that the various mechanisms for combining CNNs and Transformers change significantly across tasks. However, I am happy to increase my score to a weak accept, as the authors make a good argument as to why a direct comparison is not straightforward without modifications.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a hybrid CNN-Transformer network for detecting Abdominal Aortic Calcification (AAC) and evaluating its severity in vertebral fracture assessment (VFA) scans. In the method, a novel dual resolution self-attention (DRSA) mechanism and an efficient feature fusion module (EFFM) are proposed to improve network performance. The authors conducted comparative and ablation experiments on data from three sources, demonstrating the advantages of the method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The authors have carefully explained the background of AAC detection and evaluation and thoroughly analyzed the progress and shortcomings of related work, which demonstrates a profound understanding of this task.
    2) The description of the method is very clear. The overview of the method and the two key modules in Figure 1 are easy to understand. The two key modules are improvements based on an analysis of the MSA and FPN structures, clearly demonstrating the methodological contribution and innovation of this paper.
    3) The experiments are quite comprehensive. The authors provide a clear and reasonable description of how the dataset was organized and annotated, which is easy for readers to follow. In detailed comparative and ablation experiments, the proposed method demonstrated the best performance on three metrics.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method proposed in this paper is not innovative enough. Both the DRSA and EFFM modules increase the complexity of feature learning through simple matrix operations, but these improvements are not particularly distinctive.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Table 3 in the manuscript and Table 1 in the supplementary material do not appear to indicate the metric employed. In the description of DRSA in the "proposed framework" section, the authors should explain the relationship between the different branches and the HF/LF information in a more interpretable way.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clinical value, the novelty of the method, the writing, and the performance of the proposed method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a CNN-Transformer architecture based on a Hybrid Feature Pyramid Network, which employs a novel dual-resolution self-attention mechanism to enhance the context of input features at different resolutions. Compared to previous works, the proposed architecture achieves state-of-the-art (SOTA) performance at the granular level.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Advantages:

    1. The paper is well organized, and the writing is very clear.
    2. Thorough ablation experiments.
    3. The proposed model is well reasoned and easy to reproduce.
    4. We believe the highlight of this paper is the Efficient Feature Fusion Module (EFFM), which combines feature maps from different hierarchical levels of the deep network and computes SA within them. This effectively integrates multi-scale features and attention information.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Disadvantages:

    There are no significant drawbacks in the proposed model, but we believe further exploration of time complexity and interpretability would be beneficial. That said, we consider the model practical rather than remarkable, because the ideas in its design are very common in computer vision. Additionally, there are some formatting issues in the paper, such as "Following are the details:-".

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    There are no significant drawbacks in the proposed model, but we believe further exploration of time complexity and interpretability would be beneficial. That said, we consider the model practical rather than remarkable, because the ideas in its design are very common in computer vision. Additionally, there are some formatting issues in the paper, such as "Following are the details:-".

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The rationality of experiments, the innovation of the model, and the clinical application in medicine are the focal points of my consideration.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Thank you very much for the authors' efforts to address my concerns to some extent. I believe the manuscript can be accepted.




Author Feedback

We are grateful to the reviewers for appreciating the convincing ablation study (all reviewers), clarity of explanation and comprehensive experiments (R5), clarity and conciseness (R6) and good organization (R7).

R5 and R6: missing metric in Table 3. We apologize for the omission: the metric is the Pearson Correlation Coefficient (PCC), which is commonly used to measure the agreement between human expert AAC scores and machine predictions.
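For readers unfamiliar with the metric, the following is a minimal sketch of how PCC between expert scores and predicted scores can be computed. The score values are hypothetical and serve only as an illustration; they are not data from the paper.

```python
import numpy as np

def pearson_cc(y_true, y_pred):
    """Pearson correlation between expert scores and model predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    dt = y_true - y_true.mean()  # centre both series
    dp = y_pred - y_pred.mean()
    # Covariance divided by the product of standard deviations (the 1/n factors cancel).
    return float((dt * dp).sum() / np.sqrt((dt ** 2).sum() * (dp ** 2).sum()))

# Hypothetical AAC scores for five scans, for illustration only.
expert = [0, 2, 5, 8, 12]
model = [1, 2, 4, 9, 11]
print(round(pearson_cc(expert, model), 3))
print(np.isclose(pearson_cc(expert, model), np.corrcoef(expert, model)[0, 1]))  # True
```

Note that PCC measures linear association rather than absolute agreement: predictions that are all shifted by a constant offset still give a PCC of 1.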

R6: comparisons with other transformer-CNN hybrids. Hybrid transformer-CNN models have indeed been used for various medical image analysis tasks, such as segmentation, classification, super-resolution, and landmark detection, but each task necessitates a uniquely designed hybrid model. Our task is image regression, and to the best of our knowledge, none of the hybrid models proposed thus far have considered image regression. Models like TransUNet, UTNet, UNETR, and CoTr are crafted for image segmentation and up-sample feature maps (using a decoder) to generate segmentation masks. In contrast, our proposed model does not require feature map up-sampling: it combines features from different hierarchies, down-samples them, and then generates scalar regression outputs. Hence, a direct comparison with the aforementioned models is not practically feasible. A meaningful comparison would require significant modifications to these models, essentially transforming them into substantially different architectures that diverge from their original design intent.
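To make the contrast with decoder-based segmentation models concrete, here is a hedged NumPy sketch of a regression path that down-samples multi-level features to a common resolution, fuses them with self-attention, and outputs scalars. It is not the authors' EFFM: the feature shapes, the random projection weights standing in for learned ones, the residual connection, and the choice of 8 outputs (e.g. a set of granular sub-scores) are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool(x, s):
    """Non-overlapping s x s average pooling of an (H, W, C) map."""
    H, W, C = x.shape
    return x.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))

def fuse_and_regress(features, n_outputs=8, seed=0):
    """features: square (H_i, H_i, C_i) maps, shallow (large) to deep (small).
    Hypothetical fusion + regression head; random weights replace learned ones."""
    rng = np.random.default_rng(seed)
    h_min = min(f.shape[0] for f in features)
    # Down-sample every level to the coarsest resolution: no decoder, no up-sampling.
    pooled = [avg_pool(f, f.shape[0] // h_min) for f in features]
    fused = np.concatenate(pooled, axis=-1)           # (h, h, sum of C_i)
    tokens = fused.reshape(-1, fused.shape[-1])        # one token per spatial position
    C = tokens.shape[1]
    wq, wk, wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(C))               # self-attention over fused tokens
    tokens = tokens + attn @ v                         # residual connection (assumed)
    vec = tokens.mean(axis=0)                          # global average pooling -> (C,)
    w_head = rng.standard_normal((C, n_outputs)) / np.sqrt(C)
    return vec @ w_head                                # one scalar per regression target

rng = np.random.default_rng(0)
feats = [rng.standard_normal((40, 40, 32)),
         rng.standard_normal((20, 20, 64)),
         rng.standard_normal((10, 10, 128))]
print(fuse_and_regress(feats).shape)  # (8,)
```

The point of the sketch is the data flow: every level is reduced to the coarsest grid and the network ends in a vector of scalars, so there is no mask-producing decoder to compare against directly.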

R5: improvements from DRSA and EFFM have not been highlighted. DRSA and EFFM are novel contributions that significantly improve the results (Table 3). Both modules are Self-Attention mechanisms that improve performance by increasing the capacity to weight the importance of different pixels or image regions. They increase complexity slightly, but the reward is significant (Table 3). Our proposed model has 22.02 million parameters, while the CNN backbone without DRSA and EFFM has 19.84 million parameters. The inference time of our model increases by 91 ms; however, the average improvement in performance (PCC) is more than 10%. Notably, our proposed model performs much better than the EfficientNetV2M backbone (without DRSA/EFFM), which is more complex than ours (52.2 million parameters, 534.4 ms inference time). We will add these details.

R5: more explanation on the relationship between the different branches of DRSA. To supplement Sec. 2 of the paper, we stress that DRSA is a Self-Attention (SA) module that applies SA to two different resolutions of the same input feature map. The upper branch (Fig. 1(b)) is the high-frequency SA path. It operates on the original input feature map and captures fine details within the window, but its context is limited by the window size. To capture a broader context, such as the shape and curvature of the spine, either the window size must be increased or the spatial size of the input feature map reduced. As increasing the window size is computationally expensive, the lower branch (Fig. 1(b)) reduces the spatial dimensions of the input feature map with a low-pass filtering operation (average pooling). The reduced feature map carries low-frequency information, and the same window size then covers a broader spatial context.
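The two-branch idea described above can be illustrated with a small NumPy sketch: windowed self-attention on the full-resolution map, and the same window size applied after average pooling. This is only an illustration under stated assumptions, not the paper's implementation: it uses a single head, random matrices in place of learned projections, nearest-neighbour up-sampling of the low-frequency output, and channel concatenation to merge the branches (how the paper merges them is not stated here).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, window, wq, wk, wv):
    """Single-head SA restricted to non-overlapping window x window patches of (H, W, C)."""
    H, W, C = x.shape
    out = np.empty((H, W, wv.shape[1]))
    for i in range(0, H, window):
        for j in range(0, W, window):
            tokens = x[i:i + window, j:j + window].reshape(-1, C)
            q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
            attn = softmax(q @ k.T / np.sqrt(q.shape[1]))
            out[i:i + window, j:j + window] = (attn @ v).reshape(window, window, -1)
    return out

def avg_pool(x, s):
    """Average pooling used as a simple low-pass filter."""
    H, W, C = x.shape
    return x.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))

def dual_resolution_sa(x, window=4, pool=2, d=8, seed=0):
    """Hypothetical DRSA-like block; H and W must be divisible by window * pool."""
    rng = np.random.default_rng(seed)
    C = x.shape[-1]
    hi_w = [rng.standard_normal((C, d)) / np.sqrt(C) for _ in range(3)]
    lo_w = [rng.standard_normal((C, d)) / np.sqrt(C) for _ in range(3)]
    # High-frequency branch: fine detail, context limited to one window.
    hi = window_self_attention(x, window, *hi_w)
    # Low-frequency branch: same window size on a pooled map, so each window
    # spans window * pool pixels of the original map (broader context).
    lo = window_self_attention(avg_pool(x, pool), window, *lo_w)
    lo = lo.repeat(pool, axis=0).repeat(pool, axis=1)  # back to full resolution
    return np.concatenate([hi, lo], axis=-1)

x = np.random.default_rng(1).standard_normal((16, 16, 16))
print(dual_resolution_sa(x).shape)  # (16, 16, 16)
```

With `window=4` and `pool=2`, each low-frequency window covers an 8 × 8 region of the original map at the attention cost of a 4 × 4 window, which is the trade-off the rebuttal describes.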

R7: time complexity. Our model has 22.02 million parameters, and for an input image of size 320×320, the inference time is 408.08 ms. We will make this clear in the paper; thank you.
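The cost figures quoted in this rebuttal can be put side by side with a little arithmetic. The sketch below assumes that the 91 ms increase and the 408.08 ms inference time (320×320 input) were measured under the same setup, so that the backbone alone takes about 408.08 − 91 ms; this is an inference from the stated numbers, not a figure the authors report.

```python
# Figures quoted in the rebuttal (millions of parameters, milliseconds).
params_full, params_backbone, params_effnet = 22.02, 19.84, 52.2
time_full, time_added, time_effnet = 408.08, 91.0, 534.4

# Assumption: both timing figures come from the same measurement setup.
time_backbone = time_full - time_added

summary = {
    "extra_params_M": round(params_full - params_backbone, 2),                               # 2.18
    "param_overhead_pct": round(100 * (params_full - params_backbone) / params_backbone, 1),  # 11.0
    "backbone_time_ms": round(time_backbone, 2),                                              # 317.08
    "time_overhead_pct": round(100 * time_added / time_backbone, 1),                          # 28.7
    "effnet_param_ratio": round(params_effnet / params_full, 2),                              # 2.37
    "effnet_time_ratio": round(time_effnet / time_full, 2),                                   # 1.31
}
print(summary)
```

Under that assumption, DRSA and EFFM add roughly 11% more parameters and about 29% more inference time, while EfficientNetV2M uses about 2.4 times as many parameters and is about 1.3 times slower than the proposed model.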




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Accept


