Abstract

Manual detection of intracranial aneurysms in computed tomography (CT) scans is a complex, time-consuming task even for expert clinicians, and automating the process is no less challenging. Critical difficulties associated with detecting aneurysms include their small (yet varied) size compared to scans and a high potential for false positive (FP) predictions. To address these issues, we propose a 3D, multi-scale neural architecture that detects aneurysms via a deformable attention mechanism that operates on vessel distance maps derived from vessel segmentations and 3D features extracted from the layers of a convolutional network. Likewise, we reformulate aneurysm segmentation as bounding cuboid prediction using binary cross entropy and three localization losses (location, size, IoU). Given three validation sets comprised of 152/138/38 CT scans and containing 126/101/58 aneurysms, we achieved a Sensitivity of 91.3%/97.0%/74.1% @ FP rates 0.53/0.56/0.87, with Sensitivity around 80% on small aneurysms. Manual inspection of outputs by experts showed our model only tends to miss aneurysms located in unusual locations. Code and model weights are available online.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2366_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2366_supp.pdf

Link to the Code Repository

https://github.com/alceballosa/deform-aneurysm-detection

Link to the Dataset(s)

https://zenodo.org/records/6801398

BibTex

@InProceedings{Ceb_Vesselaware_MICCAI2024,
        author = { Ceballos-Arroyo, Alberto M. and Nguyen, Hieu T. and Zhu, Fangrui and Yadav, Shrikanth M. and Kim, Jisoo and Qin, Lei and Young, Geoffrey and Jiang, Huaizu},
        title = { { Vessel-aware aneurysm detection using multi-scale deformable 3D attention } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an aneurysm detection method exploiting the features extracted from a classical 3D CNN along with some distance maps extracted from the blood vessels. A transformer combines both features into an aneurysm bounding box.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of exploiting the vasculature distance maps is interesting; I am not aware of any competing method based on such an idea.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main novelty of this work lies in the exploitation of distance maps to efficiently detect the aneurysms; however, the distance maps depend (only) on the segmentation accuracy. Thus, in my opinion, the segmentation accuracy should be assessed. A bad segmentation probably makes the whole idea useless, and hence it would be helpful to quantify the overall performance as a function of the segmentation accuracy.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    What bothers me most in this paper is its lack of clarity; a few points are quite unclear to me. In my opinion, Section 2 should be rewritten (especially Sec. 2.2), and the main points should be clearly highlighted. Concerning the evaluation metrics, it would also have been interesting to report the Euclidean distance between the predictions and the ground-truth labels. Figure 1 does not make much sense if the methods are compared on different image datasets. Is this actually the case? If so, it should be mentioned. Fig. 3 does not help much; a 3D binary segmentation would be more helpful! Neuroradiologists can easily interpret such images (even though they might need all three planes: sagittal, axial, and coronal), but a non-expert will struggle to see your point here! The authors decided to compare the performance of their method against GLIA-Net [4] (in Table 2), but I am not convinced this is the best-performing approach in the literature. Aren’t there competing methods with better performance?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The lack of evaluation of the segmentation, as well as a lack of clarity. Moreover, it seems a bit unfair to compare the performance to the GLIA-Net method, which does not seem to be among the best performing competing methods. Some (CTA-based) aneurysm detection approaches claim a 90 to 95% sensitivity in the literature.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I am not fully convinced by the authors’ replies; and they did not address several comments/questions. My opinion on the paper stays unchanged.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a multi-scale deformable 3D attention mechanism incorporating vessel distance maps for aneurysm detection, which simultaneously considers multi-scale information and prior vascular location information. Extensive experiments are performed on one public intra-cranial aneurysm segmentation dataset to show its effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A vessel-aware multi-scale architecture is proposed for detecting varying-size aneurysms. Quantitative and qualitative results are given to show the effectiveness of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In table 1, the value of confidence score threshold varies across methods. It seems that the parameter setting has a significant impact on the results. Please give the details about how to set the value and compare results under different settings for the proposed method. The proposed method uses vessel segmentation as a prior, which may lead to increased computational expenses and time requirements. Besides, if a noisy segmentation result is obtained, how much does it impact the final aneurysm detection results?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In Table 3, without data augmentation, the performance of the proposed method degrades severely. This looks very unusual; please check it or analyze the cause. In addition to just presenting the table, a comparison and analysis of the experimental results is needed in the main text.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see the weakness and strength section.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of my concerns and I keep the previous decision.
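
The threshold question raised in this review (how a confidence threshold relates to Sensitivity at a given FP rate) can be illustrated with a short sketch. This is not the authors' procedure: the function name `sensitivity_at_fp_rate`, the input layout (one `(score, matched_gt_id)` pair per detection, with `None` marking a false positive) and the toy numbers are all assumptions made for illustration, and score ties are ignored.

```python
def sensitivity_at_fp_rate(detections, n_scans, n_aneurysms, max_fp_per_scan=1.0):
    """Return (threshold, sensitivity) at the lowest score threshold whose
    false-positive rate per scan does not exceed max_fp_per_scan.

    detections: list of (score, matched_gt_id), where matched_gt_id is None
    for a false positive (hypothetical layout, not the paper's format).
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    found = set()          # distinct ground-truth aneurysms detected so far
    false_positives = 0
    best = (None, 0.0)
    for score, gt_id in ordered:
        if gt_id is None:
            false_positives += 1
            if false_positives / n_scans > max_fp_per_scan:
                break      # lowering the threshold further exceeds the FP budget
        else:
            found.add(gt_id)
        best = (score, len(found) / n_aneurysms)
    return best


# Toy example with made-up detections over 2 scans containing 4 aneurysms.
toy = [(0.9, "a"), (0.8, None), (0.7, "b"), (0.6, None), (0.5, None), (0.4, "c")]
threshold, sensitivity = sensitivity_at_fp_rate(toy, n_scans=2, n_aneurysms=4)
```

Sweeping `max_fp_per_scan` over a range of values yields the kind of Sensitivity-versus-FP-rate curve that summarizes threshold-agnostic performance.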



Review #3

  • Please describe the contribution of the paper

    The authors developed a novel 3D neural architecture that employed a deformable attention mechanism, utilizing vessel distance maps and features from a convolutional network, to detect aneurysms. The model reframes aneurysm detection as predicting bounding cuboids and uses binary cross entropy along with three localization losses to enhance accuracy. It was validated on three sets of CT scans, showing high sensitivity across varying aneurysm sizes and conditions, and low false positive rate at the same time, though it sometimes missed aneurysms in atypical locations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • overall, a well-written and well-organized manuscript with balanced descriptions provided in all sections
    • promising results with good balance between sensitivity/false positive rate, especially in small aneurysm detection
    • modified metric was used for evaluation; specifically, the metric was relaxed to yield more optimistic results
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • ablation study was performed by combining the external and internal datasets; hence, one cannot compare the results reported in Table 3 to those in Table 1
    • pairwise comparisons of metric values should be performed using an adequate statistical test, e.g., McNemar’s test.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The ablation study was performed by combining the external and internal datasets. As the private dataset is not expected to be made available, the study cannot be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    2nd paragraph of section 2.2 that describes the projection of multi-scale features is rather confusing. Therein the authors declare a linear projection matrix W_\triangle, linear projection W_o with biases b_o and projection matrix W_p, with complex interplay. It seems that linear projection is employed several times over, first using W_p to project the query Q into 3D representation p, then the p is projected to n_k keys using W_o and b_o. The linear projection matrix W_\triangle is supposed to map “all levels to the same feature space”, however, its role wrt W_p and W_o and b_o is not explained, nor how these matrices are obtained.

    The following sentence is ambiguous: “To facilitate convergence, all aneurysms are matched to the closest detection and other detections may be matched to the same aneurysm only if close enough.” Closest to what? Do you impose certain minimal distance between detections?

    “We remove redundant predictions using non-maximum suppression with t_nms = 0.05.” How did you determine this parameter value?

    When describing the evaluation metrics, the authors write “For segmentation model evaluation, we use the metric defined in [4]…”, and later in the same paragraph write “We make this metric fairer for segmentation models by defining the radius r̂ of a segmented aneurysm as 1/2 of the longest side of its minimum bounding cuboid.” The latter part invalidates the former, since not the original, but a modified metric was used for evaluation. Specifically, the metric was relaxed to yield more optimistic results.

    From the ablation study results in Table 3, it seems that without “vessel awareness” the sensitivity is at the same level as that of the proposed solution (93.83% vs. 93.38%, which is likely not a significant difference).

    The ablation study was performed by combining the external and internal datasets; hence, one cannot compare the results reported in Table 3 to those in Table 1, even though such a comparison would provide a better overview of the methodological contributions, also with respect to aneurysm sizes.

    Pairwise comparisons of metric values should be performed using an adequate statistical test, e.g., McNemar’s test. As it currently stands, the rather small differences reported in Table 3 do not justify the promise of the approach.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The use of customized evaluation metrics (towards yielding optimistic results) and a rather small gap between the use and no use of “vessel-awareness module” (major contribution), under incomparable and irreproducible test conditions raises concerns regarding the actual impact of the contribution.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors are acknowledged for their effort in responding to the concerns raised. They proposed to make certain amendments to the original manuscript, e.g. including the results of ablation study separately for internal/external data partitions, inclusion of statistical tests, and deeper analysis of the results. These amendments seem reasonably minor to accommodate within the page limit.
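
The pairwise test requested in this review can be sketched as follows. This is a minimal illustration of an exact (binomial) McNemar test on paired per-aneurysm outcomes of two models; the function name `mcnemar_exact` and the toy hit lists are hypothetical and are not taken from the paper or its code.

```python
from math import comb


def mcnemar_exact(hits_a, hits_b):
    """Exact two-sided McNemar test on paired binary outcomes.

    hits_a[i] and hits_b[i] say whether model A and model B detected
    aneurysm i. Only discordant pairs carry information:
      b = A detected, B missed;  c = A missed, B detected.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    b = sum(1 for a, o in zip(hits_a, hits_b) if a and not o)
    c = sum(1 for a, o in zip(hits_a, hits_b) if not a and o)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy example (made-up outcomes for 7 aneurysms).
model_a = [1, 1, 1, 1, 1, 1, 0]
model_b = [0, 0, 0, 0, 0, 1, 0]
b, c, p_value = mcnemar_exact(model_a, model_b)
```

Because the test uses only the discordant pairs, two models with nearly identical Sensitivity can still differ significantly, or not, depending on which aneurysms each one misses.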




Author Feedback

We thank all reviewers for their valuable comments and address them below.

•Segmentation accuracy (R1,R4): upon training the nnUnet model, we validated it on 11 CT scans segmented semi-automatically by a radiologist, achieving a 0.929 modified Dice Coefficient ([P ∩ GT]/GT). While it is true poor segmentations could harm detection performance, our results on 3 datasets indicate vessel awareness improves/conserves detection Sensitivity while reducing the FPr. Moreover, if we were to evaluate the model on an OOD sample for which segmentation was less accurate, such a sample would likely be challenging too for a non-vessel-aware model.

•Segmentation cost/relevance (R1,R3): Segmentation+EDT takes ~4 min., detection takes ~3 min. (RTX 3090 GPU). Per our radiologist collaborators, the target scan processing time before human review is 20 minutes, so ~7 min. is acceptable. To address R3’s concerns about Sensitivity of vessel-aware/unaware models, we note that, as stated in Sec. 1, our goal in using vessel awareness is to reduce FP detections; indeed, the FPr of the vessel-aware model is ~40% lower than the non-vessel-aware model. This reduces the time radiologists spend looking at FPs and highlights how vessel awareness retains high Sensitivity while reducing human labor, thus offsetting the cost of segmentation.

•Lack of statistical tests (R3): although we have explained the positive role of vessel awareness, we note that our approach also comprises multi-scale deformable attention, a component that, by itself, improves Sensitivity by ~10% (see T3, rows 6, 8). However, we will try to include pairwise tests in the final paper to make our ablation study more informative.

•Confidence/NMS threshold choices (R1,R3): we set the confidence thresholds in T1 based on each model’s Sensitivity@FPr=1 on the train data. Moreover, in Fig. S2 (supp. material), we report Se@FPr curves (akin to ROC) to summarize the threshold-agnostic performance of each model. Notably, Fig. S2 shows our model performs the best for most thresholds. As for NMS, we follow the literature [5] in using a small NMS threshold for running inference on overlapping 3D crops to avoid having multiple detections for the same aneurysm.

•Ambiguity re: matching (R3): for a GT aneurysm, we always match the closest detection (irrespective of distance) to compute the loss. Other detections are only matched to a GT if the L2 distance between their centers is under 0.5 voxels (~0.2mm).

•Aggregation of ablation study results (R3): Since we compute these results by averaging existing results on the internal/external partitions, we will disaggregate them over datasets/aneurysm sizes in the next version.

•Completeness of baselines (R1,R4): Most aneurysm detection work lacks public code/weights (see Bizjak [3]). From our literature survey, only [4] (GLIA-Net), [1] and [6] released code/weights among the best solutions. However, [1] uses pose annotations that aren’t in our training data and [6] employs atlas-based co-registration of points of interest. Also, both use TOF-MRA, not CT data; so their trained models would suffer from a domain gap. Still, in the detection setting we compare against a nnDetection [2] model trained on the same data as ours (nnDet won the 2020 ADAM Aneurysm Detection challenge & is the best baseline in [1]). Re: segmentation, we could have tested nnUnet, but [1] shows nnUnet & nnDetection perform similarly, so we focused our resources on testing more detection baselines.

•Modified segmentation metric (R3): although we modified [4]’s metric, we did it to favor GLIA-Net, as it outputs detailed segmentation masks while our model outputs bounding cubes that overestimate aneurysm sizes. This isn’t an issue for radiologists but would make for an unfair comparison (a larger prediction size makes it easier for a detection to be accepted).
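
The matching rule described in the reply on matching above (the closest detection is always matched to a GT aneurysm, and any other detection is matched only if its center lies within 0.5 voxels) can be sketched as follows. This is an illustrative reading of that description, not the released implementation: the function name `match_detections`, the center-only representation and the toy coordinates are assumptions.

```python
from math import dist


def match_detections(gt_centers, det_centers, max_extra_dist=0.5):
    """Return sorted (gt_index, det_index) pairs used as positive matches.

    Centers are (z, y, x) tuples in voxel units (an assumed convention).
    """
    if not det_centers:
        return []
    matches = set()
    for g, gt in enumerate(gt_centers):
        # Rule 1: the closest detection is matched irrespective of distance.
        closest = min(range(len(det_centers)),
                      key=lambda d: dist(gt, det_centers[d]))
        matches.add((g, closest))
        # Rule 2: additional detections need an L2 center distance < 0.5 voxels.
        for d, det in enumerate(det_centers):
            if dist(gt, det) < max_extra_dist:
                matches.add((g, d))
    return sorted(matches)


# Toy example with made-up centers.
gts = [(10.0, 10.0, 10.0), (40.0, 40.0, 40.0)]
dets = [(10.2, 10.0, 10.0), (10.0, 10.3, 10.0), (30.0, 30.0, 30.0), (80.0, 80.0, 80.0)]
pairs = match_detections(gts, dets)
```

In this toy case the second aneurysm has no detection within 0.5 voxels, so it receives only its closest detection; that is how the rule guarantees a training signal for every aneurysm.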

In addition to the above, we will improve the clarity of Fig. 1 (R1) and Sec. 2.2 (R3, R4), as well as the analysis of the results (R1).
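
For reference, the modified Dice coefficient quoted in the segmentation-accuracy reply, [P ∩ GT]/GT, can be computed as in the sketch below. The function name `modified_dice` and the flattened binary-mask inputs are assumptions made for illustration. As written, the formula measures the fraction of ground-truth vessel voxels covered by the prediction, so it does not penalize over-segmentation.

```python
def modified_dice(pred, gt):
    """|P ∩ GT| / |GT| for two equal-length flattened binary masks."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of voxels")
    gt_voxels = sum(1 for g in gt if g)
    if gt_voxels == 0:
        raise ValueError("ground-truth mask is empty")
    overlap = sum(1 for p, g in zip(pred, gt) if p and g)
    return overlap / gt_voxels


# Toy masks (made up): 2 of the 3 ground-truth voxels are covered.
prediction = [1, 1, 0, 0, 1]
ground_truth = [1, 0, 0, 1, 1]
score = modified_dice(prediction, ground_truth)
```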




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal didn’t sufficiently address the concerns of R4, but this reviewer is less confident in the relevant domain, and the major concern is the clarity of presentation, which the authors can address in the camera-ready version.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper originally received 2 x WA and 1 x WR. R3 upgraded their rating after being convinced by the rebuttal. I support the comments of R1 and R3 and am inclined to accept the paper. However, the authors are strongly encouraged to address the post-rebuttal comments of R3 to make their paper stronger.



