Abstract

Cerebral microbleeds (CMBs) are defined as relatively small blood depositions in the brain that serve as severity indicators of small vessel diseases, and thus accurate quantification of CMBs is clinically useful. However, manual annotation of CMBs is an extreme burden for clinicians due to their small size and the potential risk of misclassification. Moreover, the extreme class imbalance inherent in CMB segmentation tasks presents a significant challenge for training deep neural networks. In this paper, we propose to enhance CMB segmentation performance by introducing a proxy task of segmentation of supratentorial and infratentorial regions. This proxy task could leverage clinical prior knowledge in the identification of CMBs. We evaluated the proposed model using an in-house dataset comprising 335 subjects with 582 longitudinal cases and an external public dataset consisting of 72 cases. Our method performed better than other methods that did not consider proxy tasks. Quantitative results indicate that the proxy task is robust on unseen datasets and thus effective in reducing false positives. Our code is available at https://github.com/junmokwon/AnatGuidedCMBSeg.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2508_paper.pdf

SharedIt Link: https://rdcu.be/dV1Mm

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72069-7_3

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2508_supp.pdf

Link to the Code Repository

https://github.com/junmokwon/AnatGuidedCMBSeg

Link to the Dataset(s)

https://valdo.grand-challenge.org/

BibTex

@InProceedings{Kwo_AnatomicallyGuided_MICCAI2024,
        author = { Kwon, Junmo and Seo, Sang Won and Park, Hyunjin},
        title = { { Anatomically-Guided Segmentation of Cerebral Microbleeds in T1-weighted and T2*-weighted MRI } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        pages = {24 -- 33}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The study introduces a segmentation method for detecting cerebral microbleeds (CMBs) by incorporating a proxy task to segment the brain into three anatomical regions, thereby enhancing false positive reduction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Integration of Proxy Task.
    • Evaluation on two datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lack of novelty raises concerns about the originality and contribution of the proposed method. Reference: https://arxiv.org/abs/2306.13020
    • Limited related works.
    • The methodology section lacks clarity regarding the input data and network architecture.
    • The evaluation metrics used in the paper may not fully capture the effectiveness of the proposed method. Additional metrics, such as sensitivity and FPavg, are recommended for a comprehensive assessment of the method’s performance.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This work presented a segmentation method for identifying CMBs by integrating a proxy task, enabling segmentation of the brain into three anatomical regions (lobar, deep, and infratentorial) to improve false positive reduction. The method is evaluated on both in-house and publicly available datasets. Although the work presented an interesting topic, concerns arise regarding the high degree of similarity between this work and another preprint study. These are my comments:

    1. My major concern is the striking similarity of the concept of using the segmentation of anatomical regions (lobar, deep, and infratentorial) based on MARS criteria in this work and the method proposed in another preprint study (https://arxiv.org/abs/2306.13020), both aiming to improve CMBs detection by reducing the potential false positive candidates.
    2. The related works part in the Introduction appears too brief, only referencing three studies utilizing a two-stage approach. Recent works employing single-stage tri-planar and prototype learning methods have demonstrated effective FP reduction. Check these examples: “Detection of cerebral microbleeds in MR images using a single‐stage triplanar ensemble detection network (TPE‐Det),” Journal of Magnetic Resonance Imaging 58, no. 1 (2023): 272-283, and “Cerebral microbleeds detection using a 3d feature fused region proposal network with hard sample prototype learning,” in MICCAI, pp. 452-460, 2022. You should explore more in this part. Additionally, more clarity is needed regarding the differences between this work and the highly similar preprint study.
    3. Clinicians typically use Phase images to differentiate between CMBs and calcifications. It is unclear how this study addresses false positives arising from calcifications using only T2*-weighted images.
    4. In Fig. 1, which views (sagittal or axial) are used as inputs for segmentation network? Based on this, adjustment should be made to Figures (a) and (d).
    5. The methodology section lacks sufficient detail, particularly regarding whether the segmentation network takes both T1 and T2* inputs and produces two outputs simultaneously. Meanwhile, Algorithm 1 suggests that only T1 is used, raising a question about the framework’s end-to-end nature.
    6. Microbleeds typically exhibit a tiny round shape. It appears that employing detection algorithms might yield superior results compared to segmenting such small objects. Before the introduction of the VALDO dataset, many studies utilized detection rather than segmentation. Including an explanation or discussion on this aspect would be good.
    7. There is no mention of the DiceTopK in reference [3] as indicated on page 3. Please verify this. Also, was the DiceTopK loss used for CMBs and anatomical region segmentation, or was there another combination with different controlling parameters? More details are needed.
    8. Further clarification is needed regarding the datasets used. Does training involve VALDO dataset cases, or is this external public dataset solely for validation?
    9. Regarding the evaluation metrics, considering the small size of CMBs, it may be less crucial to assess the performance of segmented predictions. Instead, it is vital to determine the accuracy of correctly detected MBs and the average number of False Positives per subject (FPavg). I suggest including these metrics (sensitivity of detected CMBs and FPavg).
    10. More detailed results are needed to identify which brain region from the proxy task most effectively reduces false positives.
    11. The Dice scores presented in Table 1 show inferior performance (<52%), suggesting preference for detection measures such as sensitivity and FPavg. Additionally, comparisons against relevant detection works are suggested.
    12. The ablation study requires further investigation, particularly examining the effect of training with and without the proxy task.
    13. The ground-truth mask of the last example (sub-311) in Figure 4 appears incorrect, exhibiting a rectangular mask for an oval CMB shape.
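Comments 9 and 11 above ask for lesion-level sensitivity and the average number of false positives per subject (FPavg). As a minimal sketch of how those figures relate to the F1-score the paper reports, the following assumes per-subject counts of matched and unmatched lesions are already available. The `SubjectCounts` class, the `detection_metrics` function, and the example counts are hypothetical illustrations and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass
class SubjectCounts:
    """Lesion-level counts for one subject (hypothetical container)."""
    true_positives: int   # predicted lesions matched to a ground-truth CMB
    false_positives: int  # predicted lesions with no ground-truth match
    false_negatives: int  # ground-truth CMBs that were missed


def detection_metrics(subjects):
    """Pool counts over subjects and derive sensitivity, precision, F1, FPavg."""
    tp = sum(s.true_positives for s in subjects)
    fp = sum(s.false_positives for s in subjects)
    fn = sum(s.false_negatives for s in subjects)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # FPavg: false positives averaged over subjects, not over lesions.
    fp_avg = fp / len(subjects) if subjects else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "fp_avg": fp_avg,
    }


# Made-up counts for three subjects, for illustration only.
example = [
    SubjectCounts(true_positives=3, false_positives=1, false_negatives=1),
    SubjectCounts(true_positives=2, false_positives=0, false_negatives=0),
    SubjectCounts(true_positives=0, false_positives=2, false_negatives=1),
]
metrics = detection_metrics(example)
```

Pooling counts before dividing gives a micro-averaged result; averaging per-subject scores instead would weight subjects with few lesions more heavily, so the choice should be stated alongside any reported numbers.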
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Technical novelty and originality: the high degree of similarity between this work and another preprint study
    • Used method (segmentation rather than detection)
    • Evaluation metrics and performance of Dice score <0.52.
    • More comparisons are needed.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    This work is interesting, but it lacks originality. A very similar paper (arxiv.org/abs/2306.13020) already presented the same idea of proxy segmentation for cerebral microbleed detection. While the authors pointed out some differences in their rebuttal, these are insufficient to establish novelty. For example, using two different datasets instead of one, as in the referenced study, does not change the fact that the same methodology and concept were employed. Additionally, the proxy segmentation in the arXiv paper was also done automatically.



Review #2

  • Please describe the contribution of the paper

    The authors propose to employ information on the brain location to help with the reduction of mimics in the segmentation of cerebral micro bleeds. To this purpose they use multi-task learning involving the semantic segmentation of location labels acquired from parcellations (lobar, deep supratentorial or infratentorial)

    They further propose a false positive cleaning strategy based on the surroundings of each candidate lesion. Their method is compared to nnUnet and nnDetection on two datasets

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Simple and effective idea with good justification of the choices made
    • Clear organisation of the paper and good design of experiments (ablation study + comparisons)
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lack of clarity on the implementation of the multi-task component of the work.
    • Absence of measures of variability regarding the results
    • Lack of critical discussion of the results
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is overall an agreeable read with a clear contribution and interesting results. A few points are however to be noted and clarified:

    • why is a parcellation / proxy labels of the T2* scans necessary if they are to be registered to the T1-weighted (for which parcellations can be available)?
    • Is the network performed in a multi-task fashion (one component for the location label and one for the micro bleeds) and if so how is it organised?
    • What are the demographics of the different datasets in terms of lesion load (number, volume)
    • Some may argue that the segmentation of CMB is actually irrelevant and the detection aspect more relevant - in this respect other measures such as absolute difference in number of detected elements could be presented (cf VALDO challenge)
    • Discussion on the number of elements removed by the CFPR module could be included? It appears that the improvement with this module is very minimal - some discussion about it should be added.
    • Is the DiceCE the loss used for nnUnet? If not the ablation study may be missing a component without the proxy module to be complete. - Can the authors comment anyway on the fact that the different performances do not gain much from the added modules in the internal dataset but more in the external one? and that nnUnet appears to perform better than some of the options with both modules when DiceCE is used?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The key points leading to this rating are that the paper proposes a very well presented idea with strong justification on its introduction but this is tempered by some lack of clarity in the description of the model and an overall lack of critical evaluation of the results.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a method for the detection and segmentation of cerebral microbleeds (CMBs) in T2-weighted GRE sequences, which can reduce false positive regions using a novel proxy task. This proxy task combines clinical prior knowledge to distinguish the brain into three regions and reduce false positive regions by identifying the location of CMBs. The author conducted experiments on an internal dataset and an external public dataset, demonstrating better performance of the proposed method compared to the baseline.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The background description of CMBs in this paper is clear enough, and the explanation of the benefits of the proxy task for detecting CMBs is convincing. 2) The writing and figures of the methodology section are easy to understand. The description of the proxy labels generation and the data pre-processing are detailed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) This paper lacks significant innovation in terms of methods, especially in the detection and segmentation network. The author directly employed nnUNet and nnDetection as two backbone models and did not contribute to the loss function facing the challenge of cerebral microbleeds. That means, the key contribution lies in the post-process of the network, which reduces the false positive regions. However, as for the part of network and loss function, which I consider equally important, this paper made less contribution. This paper has not taken measures to deal with possible false negative regions, which are also common in small target segmentation. 2) Although using proxy labels to reduce false positives is insightful, from the results in Table 2, it can be seen that the CFPR ablation has little impact on the performance, especially on the internal test set.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1)The author should make revisions or respond to the above weaknesses. 2)The author can incorporate more comparative methods in comparative experiments to demonstrate the leading performance of the proposed methods.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clinical value, the performance of method, the novelty of method, the writing.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank reviewers for valuable comments. Below we clarify confusions.

Q1(R1): Necessity of parcellation and proxy labels of T2* scans. A: Parcellation and proxy label generation are done in T1 scans. T2* scans can differentiate cerebral microbleed (CMB) voxels. Thus, we performed registration to feed both T1 and T2* scans into a segmentation network.

Q2(R1): Is the network performed in a multi-task fashion? A: No. Our network directly outputs 3D prediction map for both CMBs and proxy labels.

Q3(R1): Lesion demographics? A: Local dataset has mean 2.98 lesions (vol 51.29mm^3). External dataset has mean 3.28 lesions (16.11mm^3).

Q4(R1&R4): Segmentation method was used rather than detection. A: We tried to use nnDetection as our backbone but predicting proxy labels as bounding boxes makes integrating our false positive reduction method difficult. Bounding boxes contained too many background voxels.

Q5(R1&R4): The reported Dice score (< 0.52) is weak, suggesting preference for detection measures and comparisons against detection works. A: We used nnDetection as baseline and report F1-score as detection metric in Table 1. Still, our segmentation-based model outperforms nnDetection in terms of F1-score. We cannot report other metrics due to rebuttal restrictions. We believe F1-score is a decent detection metric, especially under class imbalance.

Q6(R1&R4): Ablation of proxy task is missing. A: Ablations are reported as nnUNet baseline in Table 1. This baseline was trained without proxy task. We will clarify this in Results.

Q7(R1&R3): Minimal performance gain of clinically-derived false positive reduction (CFPR) module especially in internal dataset. A: We are puzzled by this issue. Perhaps internal training and validation sets are from the same MRI scanner, limiting the impact of CFPR. In external dataset, we observed that raw CMB predictions are unstable, while proxy label predictions are stable. This enables CFPR to improve overall performance in external dataset. We will mention this in Results.

Q8(R3): Less contribution in network and loss function. A: Networks and loss functions have been extensively studied in nnUNet, nnDetection, and DiceTopK papers. Our focus is to raise the significance of processing images and automatically deriving proxy labels on top of well-performing networks and loss functions.

Q9(R3): This paper has not taken measures to deal with possible false negative regions. A: We did not report direct measures of false negatives. Instead, we reported F1-score, which deals with both false positives and false negatives.

Q10(R4): Similar concept with ArXiv 2306.13020. A: Thanks for pointing out this excellent paper! We agree our concept of proxy labels is similar with theirs. However, ours is different from the mentioned paper in three regards. 1) Our proxy label generation is fully automated in white matter regions (internal capsule, external capsule, and deep white matter voxels), which was manual in theirs. 2) Ours was validated using a public dataset, while theirs was not. 3) We used the ratio of brain parenchyma for false positive reduction to follow the established MARS criteria, which is missing in theirs. We will mention the paper in the Introduction explaining the differences.

Q11(R4): Insufficient related works. A: We agree and will add more details in Introduction.

Q12(R4): Which views (sagittal or axial) are used as inputs in Fig 1? A: No 2D views were used. Our input is 3D images (Section 3.3).

Q13(R4): Verify the DiceTopK in ref [3]. Was the DiceTopK used for CMBs and anatomical region segmentation? A: The term DiceTopK was introduced in [17] and we will correct this. Yes. DiceTopK was applied for both CMBs and proxy labels with K=10 and lambda_TopK=0.5.

Q14(R4): Is VALDO dataset used during training? A: No, it is solely used for validation.

Q15(R4): Ground truth of sub-311 is incorrect. A: We agree but sub-311 is part of the public dataset, so we could not alter ground truth.
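The rebuttal states that DiceTopK was applied with K=10 and lambda_TopK=0.5 but does not spell out the formula. The following is a rough sketch of one common reading of such a loss for a single binary channel, assuming that K is the percentage of hardest voxels kept in the cross-entropy term and that lambda_TopK weights that term when it is added to a soft Dice loss. Both assumptions, and the function name `dice_topk_loss`, are illustrative and not confirmed by the paper or its repository.

```python
import numpy as np


def dice_topk_loss(prob, target, k_percent=10.0, lambda_topk=0.5, eps=1e-6):
    """Soft Dice loss plus a weighted TopK cross-entropy term (assumed form)."""
    # Flatten and clip probabilities so the logarithms stay finite.
    prob = np.clip(np.asarray(prob, dtype=np.float64).ravel(), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64).ravel()

    # Soft Dice loss over the whole volume.
    intersection = np.sum(prob * target)
    dice = (2.0 * intersection + eps) / (prob.sum() + target.sum() + eps)
    dice_loss = 1.0 - dice

    # Voxel-wise binary cross-entropy.
    ce = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))

    # Keep only the k_percent voxels with the largest loss (at least one).
    n_keep = max(1, int(round(ce.size * k_percent / 100.0)))
    topk_ce = np.sort(ce)[-n_keep:].mean()

    # Assumed combination: Dice term plus lambda-weighted TopK term.
    return dice_loss + lambda_topk * topk_ce


# Toy volume with a single foreground voxel, mimicking heavy class imbalance.
target = np.zeros((4, 4, 4))
target[1, 2, 3] = 1.0
loss_good = dice_topk_loss(target, target)          # near-perfect prediction
loss_bad = dice_topk_loss(1.0 - target, target)     # fully inverted prediction
```

Restricting the cross-entropy term to the hardest voxels keeps the many easy background voxels from dominating the gradient, which is the usual motivation for TopK-style losses under extreme class imbalance.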




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Accept. Overall, authors seem to agree with reviewers on the shortcomings of the method. One big concern for me is the lack of prior art mentioned in the introduction; while it can be addressed, it makes me wonder why it was not there in the first place. Similarly, concerns over F1/DICE as validation seem very valid.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



