Abstract

Vascular segmentation in medical imaging plays a crucial role in morphological and functional assessment. Traditional methods, like the centerline Dice (clDice) loss, ensure topology preservation but falter in capturing geometric details, especially under translation and deformation. The combination of clDice with traditional Dice loss can lead to diameter imbalance, favoring larger vessels. To address these challenges, we introduce the centerline boundary Dice (cbDice) loss function, which harmonizes topological integrity and geometric nuances, ensuring consistent segmentation across various vessel sizes. cbDice enriches the clDice approach by including boundary-aware aspects, thereby improving geometric detail recognition. It matches the performance of the boundary difference over union (B-DoU) loss through a mask-distance-based approach, enhancing translation sensitivity. Crucially, cbDice incorporates radius information from vascular skeletons, enabling uniform adaptation to vascular diameter changes and maintaining balance between branch growth and fracture impacts. Furthermore, we conduct a theoretical analysis of clDice variants (cl-X-Dice). We validate cbDice's efficacy on three diverse vascular segmentation datasets, encompassing both 2D and 3D, and binary and multi-class segmentation. Notably, the method integrated with cbDice demonstrated outstanding performance on the MICCAI 2023 TopCoW Challenge dataset. Our code is made publicly available at: https://github.com/PengchengShi1220/cbDice.
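For orientation, the baseline clDice that cbDice builds on is the harmonic mean of topology precision and sensitivity computed on skeletons. Below is a minimal NumPy sketch of the hard (evaluation-time) form; it assumes binary masks with skeletons supplied directly, whereas training uses a differentiable soft-skeletonization, and the toy masks are purely illustrative:

```python
import numpy as np

def cl_dice(pred, gt, skel_pred, skel_gt, eps=1e-8):
    """Hard clDice: harmonic mean of topology precision and sensitivity.

    pred/gt are binary masks; skel_pred/skel_gt are their skeletons
    (obtained via soft-skeletonization during training in the paper).
    """
    # Fraction of the predicted skeleton lying inside the ground truth.
    tprec = (skel_pred * gt).sum() / (skel_pred.sum() + eps)
    # Fraction of the ground-truth skeleton lying inside the prediction.
    tsens = (skel_gt * pred).sum() / (skel_gt.sum() + eps)
    return 2.0 * tprec * tsens / (tprec + tsens + eps)

# Toy example: a 3-pixel-thick horizontal vessel with its middle-row skeleton.
gt = np.zeros((7, 20), dtype=np.uint8)
gt[2:5, :] = 1
skel = np.zeros_like(gt)
skel[3, :] = 1
print(cl_dice(gt, gt, skel, skel))  # perfect prediction -> score close to 1.0
```

cbDice extends this formulation with boundary distance maps and skeleton radii so that the score reacts uniformly to diameter changes rather than favoring larger vessels.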

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0458_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0458_supp.pdf

Link to the Code Repository

https://github.com/PengchengShi1220/cbDice

Link to the Dataset(s)

DRIVE: https://drive.grand-challenge.org/ Parse 2022: https://parse2022.grand-challenge.org/ TopCoW 2023: https://topcow23.grand-challenge.org/

BibTex

@InProceedings{Shi_Centerline_MICCAI2024,
        author = { Shi, Pengcheng and Hu, Jiesi and Yang, Yanwu and Gao, Zilve and Liu, Wei and Ma, Ting},
        title = { { Centerline Boundary Dice Loss for Vascular Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose cbDice, which modifies clDice to account for changes in vessel diameter via distance maps. The modification improves the ability to detect small-diameter vessels. The authors evaluate the cbDice loss on three datasets and achieve competitive performance compared to the B-DoU and clDice losses.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Vessel segmentation is an important task in medical image analysis that attracts much attention in the community.
    2. The proposed loss is more sensitive to changes in vessel diameter.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Lack of clarity. For instance, the radius computation of skeletal points is not explained, and the symbol D has multiple meanings.
    2. The authors present the concept in a confusing manner that requires simplification. They introduce numerous notations and variant losses, yet fail to elucidate the motivations and advantages of each loss.
    3. The evaluation requires further rigor. Figure 3 does not demonstrate a distinctive advantage of cbDice that cannot be measured with Dice and clDice alone. The evaluation metrics should be consistent across all datasets.
    4. The primary effect of the cbDice loss is to enhance the accuracy of small vessel segmentation; however, only the results on the TopCoW 2023 dataset indicate this point.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors should present a concept in a clear and simple manner. For instance, they can streamline the symbols and reduce unimportant variant losses.
    2. The evaluation requires further support to highlight the effectiveness of cbDice loss.
    3. It would be beneficial to demonstrate the unique advantages of cbDice in comparison to other metrics.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The writing of the paper's method section is puzzling.
    2. The evaluation of cbDice's effectiveness requires further support, particularly for small vessels.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors introduced a new loss function named cbDice (centerline boundary Dice) that uses geometric information to improve vessel segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the paper is well organized, with a clear description of cbDice
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • this work improves performance within an existing loss framework, thus the novelty is limited.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors submitted the source code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors showed different images in both 2D and 3D. It would be nice if the authors could also show the segmentation results on these different images.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is in general well written; however, the novelty is limited.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors answered all my questions. I'm satisfied with the discussion of the innovation. Still, it would be good to see result images for both 2D and 3D.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a new loss framework called centerline boundary Dice (cbDice) loss for vascular segmentation, composed of cl-X-Dice variants, where X denotes variants incorporating skeleton radius (S), mask distance (M), inverse radius (I), and normalization (N). This framework improves performance over traditional losses like Dice or centerline Dice (clDice) by incorporating penalties for topology, geometric morphology, and the radius of vascular skeletons. While the clDice loss enhances topological connectivity, it is less effective at preserving geometric details under translation and scaling. Additionally, the combination of Dice + clDice tends to favor the segmentation of larger segments. The authors show the superiority of their proposed loss function over traditional losses across 3 different datasets and 3 network architectures.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (a) The authors highlight the different challenges associated with vascular segmentation and engineer a loss that incorporates penalizing each of those challenges, namely, topological (connectivity) errors, geometric errors, and errors due to diameter imbalance. This is a novel contribution over existing clDice, etc. (b) The method is based on decent theoretical foundations showing that cl-X-Dice is more sensitive to translations (cl-M-Dice) and radius variations (cl-S-Dice) while it also preserves topological fidelity imposed by clDice. (c) Experiments show improvement across both 2D and 3D datasets for segmentation of various structures like retinal vessels, pulmonary artery, and circle of Willis. (d) The experiments also highlight the importance of the combination of different losses via hyperparameter (alpha and beta) tuning.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (a) Although the authors do not conduct generalizability studies (testing on data outside the existing data distribution), I am afraid these models might not generalize well on unseen external data. That's because the alpha and beta hyperparameters seem to have been chosen on the test set, so the authors might be overfitting their models to the specific features of this chosen (single) test subset of the data. Ideally, these hyperparameters should be chosen on the validation set, and only the best hyperparameters should be used on the test set to enhance generalizability. Please comment on this. (b) No significance testing was performed. Improvements from the cbDice loss seem to be very small, especially on the DRIVE dataset. For example, on DRIVE the improvement is about 0.7% in Dice and about 1.2% in clDice (best vs. worst model), although the other two datasets seem to show better improvements. Is the improvement on DRIVE significant, given that testing was performed on just 20 cases?
    (c) No uncertainty (standard deviation/error) is reported on the mean values.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Implementations of the losses, metrics, networks, and plan files for nnUNet have been provided via an anonymous git repository. The work was performed on 3 publicly available datasets from past challenges (DRIVE [2D], Parse 2022 [3D], TopCoW 2023 [3D]). However, (I believe) the authors have not provided their train, validation, and test splits, which might slightly limit reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (a) The authors discuss the results largely quantitatively, while the discussion of clinical significance has largely been left out. How would you qualitatively analyze the errors made by the new loss framework and their clinical significance (when it comes to dealing with the errors)? Please discuss this. (b) It's hard to believe that nnUNet obtained a mean Dice (S) of 0 when using the 0.5CE + 0.5Dice loss on the TopCoW 2023 dataset. Similarly, NexToU obtained a mean Dice (S) of 0 when using the CE loss. Why were these two exactly zero? Can you explain this qualitatively? (c) The schematic showing the evolution of cl-X-Dice in Figure 2 could be made more pedagogical; I found it hard to understand without going over it several times. Also, the structures within the practical example shown on the left of Fig. 2 are hardly visible for the bottom two sets of images (without extensively zooming the file). (d) Similarly, Table 1 and the subsection on “Variations of clDice” may benefit from more clarity with respect to the meanings of the different variables (especially in the table caption).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the paper does have some weaknesses, they are largely related to some parts being unclear and to extra experiments that could have been carried out on the validation set for model selection, which might have made the paper stronger. Despite these limitations, the paper introduces a novel and improved strategy for domain-specific vascular segmentation, supported by both theoretical and experimental validation (although more rigorous experiments could be performed to train a better-generalizing model).

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Although the authors have answered most of my questions convincingly, I still have reservations about accepting that not using deep supervision alone led to a DSC = 0 performance by nnUNet on TopCoW. As a result, I will maintain my score of 4.




Author Feedback

We appreciate the reviewers’ constructive feedback and would like to address the key issues as follows:

  1. Novelty and Generalizability (R#3-3, R#4-3-(a)): We thank R#4 for recognizing our work as “a novel and improved strategy for domain specific vascular segmentation.” Indeed, we have made significant efforts to unify theoretical and practical challenges to spark greater innovation in the field. The proposed cl-X-Dice framework elucidates the evolution of clDice, advancing beyond a mere variant. Extending valuable existing work does not detract from innovation. For instance, the B-DoU loss (MICCAI 2023) is derived from the B-IoU loss (CVPR 2021). With the cbDice combination loss (alpha=1, beta=2), we secured second place in the TopCoW 2023 Challenge and ensured reproducibility for the community. The datasets were randomly split without specific selection, following the nnU-Net (Nature Methods 2021) framework’s default fold 0 settings. We will further analyze the impact of the hyperparameter beta across various datasets in the revised manuscript.

  2. Advantages of cbDice and Clarifying Figures and Tables (R#1-3-1/2/3, R#1-6-1/2/3, R#3-7-1, R#4-3-(c), R#4-7-(c)/(d)): Figure 2 visualizes the evolution of clDice, showing FN and FP at the centerline and boundary. Figure 3 showcases the differences in various metrics under different conditions, highlighting the advantages of the Dice and cbDice combination scheme, consistent with the three scenarios in the cbDice_cal_demo code. The key challenge is that the combination of Dice and clDice favors larger vessels; regardless of whether it is clDice or one of its variants (such as cbDice), it needs to be combined with Dice. To address the “numerous notations,” we will simplify them by changing “SI” to “I,” setting normalization as the default in cl-MS-Dice and cl-MI-Dice, and removing cl-MSN-Dice and cl-MSIN-Dice. Thus, cl-MI-Dice becomes cbDice in the text. We will enlarge and reposition images in Figure 2 and simplify notations. To avoid confusion, we will replace the abbreviation for Dice with $\mathcal{D}$. Certain metrics are calculated only on 2D data ($\mu^{err}$) and multi-class data (Dice(L) and Dice(S)), and we will unify the $\beta^{err}$ metric across all datasets. We will include more binary segmentation images, enhance Table 1 with explanations of the different variables, and include standard deviations with the mean values.

  3. Small Vessel Segmentation and Clinical Importance (R#1-3-4, R#1-6-2, R#4-3-(b), R#4-7-(a)): Regarding small vessel segmentation, binary segmentation data lacks explicit small vessel classes. In multi-class datasets, cbDice effectively identifies small vessels and removes small background noise (see Figure 4, SwinUNETR). This improvement is due to cbDice’s enhanced recognition of small targets. Future work will further validate cbDice’s performance on more imbalanced multi-class datasets (e.g., AortaSeg24 Challenge). On the DRIVE dataset, this experiment aimed to demonstrate our method’s applicability to 2D tasks. Despite using only CE loss with the nnU-Net framework, which approaches the performance limit, improvements were less significant. cbDice’s quantitative improvements translate to clinical value, as mentioned in the Supplementary Material on CoW variant topology matching performance, addressing bottlenecks in downstream tasks (e.g., CROWN Challenge 2023).

  4. Radius Computation and TopCoW Dice (S) of 0 Performance (R#1-3-1, R#4-7-(b)): Regarding the radius computation of skeletal points, our method uses the distance map calculation (code: loss/cbdice_loss.py, lines 60 and 65). We will clarify in the manuscript that the radius R is derived from the distance map D. The Dice (S) score of 0 on TopCoW might be due to not using deep supervision (Sec. 3.2, Setup), which impacts the recognition of small vessels with low sample proportions. Due to the computational cost of distance map calculations, we did not compare deep supervision results, to maintain consistency.
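The rebuttal's description, reading the radius R of each skeletal point off the foreground distance map D, can be sketched as follows. This is an illustrative assumption-laden toy (the strip mask and the hand-specified centerline stand in for a real skeletonization; the actual implementation lives in loss/cbdice_loss.py):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Toy vessel: a horizontal strip 3 pixels thick.
mask = np.zeros((7, 20), dtype=np.uint8)
mask[2:5, :] = 1

# Distance map D: distance of each foreground pixel to the background.
dist_map = distance_transform_edt(mask)

# Centerline hand-specified for this toy case (a real pipeline would
# skeletonize the mask); the radius R at each skeletal point is simply
# the distance-map value there, as described in the rebuttal.
centerline = [(3, c) for c in range(20)]
radii = np.array([dist_map[p] for p in centerline])
print(radii)  # all 2.0: the middle row is 2 pixels from the background
```

Normalizing loss contributions by these per-point radii is what lets cbDice weight thin and thick branches uniformly instead of letting large vessels dominate.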




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper got mixed reviews. One reviewer provided a very low-quality review (R3 just said there was a lack of novelty), so I am ignoring it for my recommendation. I am then left with R1 recommending weak rejection and R4 recommending weak acceptance. I share R1's concerns about the confusing notation (it can be fixed, but I find it better that this goes through a second round of reviews) and the lack of experimental support for the claim of good performance on small vessels: the TopCoW competition did not look into performance on small vessels, so the fact that the authors got second position there is irrelevant. Also, I believe that the authors should not have written here that they were second in that competition, as this may give away their identity and break anonymity (https://arxiv.org/abs/2312.17670v2).

    Some questions raised by R4, who recommends weak acceptance, were really not resolved by the authors' feedback: particularly concerning is the 0 Dice score without deep supervision. Also, the DRIVE dataset is almost 20 years old and legacy now; it should not be used for research in 2024, given that there are plenty of modern alternatives.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper had mixed reviews before and after the rebuttal. After reading the paper, reviews, and rebuttal, I am inclined to accept this work for the following reasons: 1) Overall, this work proposes an improved strategy for tubular structure segmentation with reasonable experimental results (3 datasets with different targets). 2) I disagree with some of R1's comments, e.g., that the writing style of the paper's method is puzzling, which is not the case in my opinion. Moreover, R1's main concern is the effectiveness of cbDice for small vessels; however, the results on the TopCoW 2023 dataset already indicate its effectiveness at segmenting small vessels. Therefore, I am inclined to accept this work.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Overall, there were a lot of mixed reviews. Some reviewers found the notation difficult while others did not, and there are some very legitimate issues with the results, especially the lack of statistical testing to demonstrate that the proposed method is better than the state of the art.

    However, I tend to side with meta-reviewer #4: the methodology is unique and has good theoretical grounding. There are definitely some minor issues around anonymity, the use of an older dataset, and questions about how well the model generalizes, but I think these are outweighed by the contributions of the manuscript.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



