Abstract

Topological accuracy in medical image segmentation is a highly important property for downstream applications such as network analysis and flow modeling in vessels or cell counting. Recently, significant methodological advancements have brought well-founded concepts from algebraic topology to binary segmentation. However, these approaches have been underexplored in multi-class segmentation scenarios, where topological errors are common. We propose a general loss function for topologically faithful multi-class segmentation extending the recent Betti matching concept, which is based on induced matchings of persistence barcodes. We project the N-class segmentation problem to N single-class segmentation tasks, which allows us to use 1-parameter persistent homology, making training of neural networks computationally feasible. We validate our method on a comprehensive set of four medical datasets with highly variant topological characteristics. Our loss formulation significantly enhances topological correctness in cardiac, cell, artery-vein, and Circle of Willis segmentation.
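
As a rough illustration of the projection described in the abstract, the sketch below splits an N-class label map into N binary masks and sums a per-class topological discrepancy. It is not the authors' Betti matching loss: the per-class term here is a plain Betti-0 error (the difference in connected-component counts), and the function names, the 4-connectivity, the inclusion of the background as class 0, and the toy label maps are all assumptions made for illustration.

```python
from collections import deque


def one_vs_rest(label_map, num_classes):
    """Split an integer label map into one binary mask per class."""
    return [[[1 if v == c else 0 for v in row] for row in label_map]
            for c in range(num_classes)]


def betti0(mask):
    """Count 4-connected foreground components (Betti number 0) of a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                count += 1  # new component found; flood-fill it
                seen[i][j] = True
                queue = deque([(i, j)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def multiclass_betti0_error(pred, target, num_classes):
    """Sum of per-class |beta_0(pred) - beta_0(target)| over the N binary tasks."""
    pred_masks = one_vs_rest(pred, num_classes)
    target_masks = one_vs_rest(target, num_classes)
    return sum(abs(betti0(p) - betti0(t)) for p, t in zip(pred_masks, target_masks))


# Hypothetical toy label maps (class 0 = background).
target = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [2, 2, 2, 2, 2],
]
pred = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [2, 2, 0, 2, 2],
]

print(multiclass_betti0_error(pred, target, num_classes=3))
```

In this toy example the prediction splits class 1 and class 2 into two components each, so the summed error is 2. Because each class is handled independently, the cost of such a scheme grows linearly with the number of classes.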

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0582_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0582_supp.pdf

Link to the Code Repository

https://github.com/AlexanderHBerger/multiclass-BettiMatching

Link to the Dataset(s)

https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html
https://ieee-dataport.org/open-access/octa-500
https://topcow23.grand-challenge.org/
https://leapmanlab.github.io/dense-cell/

BibTex

@InProceedings{Ber_Topologically_MICCAI2024,
        author = { Berger, Alexander H. and Lux, Laurin and Stucki, Nico and Bürgin, Vincent and Shit, Suprosanna and Banaszak, Anna and Rueckert, Daniel and Bauer, Ulrich and Paetzold, Johannes C.},
        title = { { Topologically faithful multi-class segmentation in medical images } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This article proposes a topological loss for multi-class segmentation, which serves as an extension of the Betti matching loss initially proposed for binary segmentation. The Betti matching loss, introduced in a prior work, involves spatially aligning the d-dimensional topological features (such as connected components and holes) of the predicted segmentation with those of the ground truth. In this study, the authors extend this loss for segmentation in a multi-class setting by decomposing the multi-class segmentation problem into N single-class segmentation problems. Additionally, a variation of the binary Betti matching loss is introduced, incorporating a weight to emphasize the penalty for unmatched topological features. The experimental evaluation includes four 2D datasets, comparing the proposed approach with two other topological losses (ClDice and HuTopo), as well as a regular Dice loss. The authors claim that their method outperforms the current state-of-the-art.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Comprehensive comparison across four diverse datasets, with an insightful discussion of their characteristics and the nuanced results observed.
    • Robust experimental framework featuring a detailed description of hyperparameter optimization strategies, rigorous cross-validation techniques, and statistical tests to establish significance.
    • The literature review is thorough and well-articulated.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The method lacks adequate presentation of key concepts, resulting in complexity in comprehension.
    • The methodological novelty is incremental
    • The article lacks a discussion regarding the improvement of topology in light of the significant increase in complexity compared to existing methods.
    • The hyperparameter optimization is biased toward the proposed method.
    • The ablation study is brief and raises concerns regarding the proposed loss
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The proposed method’s reproducibility is poor because of a notably high algorithmic complexity of the proposed method without a detailed description, and the absence of shared code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The article’s readability is hindered by the absence of several crucial concepts. While it’s understandable that the complete mathematical framework of persistent homology and induced matching of barcodes may not be feasible for a conference article, some fundamental notions should have been included. Specifically, a proper definition of the Betti matching loss and the underlying intuition behind it would have been beneficial. Moreover, the loss relies on two components, $l_{BM}^m$ and $l_{BM}^u$, which are neither defined in this article, nor in the previous work [24] (or with a different notation).

    The complexity of the binary Betti matching loss is notably high. Even though the authors mitigate the increase in complexity of this multi-class extension, the proposed solution remains more complex. This aspect requires a thorough evaluation, comparison, and discussion as it significantly impacts the practicality and utility of the method.

    The method appears promising and yields interesting results; however, its impact is diminished by the absence of available code to reproduce the results and facilitate comparison with other approaches.

    The authors optimized their hyperparameters and those of the compared methods based on a performance score that includes the Betti matching error, which is also what their approach is designed to optimize. I do not think this is a fair comparison, as the authors' approach will obviously be favored over ClDice and HuTopo, which are not designed to minimize the Betti matching error.

    The authors optimized both their method’s hyperparameters and those of the compared methods using a performance score that incorporates the Betti matching error. However, this approach may introduce bias, as their method is designed to optimize the Betti matching error, unlike ClDice and HuTopo. This favors the authors’ approach over the others in the comparison, undermining the fairness of the evaluation.

    The authors’ claims sometimes appear overstated in comparison to the quantitative results provided:

    • “Our results comprehensively demonstrate that our proposed multi-class segmentation loss outperforms all baselines across all datasets in Betti matching errors” According to the results of the statistical significance tests, the Betti matching method demonstrates significantly superior Betti matching errors compared to the HuTopo approach in two out of the four datasets and outperforms the ClDice approach in three out of the four datasets.
    • “Further, we outperform all baselines in Betti number errors 0 and 1 as well as clDice in the Platelet, TopCoW, and ACDC datasets;” The claim appears overly strong for the TopCoW dataset, as the results do not exhibit statistical significance.

    The values of the weight parameters $\gamma^m$ and $\gamma^u$ used in the experiments are not specified. There appears to be a relationship between the two weights ($\gamma^m$ = 1 - $\gamma^u$ ?), but this link is not clearly established. Based on the results, it seems that higher values of $\gamma^m$ lead to better outcomes. This raises concerns regarding the significance of the second term ($l_{BM}^u$) in the proposed loss function. The authors should further investigate this aspect. The ablation study should present and comment on the case where there are no weights and/or equal weights on the loss. This is important because the introduction of the weights is one of the methodological novelties of the article.

    Minor comments :

    • $l_{BM}^m$ and $l_{BM}^u$ should be defined
    • The second dataset is called “Cellular Segmentation” in Section 3 and “Platelet” in Section 4
    • $\beta_0$ and $\beta_1$ are used in Eq. (5) but not defined in the text.
    • The weights are sometimes referred to as $\gamma^m$ and $\gamma^u$ and sometimes as $\gamma_{BM}^m$ and $\gamma_{BM}^u$
    • Section 3 - Datasets : “Artery-Vein Classification Lastly, we test…” => “Lastly” should be removed
    • In Section 3 - Metrics : reference [1] is cited, but it is not clear why, as it appears that the BM was proposed in [24], which is posterior to [1].
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The article presents an interesting extension of the Betti matching loss for multi-class segmentation. However, concerns arise regarding the significance of the findings and the practical applicability of the method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The authors have addressed some of my concerns. Specifically, the training times mentioned, though longer than those of the Dice or ClDice approaches, appear reasonable on the OCTA500 dataset. However, applying their method to 3D images could be problematic.

    The issue of hyperparameter optimization remains a concern for me. Although I agree that the Betti Matching is a strong topological metric, there is still inherent bias. To alleviate any doubt, optimization based on different metrics should have been included. The brief quantitative values of the Dice optimization provided by the authors are not entirely convincing, as they are computed on a dataset that favors their approach.

    My comments regarding the weight parameters $\gamma^m$ and $\gamma^u$ have been only partially addressed.

    While the work presented in this article is interesting, I still do not find the experiments and discussions sufficiently convincing for publication in its current form.



Review #2

  • Please describe the contribution of the paper

    The authors propose an extension of the Betti matching concept [24] for multi-class segmentation. The proposed loss function is generalisable to an arbitrary number of classes for segmentation networks. They circumvent the complexity of computing N-parameter persistent homology descriptors by breaking down the N-class segmentation into N single-class segmentations. This practice is also used for other topology-based losses in literature which they use as baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well organised, concisely written, and the methodology section is clearly described. Figure 1 is especially useful to introduce Betti matching.

    The problem statement is clear and relevant. It is important to have Betti-matching topology losses which are generalisable to N classes.

    The proposed method to circumvent the complexity problem is simple, but effective.

    The quantitative results are strong and are exemplified nicely with good qualitative visualisations. The proposed method demonstrates best performance across all Betti-based metrics, and surprisingly also improves on Dice score in the Platelet and TopCoW datasets. The choice of comparison to multi-class extensions of clDice and HuTopo is well considered and thought out.

    The concept of extending persistent homology-based losses using Betti numbers to multi-class segmentation is not novel (see below), but there are still notable differences from the closest related works [1,2]. For example, [1] uses a post-processing framework, while the proposed method is used in end-to-end training. [2] uses single-class segmentation, while the proposed method extends this to multi-class.

    [1] Byrne et al. “A persistent homology-based topological loss for CNN-based multiclass segmentation of CMR”. TMI, 2022
    [2] Stucki et al. “Topologically faithful image segmentation via induced matching of persistence barcodes”. ICML 2023

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The Byrne et al. paper [1] that you cite also proposes a multi-class extension to persistent homology-based losses using Betti numbers. This seems to be the closest work to the proposed method; however, there is no comparison with it, other than a brief mention in the introduction. Perhaps direct quantitative comparison is not possible, and if that is the case then the authors should explain why. At the very least, there should be an explanation as to why the proposed method is superior.

    There are several other methods to improve topological correctness (post-processing, shape priors, losses) that are mentioned in the introduction and related work. The authors suggest some limitations for these methods, but it would be much stronger to show this with a quantitative comparison to the proposed work.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Publicly available datasets are used, however no source code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall an interesting piece of work that I enjoyed reading. The quality of the writing and narrative was strong throughout.

    The work could definitely be improved by more thorough comparison with other topology-based methods in literature (as mentioned above). If there are reasons why the proposed method cannot be compared to them, then I recommend stating this explicitly in the paper to avoid confusion.

    Some additional analysis of the weighting concept would be useful. Fig 3 suggests that $\gamma_{BM}^m$ should be kept at 0. Is that what was used to generate the final results? Does this imply that the loss that reinforces topological features that are correctly predicted is not needed?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed work is interesting, well written and has some novelty. The methodology is solid and theoretically correct. This slightly outweighs the limitations in the comparisons to other existing methods. The authors should have the opportunity to clear up any misunderstandings in regards to this.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors make a reasonable argument as to why comparison to Byrne et al [1] is not possible.

    I would increase my score because of this, however other reviewers have highlighted the computational requirements of the matching algorithm which I had previously missed. I think this needs addressing further by the authors, as the OCTA images are 300x300 resolution yet have 2x training times. Due to the cubic complexity this may make the algorithm unusable on medium-large images.



Review #3

  • Please describe the contribution of the paper

    This paper focuses on topologically accurate multi-class segmentation. By splitting the problem into N binary segmentation tasks using a custom loss function, the authors circumvent computationally intensive topology calculations. This is then evaluated on a vanilla U-Net and compared to other loss functions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper thoroughly validates the proposed methods across four different medical datasets, demonstrating generalizability of the proposed loss function. And the proposed method clearly performs better as compared to the baselines defined in the paper.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In the paper, there is no discussion or results on the computational efficiency of the proposed loss function. How does this compare to the other losses that are evaluated in the paper?

    There is no information on the actual state of the art results on the datasets. Meaning that there is no indication on how far you are from the sota results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors do not mention that the code will be open sourced. For instance, the authors mention a Unet with residual units, however the number of layers and kernel sizes are unknown and subject to interpretation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    “We train a U-Net architecture [22] with residual units from scratch.” Please explain why you are of the opinion that this generalizes to other network architectures.

    “We perform a random hyperparameter search with 50 runs on each of the splits and select the model that has the highest performance S on the validation set with S being defined as a balanced performance metric of pixel-wise accuracy and topological faithfulness.” It would be very interesting to see the differences in the selected hyperparameters, as the procedure can currently be interpreted as cherry-picking the best final hyperparameters.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The results show that this is a valid improvement and a useful experiment that warrants further implementation to reach state of the art results.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Dear Reviewers and ACs, we sincerely appreciate your insightful reviews and that you find our method “interesting” (R1) and “effective” (R3). We appreciate that you identify our experimental validation as “thorough” (R4) and “rigorous” (R1), our quantitative results as “strong” (R3), and that the “quality of the writing […] was strong throughout” (R3).

Practicality & runtime (R1,R4) Similarly to binary homology-based loss functions, the barcode computation scales cubically (O(n^3)) with the number of pixels and dominates the complexity (see [24], suppl.). Our multiclass extension scales linearly with the number of classes (Sec. 2), which holds for all extended baselines. The runtimes of one training (150 epochs, equal hyperparameters, OCTA-500) are: 28m10s (ours), 59m27s (HuTopo), 17m17s (clDice), and 16m56s (Dice). We are convinced this increase is acceptable for obtaining topologically accurate models and have now included runtime in the manuscript.
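
To put the quoted runtimes and the cubic bound in perspective, the short sketch below converts the four training times to seconds, expresses each relative to the Dice baseline, and evaluates how a worst-case O(n^3) step would grow if the side length of a square image doubled. The helper names and the 300 to 600 pixel example are assumptions chosen for illustration; the bound is a worst case and says nothing about typical behaviour in practice.

```python
def to_seconds(stamp):
    """Convert a string such as '28m10s' to a number of seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


# Training times quoted in the rebuttal (150 epochs, OCTA-500).
runtimes = {"ours": "28m10s", "HuTopo": "59m27s", "clDice": "17m17s", "Dice": "16m56s"}
seconds = {name: to_seconds(stamp) for name, stamp in runtimes.items()}
relative = {name: secs / seconds["Dice"] for name, secs in seconds.items()}


def cubic_cost_factor(side_from, side_to):
    """Growth of an O(n^3) step, with n the pixel count of a square image."""
    pixel_ratio = (side_to ** 2) / (side_from ** 2)
    return pixel_ratio ** 3


for name in runtimes:
    print(f"{name}: {seconds[name]} s, {relative[name]:.2f}x Dice")

# Hypothetical scenario: doubling the side length from 300 to 600 pixels.
print(cubic_cost_factor(300, 600))
```

With these numbers the proposed loss takes roughly 1.66 times as long as Dice and HuTopo roughly 3.51 times as long, while doubling the side length multiplies the worst-case cubic term by 64.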

Comparison to Byrne et al. (R3) A comparison is impossible because their method relies on a fixed, a priori known topology of all samples. They filtered the ACDC data for slices with exactly the same topology. Contrarily, we use the complete data with varying topology, making all methods based on shape priors non-applicable. This is an important strength of our method. We clarified this in the manuscript. The multiclass extended Mosin loss [20] achieves inferior results (ACDC; BM score: 0.08, Dice: 0.85).

Hyperparameter (HP) selection (R1, R4) We incorporate the BM score during HP optimization (see Eq. 5) because our goal is topological correctness. BM is the best topological metric because it exactly measures interpretable and spatially corresponding topological features [24]. Therefore, we believe it to be the most appropriate choice. Still, if we select HPs based on Dice, our method performs better (e.g., Platelet; BM score: 1.0 vs. 7.7, Dice: 0.71 vs. 0.64 for ours vs. HuTopo). Selection only based on Betti number errors often yields insufficient Dice scores.

Significance testing (R1) We present results on 4 datasets using 5 metrics. We found improved values in the metrics in most experiments and characterized performance differences using paired t-tests. We agree that textual performance descriptions should be concise and not overstated; consequently, we reviewed our description to strictly adhere to the results in Tab.1. Further, Tab.1 contained a typo: our method does perform significantly better in BM than HuTopo on OCTA-500 (p0.002). We apologize and have corrected the error.
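
For readers unfamiliar with the test mentioned above, the following sketch computes the paired t statistic for two methods evaluated on the same cases. The scores are hypothetical placeholders, not values from the paper, and the function name is an assumption; turning the statistic into a p-value additionally requires the Student t distribution with the returned degrees of freedom.

```python
import math


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the per-case differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1


# Hypothetical per-case errors for two methods (placeholders only).
method_a = [2.0, 4.0, 6.0, 8.0]
method_b = [1.0, 2.0, 3.0, 4.0]

t, dof = paired_t_statistic(method_a, method_b)
print(t, dof)
```

The pairing matters because both methods are scored on the same samples, so the test looks at per-sample differences rather than at the two score distributions separately.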

Ablation on weighting terms (R1, R3) Our introduced weighting terms can be set independently (Eq. 4). The observation in Fig. 3 is specific to the OCTA-500 dataset and underlines that the weighting can drastically affect topological performance. However, the observed trend is not general but depends on dataset characteristics, as described in our ablation. Specifically, a weight of 0 was not used to generate the final results for any dataset. We added another dataset to Fig.3 and more details to the supplement.

Generalizable to other architectures? (R4) Our topological loss function works without network priors and can be applied independently of architecture. We used U-nets because they are common in segmentation and for comparing topological methods [23,10,9,21]. Our results are robust to changes in model size with a std of 0.006 (Dice) and 0.03 (BM) across 6 different settings. A transformer-based architecture showed the same trend (ACDC; BM score: 0.07 (ours), 0.11 (HuTopo), 0.32 (Dice)).

Additional theoretical notions (R1) We now include a definition and intuition of BM and formally define the two weighting components that amplify and attenuate matched and unmatched components. Moreover, we expanded the theoretical framework on BM in the supplement.

Reproducibility (R1,R4) Code and all trained models will be published upon acceptance. HPs for each run are now included in the supplement.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper received mixed reviews (2 positive and 1 negative). Although there are some concerns with this work (e.g., applying it to 3D images is not straightforward, and R1 raised the question of performance when optimizing based on different metrics), this work extends the persistent homology-based losses using Betti numbers to multi-class segmentation, which is a reasonable contribution. The evaluation on 4 datasets is reasonably convincing. Hence, I am inclined to accept this work.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    R1 thinks that the authors have addressed some of their concerns and that the work is interesting; R3 and R4 accept this paper. I think this paper is interesting and well-written.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



