Abstract

Whole brain parcellation requires inferring hundreds of segmentation labels in large image volumes and thus presents significant practical challenges for deep learning approaches. We introduce label merge-and-split, a method that first greatly reduces the effective number of labels required for learning-based whole brain parcellation and then recovers original labels. Using a greedy graph colouring algorithm, our method automatically groups and merges multiple spatially separate labels prior to model training and inference. The merged labels may be semantically unrelated. A deep learning model is trained to predict merged labels. At inference time, original labels are restored using atlas-based influence regions. In our experiments, the proposed approach reduces the number of labels by up to 68% while achieving segmentation accuracy comparable to the baseline method without label merging and splitting. Moreover, model training and inference times as well as GPU memory requirements were reduced significantly. The proposed method can be applied to all semantic segmentation tasks with a large number of spatially separate classes within an atlas-based prior.
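The merging step described in the abstract can be illustrated with a minimal sketch. It assumes that two labels "conflict" (must not be merged) when they lie closer than a distance threshold; the 1D positions, the threshold value, and the function name `greedy_colouring` are hypothetical and chosen for illustration only. The paper's actual criteria (reviewers mention distance and volume-ratio thresholds) may differ.

```python
import numpy as np

def greedy_colouring(conflict: np.ndarray) -> list[int]:
    """Give each label the smallest colour not used by a conflicting label."""
    n = conflict.shape[0]
    # Visit labels with the most conflicts first (a common greedy heuristic).
    order = sorted(range(n), key=lambda i: -int(conflict[i].sum()))
    colour = [-1] * n
    for i in order:
        used = {colour[j] for j in range(n) if conflict[i, j] and colour[j] >= 0}
        c = 0
        while c in used:
            c += 1
        colour[i] = c
    return colour

# Toy setup (assumption): five structures at 1D positions, 10 units apart.
pos = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
threshold = 15.0  # hypothetical distance threshold
dist = np.abs(pos[:, None] - pos[None, :])
conflict = (dist < threshold) & ~np.eye(len(pos), dtype=bool)

merged = greedy_colouring(conflict)  # labels sharing a colour are merged
print(merged, "->", len(set(merged)), "merged labels instead of", len(pos))
```

In this toy case only directly neighbouring structures conflict, so two merged labels suffice for five original ones.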

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2217_paper.pdf

SharedIt Link: https://rdcu.be/dV51l

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72114-4_34

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2217_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Kuj_Label_MICCAI2024,
        author = { Kujawa, Aaron and Dorent, Reuben and Ourselin, Sebastien and Vercauteren, Tom},
        title = { { Label merge-and-split: A graph-colouring approach for memory-efficient brain parcellation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {350 -- 360}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a novel label merge-and-split method based on the greedy graph coloring algorithm, aimed at reducing the CPU and GPU memory requirements during both training and testing processes. The proposed method effectively decreases the usage of real labels while still achieving comparable results to previous methods. Experimental results validate the effectiveness of the proposed approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper applies a greedy graph coloring algorithm to reduce the number of original labels and construct merged labels for training and testing. Additionally, an algorithm based on atlas-based influence regions is employed to post-process the predicted results. Compared to the original label set, the number of labels is reduced by over 60% while achieving comparable results. The paper is clearly written and well structured.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of explanation regarding the overall algorithm. Inadequate validation of individual modules in the experimental results. Absence of comparative validation with baseline results in the visualization outcomes.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. It would be beneficial for the reproducibility to provide the overall algorithm or pseudocode.
    2. The rationale behind why the merge-and-split method reduces the usage of real labels is not adequately explained, nor is the impact of using the merge-and-split method discussed in the article.
    3. More visualization results and comparisons could further illustrate the effectiveness of the proposed method.
    4. Why does the model trained on the original labels yield better results on IXI in terms of Hausdorff distance, while the model trained on the merged labels performs better in terms of RVE? What are the reasons for this difference?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed merge-and-split method in the paper is intriguing, and the overall clarity of the paper is commendable. However, certain issues regarding the overall algorithm lack clear explanations, and the limited experiments and visualization results do not fully support the claimed results in the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The article presents a method to merge labels in a whole brain segmentation setting with a large number of labels. The approach is based on a graph colouring algorithm applied to an averaged label map over the training set. After inference on merged labels, original labels are retrieved according to this averaged label map. Experiments show improvements in processing time and memory efficiency while keeping competitive performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The article is well-written, easy to follow, with clean structure and rigorous mathematical notations.
    • The proposed idea is interesting and can be very useful in segmentation settings with a large number of labels (only if examples are registered though).
    • The experiments show significant improvements in processing time and memory efficiency (which are the main bottlenecks for large segmentation problems) while keeping competitive performance when compared to a baseline.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The experimental part could be slightly extended. Currently, the proposed method is only compared to a simple baseline. It would be interesting to compare the approach with another merging strategy (for example the symmetry-based strategy mentioned in the paper) and/or another state-of-the-art approach without using pseudo labels. Other important metrics to report would be:
      • the upper bound performance induced by the label splitting strategy: assuming perfect merged labels predictions, what is the performance obtained after splitting?
      • the performance difference induced by the use of an atlas prior: what if the support map is used to refine the predictions in the baseline unmerged setting? (for example by masking/reweighting the predictions with the support map)
    • Some methodological choices could be further discussed. For example, the study in Fig. A of the supplementary only shows how the number of labels varies with the two thresholds. It would be interesting to at least train the model in an extreme case with fewer labels to see how performance is impacted by this choice.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • In future works, it would be interesting to experiment with different graph colouring algorithms to see the influence on the results.
    • In the same way, the KDTree applied on border coordinates could be discussed. Other strategies could be used, for example taking inspiration from hierarchical clustering or binary partition trees (BPT) algorithms.
    • For label splitting, it would be interesting to keep the fuzzy strategy until the end. For example, by applying a labelwise softmax on the fuzzy support map, then by picking the unmerged label probabilistically instead of argmax.
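The probabilistic splitting suggested in the last bullet could look roughly like the sketch below. It is an illustration under assumptions, not the paper's method: `support` stands for a fuzzy support map with one value per original label at each voxel, `candidates` lists the original labels inside one merged label, and the `temperature` parameter is a hypothetical addition.

```python
import numpy as np

def sample_original_label(support, candidates, rng, temperature=1.0):
    """Sample one original label per voxel from a softmax over the support values.

    support:    array of shape (..., L), fuzzy support per original label
    candidates: original labels that belong to the merged label in question
    """
    logits = support[..., candidates] / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=-1, keepdims=True)
    # Inverse-CDF sampling, one draw per voxel.
    u = rng.random(p.shape[:-1] + (1,))
    idx = (p.cumsum(axis=-1) < u).sum(axis=-1)
    idx = np.minimum(idx, len(candidates) - 1)  # guard against rounding
    return np.asarray(candidates)[idx], p

rng = np.random.default_rng(0)
support = rng.random((4, 5))  # toy data: 4 voxels, 5 original labels
labels, probs = sample_original_label(support, [1, 3], rng)
```

Replacing the sampling step with `probs.argmax(axis=-1)` would give the deterministic argmax variant that the bullet contrasts it with.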

    Other minor comments:

    • In Eq. (1), I am not sure why a tilde is used for Y_vl, as Y should already be a one-hot encoded label map?
    • In Fig. 2, the use of two colors could create confusion with the two colors of the adjacency matrix. Maybe it would be better to use different colors and/or more colors for the graph.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach is interesting and well presented. However, some additional experiments could further improve the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a graph-coloring approach for memory-efficient brain parcellation that reduces the number of labels. The original labels can then be restored from the merged labels based on an atlas-based prior.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The novelty of the method is very good. Extensive experiments on three public datasets demonstrated that the proposed method effectively reduces the time and memory during both the training and inference phases.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The limitations of the work have been sufficiently discussed in the paper. No obvious weakness.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    How sensitive are the parcellation performances to the distance thresholds applied for merging the labels?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty of the method and experiments

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Dear reviewers,

Thank you for the detailed and constructive feedback. In the following, we will address the raised concerns and suggestions.

Reviewers #1 and #2 suggest further experiments to validate the label merge-and-split approach. We are currently conducting further experiments including: 1) symmetry-based label merging suggested by reviewer #2, and 2) application of the method to brain images with lesions. Due to the limited available space, we plan to include the results in a journal extension.

Reviewer #2 suggests reporting an upper bound performance based on perfect merged label predictions. While this experiment was not included in the manuscript, we found that with the chosen distance and volume-ratio thresholds the merged labels are split perfectly so that the ground truth labels are fully restored across the testing sets. This implies that differences in segmentation accuracy compared to the baseline without label merge-and-split can mostly be attributed to the learned CNN weights. However, a distance threshold that is small compared to the spatial variability of structures in the test set will indeed result in faulty label splitting and reduced segmentation accuracy.
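The upper-bound check described above (merge the ground truth, split it again, compare with the ground truth) can be sketched on toy data. The nearest-atlas-region rule used here to build influence regions is an assumption made for illustration; the helper names `influence_regions` and `split_labels`, the 1D arrays, and the grouping are all hypothetical.

```python
import numpy as np

def influence_regions(atlas, groups):
    """For each merged label, assign every voxel the nearest original label
    of that group, measured by distance to the label's atlas region."""
    coords = np.argwhere(np.ones_like(atlas, dtype=bool))
    regions = {}
    for merged_id, originals in groups.items():
        dists = []
        for lab in originals:
            pts = np.argwhere(atlas == lab)
            d = ((coords[:, None, :] - pts[None, :, :]) ** 2).sum(-1).min(axis=1)
            dists.append(d)
        nearest = np.argmin(np.stack(dists), axis=0)
        regions[merged_id] = np.asarray(originals)[nearest].reshape(atlas.shape)
    return regions

def split_labels(merged_map, regions):
    """Restore original labels by looking up the influence region per voxel."""
    out = np.zeros_like(merged_map)  # background (0) stays 0
    for merged_id, region in regions.items():
        mask = merged_map == merged_id
        out[mask] = region[mask]
    return out

# Toy 1D atlas and a slightly shifted subject ground truth (0 = background).
atlas = np.array([1, 1, 0, 2, 2, 0, 3, 3, 0, 4, 4])
gt = np.array([0, 1, 1, 2, 2, 2, 3, 0, 0, 4, 4])
groups = {1: [1, 3], 2: [2, 4]}  # merged label -> original labels

# Merge the ground truth as if the network predicted merged labels perfectly.
lookup = np.zeros(5, dtype=int)
for merged_id, originals in groups.items():
    lookup[originals] = merged_id
merged_gt = lookup[gt]

restored = split_labels(merged_gt, influence_regions(atlas, groups))
print("fully restored:", np.array_equal(restored, gt))
```

If the subject's structures were shifted by more than the spacing between merged structures, voxels would fall into the wrong influence region, which corresponds to the failure case the rebuttal mentions for small distance thresholds.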

Reviewer #2 suggests refining the baseline predictions by using an atlas prior. While we consider this interesting idea out of scope for this work, we would like to refer to a recent publication that explores this approach (Fidon et al., 2024). The authors demonstrate that the atlas prior can indeed be combined with the CNN output to increase the robustness of the prediction.

Regarding reviewer #2’s comment on Eq. (1): The tilde is used to indicate the one-hot encoding. Y (without tilde) is defined as the corresponding integer-encoded representation.
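The distinction between the two encodings can be made concrete with a small sketch. The array shapes and the averaging step are assumptions for illustration (the averaged label map is mentioned in Review #2); the exact form of Eq. (1) is not reproduced here.

```python
import numpy as np

def one_hot(y, num_labels):
    """Convert an integer-encoded label map to a one-hot encoding."""
    return np.eye(num_labels, dtype=np.float32)[y]

# Integer-encoded label maps Y: 3 subjects, 3 voxels each, labels 0..2.
Y = np.array([[0, 1, 2],
              [0, 1, 1],
              [0, 2, 2]])

Y_tilde = one_hot(Y, num_labels=3)  # one-hot version, shape (3, 3, 3)

# Averaging the one-hot maps over subjects gives per-voxel label frequencies.
prior = Y_tilde.mean(axis=0)        # shape (voxels, labels)
print(prior)
```

The integer map is recovered from the one-hot map with `Y_tilde.argmax(axis=-1)`.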

For Fig. 2, we will adopt reviewer #2’s sensible suggestion to use more than two colours for the graph in the camera-ready version.

Reviewer #1 points out correctly that the IXI_merged model performs better than the IXI_orig model in terms of relative volume error (RVE) but worse in terms of Hausdorff distance. Considering the small differences (relative to the standard deviations) we expect that this observation is due to statistical fluctuations resulting from the randomness in the training process. Additional experiments are required to confirm whether the label merge-and-split method is favourable/disadvantageous for specific metrics.

  1. Fidon, Lucas, et al. “A Dempster-Shafer approach to trustworthy AI with application to fetal brain MRI segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).




Meta-Review

Meta-review not available, early accepted paper.


