Abstract

Airway segmentation in chest computed tomography (CT) images is critical for the diagnosis of tracheal diseases and for surgical navigation. However, airway segmentation is challenging due to the complex tree structure and branches of different sizes. To enhance airway integrity and reduce fractures in bronchus segmentation, we propose a novel airway segmentation network that uses centerline detection as an auxiliary task to enhance topology awareness. The network introduces a topology embedding interactive module to emphasize the geometric properties of tracheal connections and reduce bronchial breakage. In addition, the proposed topology-enhanced attention module captures contextual and spatial information to improve bronchiole segmentation. In this paper, we conduct qualitative and quantitative experiments on two public datasets. Compared with several state-of-the-art algorithms, our method performs better at detecting terminal bronchi and at ensuring the continuity of the entire airway while maintaining comparable segmentation accuracy. Our code is available at https://github.com/xyang-11/airway_seg.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3005_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3005_supp.pdf

Link to the Code Repository

https://github.com/xyang-11/airway_seg

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yan_Airway_MICCAI2024,
        author = { Yang, Xuan and Chen, Lingyu and Zheng, Yuchao and Ma, Longfei and Chen, Fang and Ning, Guochen and Liao, Hongen},
        title = { { Airway segmentation based on topological structure enhancement using multi-task learning } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The work proposes a topology preserving airway tree segmentation method for CT volumes, based on deep learning. Contributions are a multi-task architecture, which combines a segmentation task and a centerline prediction task where the latter has a centerline-oriented Dice loss (clDice). Features from different scales are integrated via an attention module. Features from different tasks are integrated via a project-excite submodule.
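    As background on the clDice loss mentioned above, the following is a minimal NumPy sketch of the soft skeleton and soft-clDice loss as described in the original clDice paper (Shit et al.), for a single 3-D probability volume. It is an illustration, not the authors' code: the helper names, the 10 skeletonization iterations and the smoothing constant are assumptions, and per the rebuttal the paper's centerline ground truth comes from the thinning algorithm of [14] rather than from this soft skeleton.

```python
import numpy as np


def _pool(x, size, op, fill):
    """Stride-1, same-size sliding min/max over a 3-D array with an odd window `size`."""
    pads = [(k // 2, k // 2) for k in size]
    xp = np.pad(x, pads, mode="constant", constant_values=fill)
    out = np.full(x.shape, fill, dtype=float)
    for dz in range(size[0]):
        for dy in range(size[1]):
            for dx in range(size[2]):
                window = xp[dz:dz + x.shape[0], dy:dy + x.shape[1], dx:dx + x.shape[2]]
                out = op(out, window)
    return out


def soft_erode(x):
    # Min over 3-voxel lines along each axis, then the minimum across the three axes.
    lines = [(3, 1, 1), (1, 3, 1), (1, 1, 3)]
    return np.minimum.reduce([_pool(x, s, np.minimum, np.inf) for s in lines])


def soft_dilate(x):
    # Max over a 3x3x3 neighbourhood.
    return _pool(x, (3, 3, 3), np.maximum, -np.inf)


def soft_open(x):
    return soft_dilate(soft_erode(x))


def soft_skeleton(x, iters=10):
    """Iteratively erode and keep what an opening would remove (soft skeleton)."""
    x = x.astype(float)
    skel = np.maximum(x - soft_open(x), 0.0)
    for _ in range(iters):
        x = soft_erode(x)
        delta = np.maximum(x - soft_open(x), 0.0)
        skel = skel + np.maximum(delta - skel * delta, 0.0)
    return skel


def soft_cldice_loss(pred, gt, iters=10, smooth=1e-6):
    """1 - clDice between a probability map `pred` and a binary mask `gt`."""
    skel_pred = soft_skeleton(pred, iters)
    skel_gt = soft_skeleton(gt, iters)
    # Topology precision: predicted skeleton lying inside the reference mask.
    tprec = (np.sum(skel_pred * gt) + smooth) / (np.sum(skel_pred) + smooth)
    # Topology sensitivity: reference skeleton covered by the prediction.
    tsens = (np.sum(skel_gt * pred) + smooth) / (np.sum(skel_gt) + smooth)
    return 1.0 - 2.0 * tprec * tsens / (tprec + tsens)
```

Min- and max-pooling are written as explicit array shifts to keep the sketch dependency-free; a deep learning framework would use its pooling operators instead, which also makes the loss differentiable.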

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Results on a recently widely used public benchmark are promising, especially on the more important topology-oriented measures. The presented results appear to be the best reported on that dataset so far.

    • The proposed method is a combination of existing techniques from the literature, used for the airway tree extraction task. While none of the individual components are original, their combination and setup might be valuable for this specific task of airway tree segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    i. The introduction is not very well structured and does not give a good motivation by summarizing limitations of current state of the art and how this translates to the contributions in this paper.

    ii. The methodological components per se as well as the idea of topology aware segmentation and multi-task segmentation are not original, but have been reported in other works (an example would be Zhang et al., Biomedical Signal Processing and Control, Centerline-supervision multi-task learning network for coronary angiography segmentation, 2023).

    iii. The description of the methodology is confusing and hard to follow, thus diminishing reproducibility quite significantly, in the absence of publicly available code.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    As mentioned in 10), there is confusion about the methodology given the description, which adds to the problem of reproducing the method. It does not seem that code will be made publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    i. The introduction is not very well structured. Its second paragraph mentions a number of partly topology-aware airway tree segmentation methods. The paper then states that accurate airway tree segmentation currently (i.e., in the deep learning era) faces two challenges: (1) intra-class imbalance due to the difference in volume between the trachea and the peripheral bronchi, and (2) breakage and leakage of peripheral bronchi. However, these aspects pose problems for off-the-shelf UNet-based segmentation methods, not so much for the aforementioned approaches, which address exactly these challenges! This discrepancy is further underlined by Fig. 1 and by the reference to 3D-UNet [7] as the state-of-the-art method for airway tree segmentation, which it is not. Building on this, the contributions of the work are presented as answers to those challenges. However, the correct way to structure this would be to identify the challenges and limitations of current state-of-the-art methods such as [3-6] to motivate the proposed work. As a side note, it is not clear what is meant by the “specificity between different individuals” as an argument for why this segmentation is hard.

    ii. Methodologically, the notion of topology preservation has been widely used in vascular and airway segmentation, but also in computer vision (as indicated by the references cited in the paper). Therefore, the idea of a multi-task network combining segmentation and centerline extraction is not new. In my opinion, much of the heavy lifting is also done by the clDice loss, as can be seen in the ablation results (compare Base and Base-CLtask). The proposed architectural modifications TEIM and TEAM are taken from other work ([8] and [9], respectively) and combined in an appropriate manner. Therefore, the contribution reduces to this combination, its application to the airway segmentation task, and its evaluation.

    iii. The description of the method in Sections 2.1 and 2.2 is confusing and in parts does not correspond to the figure (Fig. 2) showing the different parts of the network architecture. In Section 2.1, it is stated that Ŝ_l concatenates the previous features C_{l-1}, which is not the case in Fig. 2a. Fused features from bold C are sent back to bold S; these are undefined and do not appear in Fig. 2a. In Section 2.2, there is no R̂_l in Fig. 2b. It is unclear why the P&E module represents features of smaller structures better. Equation (5) has a conv_1 layer, but this is not reflected in the figure. Moreover, according to Section 2.2, R_l in the figure should be located elsewhere (multiplied by the low-level features, i.e., before entering SEF_l from above). This is very confusing and prevents the reader from fully understanding or reproducing the method.

    iv. The dataset is described as BAS from paper [12], consisting of 50 training, 20 validation and 20 test cases. According to [12], annotations for 60 cases were made publicly available, and the other compared works appear to use these for a fair comparison. This discrepancy in numbers adds to the confusion. Moreover, the paper should restate the protocol by which the datasets were collected and the reference annotations were created.

    v. It is not clear why, of all existing skeletonization algorithms, the method of [14] was chosen, especially since this work is rather old (1994) and many newer methods have been proposed since. This choice needs to be motivated and explained.

    vi. Fig. 4 lacks a colored depiction of false positives. Moreover, I think the authors mean that false negatives, not false positives, are colored in green!

    vii. The paper has many sections where the writing is weak or where descriptions are fuzzy or mistaken (as mentioned elsewhere). It would have required much more careful proofreading to bring it into good shape for review.
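    To illustrate the color coding requested in item vi, here is a hedged NumPy sketch that builds an RGB error map from a binary prediction and reference. The colors (white for true positives, green for false negatives, red for false positives) are an assumption for illustration, not necessarily those used in Fig. 4.

```python
import numpy as np


def error_overlay(pred, gt):
    """RGB error map for a 2-D slice or 3-D volume (colour choice is an assumption)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    rgb = np.zeros(gt.shape + (3,), dtype=np.uint8)
    rgb[pred & gt] = (255, 255, 255)  # true positives
    rgb[gt & ~pred] = (0, 255, 0)     # false negatives: reference airway missed by the prediction
    rgb[pred & ~gt] = (255, 0, 0)     # false positives: predicted airway absent from the reference
    return rgb
```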

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Due to the good practical results in the evaluation, there might be some interest for the specific community interested in airway tree segmentation. But the paper is in a very rough and unpolished shape, which makes it hard to fully understand all aspects.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Authors have provided some convincing clarifications of issues that were inadequately discussed in the paper. Given that a good evaluation is included, my concerns about the methodology (e.g., the skeletonization component or the straightforward combination of methods in a multi-task framework) can be somewhat mitigated.

    I think most of the criticisms by reviewers can be dealt with in a revision to improve the clarity of the paper, especially in the introduction and when describing the dataset as well as the skeletonization ground truth. Thus, the paper can be a nice contribution in the area of airway tree extraction.



Review #2

  • Please describe the contribution of the paper

    To enhance airway integrity and reduce fractures in bronchus segmentation, this paper proposes a novel network for airway segmentation. It uses centerline detection as an auxiliary task to enhance topology awareness. The network introduces a topology embedding interactive module to emphasize the geometric properties of tracheal connections and reduce bronchial breakage. Compared with several state-of-the-art algorithms, the proposed method performs better in detecting terminal bronchi and ensuring the continuity of the entire airway while maintaining comparable segmentation accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The analysis of issues in airway detection and segmentation is thorough and insightful.
    2. The use of the centerline detection task is novel, introducing topological information about the organ structure.
    3. The experiments are comprehensive. In addition to the numerical comparison results, the authors give visualization results illustrating the segmentation advantages of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are some unclear points in the description of the paper:
    1) Why does the Topology Enhanced Attention Module solve the problem of small-structure segmentation?
    2) The attention mechanism used in the module design is often adopted in other works. Why is it designed this way? Is there a better way to design it?
    3) How is the ground truth for centerline detection obtained?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. How the Topology Enhanced Attention Module works should be explained more clearly.
    2. This paper mentions both topology and prior knowledge, and the introduction should highlight these two points more prominently. The method description does not cover topology and only discusses prior knowledge.
    3. The process for obtaining the ground truth (GT) for centerline detection should be clearly explained.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main factors in my judgment of this paper are: 1) the proposed multi-task method is reasonable, though the techniques related to the attention module are widely used; 2) the experiments are comprehensive, which is the main strength of this paper; 3) the unclear issues are the main weakness of this paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a new method to reduce bronchus delineation breakage in (human) airway segmentation from thoracic volumetric CT scans. The core of the claimed contributions is the use of centerline detection as an auxiliary task to reduce airway segmentation failures in the critical peripheral segments. The proposed architecture is composed of one encoder and two decoders (one for each task): the centerline head is supervised by a clDice loss (with the centerline “ground truth” derived from a skeletonize function), while both BCE and DSC losses supervise the segmentation head. The results are compared at voxel level (Dice Similarity Coefficient, Precision, and Sensitivity) and at graph level (Branches detected and Tree-length detected). The method was thoroughly evaluated on the BAS dataset [12] against four state-of-the-art methods, and the supplementary material provides an additional comparison on an external dataset (AeroPath [16]) with consistent results.
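    The metrics listed above can be sketched as follows in NumPy. This is an illustration under stated assumptions: Tree-length detected (TD) is approximated by counting reference-centerline voxels inside the prediction, and Branches detected (BD) assumes a hypothetical integer label map of centerline branches plus a coverage threshold of 0.8; the exact definitions used by the paper and the BAS benchmark may differ.

```python
import numpy as np


def overlap_metrics(pred, gt):
    """Voxel-level DSC, precision and sensitivity; assumes both masks are non-empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    dsc = 2.0 * tp / (pred.sum() + gt.sum())
    precision = tp / pred.sum()
    sensitivity = tp / gt.sum()
    return float(dsc), float(precision), float(sensitivity)


def tree_length_detected(pred, gt_centerline):
    """Fraction of reference centerline voxels inside the prediction (voxel count as a length proxy)."""
    cl = gt_centerline.astype(bool)
    return float(np.logical_and(cl, pred.astype(bool)).sum() / cl.sum())


def branches_detected(pred, branch_labels, min_fraction=0.8):
    """Fraction of labelled centerline branches covered by `pred` for at least
    `min_fraction` of their voxels (label map and threshold are assumptions)."""
    pred = pred.astype(bool)
    ids = [k for k in np.unique(branch_labels) if k != 0]  # 0 = background
    detected = sum(pred[branch_labels == k].mean() >= min_fraction for k in ids)
    return float(detected / len(ids))
```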

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The introduction of the paper is well written and gives a good overview of both the SOTA and the issues the proposed method tries to tackle. The description of the two modules (TEIM - to make the features extracted from both tasks interact - and TEAM - attention module based on Project and Excite [9] to better represent the finer structures -) is very detailed and clearly explained. The evaluation procedure is neat and the comparison against seven other models as well as the ablation study are very convincing.
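    Since TEAM is described as building on Project & Excite [9], the sketch below shows, in NumPy, the P&E recalibration as it is described in the original P&E paper: average-pool along each spatial axis, sum the three projections, pass them through two 1x1x1 convolutions (ReLU, then sigmoid), and rescale the input. The weight shapes, the reduction ratio r and the random weights are assumptions; how TEAM combines this block with low- and high-level features is not reproduced here.

```python
import numpy as np


def project_excite(u, w1, b1, w2, b2):
    """Project & Excite recalibration of a feature map u with shape (C, D, H, W).

    w1: (C // r, C) and w2: (C, C // r) act as 1x1x1 convolutions (channel mixing).
    """
    # Project: average-pool over two spatial axes at a time, keeping the third.
    z_d = u.mean(axis=(2, 3), keepdims=True)  # (C, D, 1, 1)
    z_h = u.mean(axis=(1, 3), keepdims=True)  # (C, 1, H, 1)
    z_w = u.mean(axis=(1, 2), keepdims=True)  # (C, 1, 1, W)
    z = z_d + z_h + z_w  # broadcasts back to (C, D, H, W)

    # Excite: 1x1x1 conv -> ReLU -> 1x1x1 conv -> sigmoid gives a voxel- and channel-wise gate.
    hidden = np.maximum(np.einsum("oc,cdhw->odhw", w1, z) + b1[:, None, None, None], 0.0)
    logits = np.einsum("oc,cdhw->odhw", w2, hidden) + b2[:, None, None, None]
    gate = 1.0 / (1.0 + np.exp(-logits))
    return u * gate
```

Because the gate lies in (0, 1) and varies along every spatial axis, the block can emphasize locations as well as channels, which is the property usually cited for thin structures.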

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The evaluation metrics are not mentioned in the abstract, nor does the abstract state whether the database is private or public.

    About Equation (4): it is accompanied by a very detailed technical explanation, as if the authors wished to present the concatenation of features extracted at several layers as a contribution. It cannot be claimed as a contribution, as it is a common feature of almost every encoder-decoder deep model in the literature.

    More generally, the authors must clarify the scientific grounding and motivation of the proposed architecture: as it stands, the manuscript gives a detailed description of the architecture but few clues about why these specific choices were made.

    About Equations (6) and (7): unless the authors present a study of different values of the meta-parameters (alpha and lambda), these should be removed, since the values given to them result in equal weighting of all the terms in each equation.
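    To make the reviewer's point concrete, here is a hedged sketch. The exact form of Equations (6) and (7) is not given on this page, so the code assumes a hypothetical form L_seg = BCE + α·Dice and L_total = L_seg + λ·L_cl; with α = λ = 1 the weighted sum is just a plain sum, so the parameters only add information if they are varied in a study.

```python
import numpy as np


def bce_loss(p, y, eps=1e-7):
    # Binary cross-entropy, with probabilities clipped away from 0 and 1.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def dice_loss(p, y, smooth=1e-6):
    return float(1.0 - (2.0 * np.sum(p * y) + smooth) / (np.sum(p) + np.sum(y) + smooth))


def total_loss(p_seg, y_seg, centerline_loss, alpha=1.0, lam=1.0):
    # Hypothetical form: L_seg = BCE + alpha * Dice, L_total = L_seg + lam * L_cl.
    seg = bce_loss(p_seg, y_seg) + alpha * dice_loss(p_seg, y_seg)
    return seg + lam * centerline_loss
```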

    p.5 “multi-scale loss for training” and “The output prediction map of each layer is up-sampled to the original image size by bilinear interpolation”: is this done at every depth? Fig. 2 does not show where the loss is computed. From this description, it can be understood that the loss is not computed only after the segmentation head; please adjust the figure so that the reader understands where in the framework the loss is computed.
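    For reference, a hedged NumPy sketch of deep supervision as the quoted passage describes it: each side output is upsampled to the target size before its loss is computed, and the per-scale losses are summed. For 3-D volumes the analogue of bilinear interpolation is trilinear, implemented here as separable 1-D linear interpolation with an align-corners convention; the equal default weights and the function names are assumptions.

```python
import numpy as np


def _resize_axis(x, axis, n_out):
    """Linear resampling along one axis (align-corners convention)."""
    n_in = x.shape[axis]
    pos = np.linspace(0.0, n_in - 1, n_out)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    w = (pos - lo).reshape([-1 if a == axis else 1 for a in range(x.ndim)])
    return np.take(x, lo, axis=axis) * (1.0 - w) + np.take(x, hi, axis=axis) * w


def upsample_linear(x, out_shape):
    # 1-D linear interpolation along every axis in turn gives (tri)linear interpolation.
    for axis, n_out in enumerate(out_shape):
        x = _resize_axis(x, axis, n_out)
    return x


def deep_supervision_loss(side_outputs, target, loss_fn, weights=None):
    """Sum of per-scale losses, each side output first upsampled to the target size."""
    weights = weights if weights is not None else [1.0] * len(side_outputs)
    return sum(w * loss_fn(upsample_linear(p, target.shape), target)
               for w, p in zip(weights, side_outputs))
```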

    p.5 “we produced skeleton ground truth using the algorithm in [14]”: since the ground truth is derived by another method (without physician verification), this raises several questions. Is the proposed method (and loss) robust to potential errors in this “ground truth”? This point should at least be discussed.

    p.8: The comparison with “four relevant state-of-the-art methods in recent years” is of crucial interest, but it could be argued that more recent methods have been released since 2019, though not all with associated code that would allow an easy comparison. For instance, for reference [3], a more recent model is available (https://github.com/antonioguj/bronchinet); it would be of interest to compare with this up-to-date version.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors do not mention providing any code upon acceptance; we encourage them to provide it if the paper is finally accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    About the evaluation, it is written that “The dataset was divided randomly into training set of 50 cases, validation set of 20 cases, and testing set of 20 cases”. The BAS dataset is composed of two public datasets; please indicate whether the split was performed randomly after merging all 90 CT scans or before merging them. In other words, are there samples from both datasets in the training and test sets? Please adapt the manuscript so that such doubts are avoided.
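    The protocol described in the rebuttal (shuffle after merging all 90 scans, then cut 50/20/20) can be sketched as follows. The case IDs, the source labels and the seed are hypothetical placeholders; `source_counts` shows how one could report how many cases from each source collection land in each split, which is what the reviewer asks about.

```python
from collections import Counter

import numpy as np


def random_split(case_ids, sizes=(50, 20, 20), seed=0):
    """Shuffle all cases together, then cut them into train/val/test of the given sizes."""
    if sum(sizes) != len(case_ids):
        raise ValueError("split sizes must add up to the number of cases")
    order = np.random.default_rng(seed).permutation(len(case_ids))
    shuffled = [case_ids[i] for i in order]
    n_train, n_val, _ = sizes
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def source_counts(split, source_of):
    """How many cases of each source collection ended up in one split."""
    return Counter(source_of[c] for c in split)
```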

    p.2: “We novelty perform” and “to form the centerline detection of feature interactions with airway segmentation.” must be reformulated. Do you mean making the features extracted by the centerline detection task interact with those extracted by the airway segmentation task?
    Fig. 1 (b): which 3D-UNet is it? Is this a 3D-UNet from the literature or a model trained by the authors? If the latter, on which data was it trained?
    Fig. 2: the caption mentions TEAM twice; gather everything about TEAM after “(b)” in the caption.
    p.4: “a weak supervision signal that maintains the global topology.” This is a hypothesis until the results are given; formulate it accordingly (“that aims at …”).
    p.4: “By learning more local and global semantic information, the network is able to solve the problem of intra-class imbalance in the airway.” This sentence should be moved out of the method section to after the results (for instance, to the discussion).
    p.4: “P&E” is not defined.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    To me, the very detailed and clear description of the experimental setup, together with the extensive evaluation and comparison against SOTA algorithms and the presented ablation study, brings this contribution to publication readiness.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    The authors answered clearly and their proposed improvements are convincing. The proposal of code release for reproducibility is very much appreciated.




Author Feedback

We appreciate the encouraging comments such as “insightful analysis, novel centerline detection task” (R1), “valuable for airway segmentation” (R3), “detailed experimental setup” (R4) and “comprehensive experimental results”. Our responses to the questions are as follows.

1. (R1, R3) Novelty and motivation of topology preservation, including the Topology Enhanced Attention Module (TEAM) and the Topology Embedded Interaction Module (TEIM).
1) TEAM: The ‘Project&Excite’ [9] block in TEAM retains the spatial information of anatomical structures through feature recalibration. By merging low-level spatial and high-level semantic feature maps, TEAM builds cross-scale dependencies and enhances the network’s response to airways of various scales. Our previous paired t-test showed that Base-TEAM was significantly better than Base on the DSC, BD and TD indicators. The theoretical analysis in [9] and our results (Table 2) support its validity in segmenting small structures.
2) TEIM: Unlike the shared-decoder strategy of Zhang et al. (BSPC, 2023), we use an independent centerline detection decoder to ensure the network’s learning ability for tubular topology. TEIM strengthens the topological interaction of the dual branches through cross-channel feature fusion. Base-TEIM improves BD and TD by about 6% compared to Base. The previous paired t-test showed significant differences, demonstrating the effectiveness of our topology preservation method.

2. Introduction
1) (R1) ‘Topology’ explanation: [11] showed that the centerline is the minimal point set representing tubular topology. This inspired us to use it as prior knowledge, improving the network’s topological awareness of the airway by introducing the centerline detection task and TEIM.
2) (R3) Structure of the introduction: Our method focuses on preserving airway continuity. The introduction reviews related work and finds that Ref. [3-6] partly added topological priors but are still insufficient for bronchioles. We address this issue by introducing stronger anatomical priors. 3D-UNet serves as the backbone for [3-6], and its segmentation results typically suffer from airway topology breakage (Fig. 1). We then outline the challenges of airway segmentation and introduce our approach.

3. (R3) Comparison with the clDice loss: We only use the formula from Ref. [12]. Contrary to R3’s view, our previous experiments showed that Base-CLtask outperformed the clDice loss on all metrics, especially on the BD and TD indicators. We found that the soft skeletonization in [12] often results in broken centerlines, making it insufficient for accurate topology representation. We therefore use the method in [14] instead to obtain an unbroken centerline, providing accurate ground truth (GT) for the auxiliary task.

4. (R1, R3, R4) Details of the centerline GT: Lee et al. [14] proposed a robust method to extract centerlines via 3-D medial surface/axis thinning algorithms. The method is widely used, including by Yao et al. (TMI, 2023) and Peng et al. (JBHI, 2024). In our work, two doctors validated the accuracy of the extracted results. Given its widespread adoption and the expert validation, we consider it feasible to use the generated labels as GT.

5. (R3, R4) Dataset: The public BAS dataset, first used in [12], comprises 60 cases. In their subsequent work, Zheng et al. (TMI, 2021) increased the number to 90 cases, which is the version used in most related work. We randomly divided the dataset after merging all 90 scans. For details about the dataset, please refer to Zheng et al. (TMI, 2021).

6. (R3, R4) Unclarity: We proofread Fig. 2 and the method variables for consistency. Bold C and bold S are defined in the Fig. 2 caption. We corrected the error in Eq. 5: the first-level TEAM inputs are E_0 and E_1, while subsequent levels use E_l and SEF_{l-1} (the previous level’s TEAM output). We added details in the abstract, clarified the loss function in Fig. 2, and corrected the meta-parameters in Eq. 6 and 7, the error subplot in Fig. 4, and syntax errors. We will carefully correct errors and release the code to ensure clarity and reproducibility.
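A paired t-test, as mentioned in the rebuttal, compares two methods on the same test cases through their per-case differences. The NumPy sketch below computes the t statistic and degrees of freedom; the function name is hypothetical, any per-case scores passed to it would be placeholders, and a p-value can then be obtained from the t distribution, for example with `scipy.stats.ttest_rel(a, b)`.

```python
import numpy as np


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-case scores of two methods."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)  # per-case differences
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # mean difference over its standard error
    return float(t), n - 1
```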




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All the reviewers agree to accept it, and I also believe that this is a good article




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper’s approach is novel, and the experiment proved the method’s performance. I recommend accepting this paper.



