Abstract

Chronic obstructive pulmonary disease (COPD) is a type of obstructive lung disease characterized by persistent airflow limitation and ranks as the third leading cause of death globally. As a heterogeneous lung disorder, the diversity of COPD phenotypes and the complexity of its pathology pose significant challenges for recognizing its grade. Many existing deep learning models based on 3D CT scans overlook the spatial position information of lesion regions and the correlation within different lesion grades. To this end, we define the COPD grading task as a multiple instance learning (MIL) task and propose a hierarchical multiple instance learning (H-MIL) model. Unlike previous MIL models, our H-MIL model pays more attention to the spatial position information of patches and achieves a fine-grained classification of COPD by extracting patch features in a multi-level and granularity-oriented manner. Furthermore, we recognize the significant correlations within lesions of different grades and propose a Relatively Specific Similarity (RSS) function to capture such relative correlations. We demonstrate that H-MIL achieves better performance than competing methods on an internal dataset comprising 2,142 CT scans. Additionally, we validate the effectiveness of the model architecture and loss design through an ablation study, and the robustness of our model on datasets from different centers.
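
The multi-level pooling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' H-MIL or PLAF implementation (see the code repository for that); it assumes generic attention-based MIL pooling with random, untrained weights, and all names (`attention_pool`, `split_into_sub_bags`, `hierarchical_pool`) and sizes are hypothetical. Slice features are grouped into contiguous sub-bags so that spatially adjacent slices stay together, pooled within each sub-bag, and then pooled again across sub-bags.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, V, w):
    """Attention-weighted average of feature vectors.

    score_i = w . tanh(V h_i), weights = softmax(scores).
    Generic attention-MIL pooling, used here as a stand-in (assumption).
    """
    scores = []
    for h in features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * zj for wj, zj in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, features)) for k in range(dim)]
    return pooled, weights


def split_into_sub_bags(instances, num_sub_bags):
    """Split an ordered list into at most num_sub_bags contiguous chunks."""
    size = math.ceil(len(instances) / num_sub_bags)
    return [instances[i:i + size] for i in range(0, len(instances), size)]


def hierarchical_pool(slice_features, num_sub_bags, slice_params, sub_bag_params):
    """Pool slices within each sub-bag, then pool the sub-bag embeddings."""
    sub_bags = split_into_sub_bags(slice_features, num_sub_bags)
    sub_bag_embeddings = [attention_pool(sb, *slice_params)[0] for sb in sub_bags]
    return attention_pool(sub_bag_embeddings, *sub_bag_params)


def random_params(rng, dim, hidden):
    """Random (untrained) attention parameters, for illustration only."""
    V = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w = [rng.gauss(0, 0.5) for _ in range(hidden)]
    return V, w


# Hypothetical sizes: 12 slices with 8-dimensional features, 3 sub-bags.
rng = random.Random(0)
DIM, HIDDEN, NUM_SLICES, NUM_SUB_BAGS = 8, 4, 12, 3
slices = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_SLICES)]
bag_embedding, sub_bag_weights = hierarchical_pool(
    slices,
    NUM_SUB_BAGS,
    random_params(rng, DIM, HIDDEN),
    random_params(rng, DIM, HIDDEN),
)
print(len(bag_embedding), [round(a, 3) for a in sub_bag_weights])
```

In a trained model the sub-bag attention weights would indicate which region of the scan contributes most to the bag-level prediction; here they only show that the weights form a valid distribution over sub-bags.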

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1863_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1863_supp.pdf

Link to the Code Repository

https://github.com/Mars-Zhang123/H-MIL.git

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zha_Hierarchical_MICCAI2024,
        author = { Zhang, Hao and Zhao, Mingyue and Liu, Mingzhu and Luo, Jiejun and Guan, Yu and Zhang, Jin and Xia, Yi and Zhang, Di and Zhou, Xiuxiu and Fan, Li and Liu, Shiyuan and Zhou, S. Kevin},
        title = { { Hierarchical multiple instance learning for COPD grading with relatively specific similarity } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a hierarchical multiple instance learning framework for detecting COPD and predicting its GOLD stages. Unlike most other MIL frameworks, the authors use image slices as instances instead of smaller 3D patches. For predicting GOLD stages, the authors propose a relatively specific similarity. Overall, the paper is well-written, with exhaustive ablation studies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The main strength of the paper lies in the way COPD GOLD stage prediction is formulated through the use of RSS. This is an important step towards characterizing various complicated sub-groups in COPD.

    2. The ablation studies conducted by the authors are exhaustive and provide detailed insights about various model components.

    3. The paper uses a comprehensive set of quantitative metrics for evaluating their method.

    4. Overall, the manuscript is well-written and easy to understand.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors do not compare their method with some very important, simpler state-of-the-art methods. For instance, there are a few radiomics models that achieve AUCs > 0.91 on large COPD cohorts (see below).

    Amudala Puchakayala, P.R., Sthanam, V.L., Nakhmani, A., Chaudhary, M.F., Kizhakke Puliyakote, A., Reinhardt, J.M., Zhang, C., Bhatt, S.P. and Bodduluri, S., 2023. Radiomics for improved detection of chronic obstructive pulmonary disease in low-dose and standard-dose chest CT scans. Radiology, 307(5), p.e222998.

    Chaudhary, Muhammad FA, Yue Pan, Di Wang, Sandeep Bodduluri, Surya P. Bhatt, Alejandro P. Comellas, Eric A. Hoffman, Gary E. Christensen, and Joseph M. Reinhardt. “Registration-invariant biomechanical features for disease staging of COPD in SPIROMICS.” In Thoracic Image Analysis: Second International Workshop, TIA 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 8, 2020, Proceedings 2, pp. 143-154. Springer International Publishing, 2020.

    2. Then there is a deep learning model that operates on simple 2D slices and achieves similar performance on large cohorts.

    Gonzalez, G., Ash, S.Y., Vegas-Sánchez-Ferrero, G., Onieva Onieva, J., Rahaghi, F.N., Ross, J.C., Díaz, A., San José Estépar, R. and Washko, G.R., 2018. Disease staging and prognosis in smokers using deep learning in chest computed tomography. American journal of respiratory and critical care medicine, 197(2), pp.193-203.

    3. The performance for COPD grading is very low. Simpler approaches have achieved much higher performance. For instance, the study below achieved an AUC of > 0.80 for predicting COPD GOLD stages.

    Chaudhary, Muhammad FA, Yue Pan, Di Wang, Sandeep Bodduluri, Surya P. Bhatt, Alejandro P. Comellas, Eric A. Hoffman, Gary E. Christensen, and Joseph M. Reinhardt. “Registration-invariant biomechanical features for disease staging of COPD in SPIROMICS.” In Thoracic Image Analysis: Second International Workshop, TIA 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 8, 2020, Proceedings 2, pp. 143-154. Springer International Publishing, 2020.

    4. Similarly, the idea of hierarchical MIL may not be entirely novel.

    Yan, R., Shen, Y., Zhang, X., Xu, P., Wang, J., Li, J., Ren, F., Ye, D. and Zhou, S.K., 2023. Histopathological bladder cancer gene mutation prediction with hierarchical deep multiple-instance learning. Medical Image Analysis, 87, p.102824.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    None.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. We suggest the authors compare their method with different state of the art methods mentioned above to show the superiority of their method.

    2. We suggest that the authors discuss the differences between their H-MIL approach and other hierarchical MIL approaches used for medical image classification or grading.

    3. From a clinical standpoint, COPD GOLD stages are increasingly variable, and several structural CT biomarkers have highlighted their vulnerability.

    Regan, E.A., Lynch, D.A., Curran-Everett, D., Curtis, J.L., Austin, J.H., Grenier, P.A., Kauczor, H.U., Bailey, W.C., DeMeo, D.L., Casaburi, R.H. and Friedman, P., 2015. Clinical and radiologic disease in smokers with normal spirometry. JAMA internal medicine, 175(9), pp.1539-1549.

    4. Similarly, spirometry, which is used to define GOLD, has a lot of vulnerabilities, one being PRISm subjects.

    Wan, E.S., Castaldi, P.J., Cho, M.H., Hokanson, J.E., Regan, E.A., Make, B.J., Beaty, T.H., Han, M.K., Curtis, J.L., Curran-Everett, D. and Lynch, D.A., 2014. Epidemiology, genetics, and subtyping of preserved ratio impaired spirometry (PRISm) in COPDGene. Respiratory research, 15, pp.1-13.

    The method proposed here could be applied to other more clinically relevant sub-groups of COPD, and we suggest that the authors investigate other tasks as well.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The technical novelty is reasonable, but the performance in comparison to the literature is significantly lower. Especially in case of COPD GOLD stage prediction, which is the main contribution of this paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces a novel Hierarchical Multiple Instance Learning (H-MIL) method with a Relatively Specific Similarity (RSS) loss for COPD grading. By employing spatial attention fusion across varying levels, from pixel to sub-bag, it effectively refines disease information from large-scale CT volumes. The innovative RSS loss enhances multi-class correlations, boosting classification accuracy. This approach surpasses existing methods in both in-house and external dataset evaluations, promising improved COPD grading precision.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well written, with clear motivations, sufficient technical explanations and illustrative visualizations. (Experiments) 1. The proposed H-MIL with 2D slices consistently outperforms other methods across both COPD binary classification and grading tasks. 2. The ablation study in the paper is robust and comprehensive. It meticulously evaluates various components like sub-bag partitioning, pixel-level attention fusion (PLAF), and RSS-based loss, providing a clear understanding of their individual and collective contributions to the model’s overall performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the method is evaluated on an in-house dataset, external validation on publicly available datasets or a different cohort could further validate its robustness and generalizability.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is commendable for its novel approach and comprehensive evaluation. However, external validation on diverse datasets would strengthen its generalizability.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper offers a novel method for COPD grading with strong experimental results. However, external validation is essential for broader acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    After reading the rebuttal and the comments from the other reviewers, I decided to keep my score unchanged (Weak Accept).



Review #3

  • Please describe the contribution of the paper

    This study proposes a hierarchical multi-instance learning method with relative specific similarity for the classification and identification of Chronic Obstructive Pulmonary Disease (COPD). The hierarchical multi-instance learning strategy, by introducing pixel-level fusion, slice-level fusion, and sub-bag-level fusion, achieves progressive attention fusion and effective information refinement, enabling resource-friendly and fine-grained interpretation of lung lesions. Moreover, by leveraging the correlations between labels in the COPD grading task, the model is constrained to continually learn the relationships between different severities of the disease, helping to achieve better and more robust disease grading.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A hierarchical multi-instance learning (H-MIL) model is proposed. The correlation between labels in the COPD grading task was fully utilized. Compared with the existing methods, the performance is significantly improved.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The datasets used for experimental verification are mainly internal, and there is too little validation on public datasets.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors claim that they will release the source code after acceptance of the submission and state the source of the open dataset. An internal dataset was used in the paper, but it was not stated whether it will be made public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. The datasets used for experimental verification are mainly internal, so it is suggested to add validation on public datasets or to release the datasets, which would make the verification of the method's performance more convincing. 2. In this paper, the comparison methods are called “competition method”; it is suggested to change this expression.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel, but most of the data sets used for experimental verification are internal data sets.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Thanks for your valuable comments. 1. R#4-Q1 & R#5-Q1: About public external validation. A1: 1) We noted a large public dataset, COPDGene, used in COPD-related studies, but our access application is still in progress. We will report the corresponding results once the application is successful. 2) The code and the corresponding model weights will be disclosed after acceptance. 3) Due to involvement with partner institutions, we regret that the dataset we used cannot be publicly released at this time.

2. R#6 Q1: Comparison with other simple SOTA models. A1: 1) The dataset used by Amudala et al. is larger (8,878 vs. 2,142) and more carefully screened than ours, making direct performance comparisons unfair. 2) Indeed, we had taken note of this article earlier and replicated its methodology. Using the same dataset, our performance is better than Amudala et al.’s approach. In addition to the demographic features, gray features (LAA<-950), and airway morphological features mentioned in the paper, we also consider vascular morphological features. We comprehensively compared traditional machine learning methods, including MLP, SVM, random forest, AdaBoost, and CatBoost. The best-performing method among these exhibited an 11.0% gap in AUC compared to our method with the same dataset.

Q2: Comparison with some simple 2D methods. A2: 1) Gonzalez et al. report an AUC of 0.856 for the diagnosis of COPD, which is lower than the AUC of 0.896 achieved in our study. 2) Although Gonzalez et al.’s approach is easy to implement, it captures less 2D information and is weak in information condensation. Our method conducts more extensive 2D sampling on a single CT scan and achieves more efficient information fusion through the use of the PLAF strategy, thus enhancing its effectiveness.

Q3: The low performance for COPD grading. A3: 1) There are limited existing methods for grading research, and the performance of grading is often hindered by issues such as small dataset size and significant class imbalance (directly related to the probability of disease occurrence or progression). 2) The method (Chaudhary et al.) utilizes paired respiratory data, which is challenging to obtain in practical scenarios and poses higher dose exposure risks. Besides, differences in cohort data (data size and quality) make direct performance comparisons unfair. 3) Inspired by the inter-class correlation in COPD grading tasks, we innovatively proposed the RSS, which enhances model predictions by accurately modeling the intrinsic association of classification labels. The ablation experiment results in Table 2 further demonstrate the significant performance improvement.

Q4 & Q5: The novelty of H-MIL & Comparison with other hierarchical MIL. A4 & A5: We have reviewed the relevant paper you mentioned, and we believe it is a case of name conflict, as the methods differ significantly: 1) The PLAF strategy proposed by H-MIL in our paper focuses on key regions within key slices in key sub-bags through a hierarchical process of pixel-level, sub-bag-level, and bag-level feature fusion. This approach is distinct from the method proposed by Yan et al. 2) Yan et al.’s method involves dividing instances into K clusters, which require repeated sampling and combination into bags within each cluster. The labels of these combined bags inherit the labels of WSI. In contrast, our method does not require label inheritance or sampling. The PLAF strategy enables the step-by-step expression of bag-level features without introducing additional noise, making it more efficient. 3) We were inspired by [23], but for the characteristics of COPD, we introduced the PLAF strategy and RSS. These additions improved diagnostic performance by 4.96% and grading accuracy by 7.40%.

Q6: The vulnerability of COPD GOLD stages. A6: We strongly agree with this perspective. Our study also underscores the significance of CT in the early diagnosis and classification of COPD. Moving forward, we aim to integrate multiple modalities of information to further advance community development.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This work makes a commendable effort in addressing a practical problem. It would benefit further from additional experiments comparing it against alternatives on public benchmarks.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    This work makes a commendable effort in addressing a practical problem. It would benefit further from additional experiments comparing it against alternatives on public benchmarks.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Decent work in addressing a practical problem. Will further benefit from more experiments against alternatives on public benchmarks.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Decent work in addressing a practical problem. Will further benefit from more experiments against alternatives on public benchmarks.


