Abstract

This study investigates utilizing chest X-ray (CXR) data from COVID-19 patients for classifying pneumonia severity, aiming to enhance prediction accuracy in COVID-19 datasets and achieve robust classification across diverse pneumonia cases. A novel CNN-Transformer hybrid network has been developed, leveraging position-aware features and Region Shared MLPs for integrating lung region information. This improves adaptability to different spatial resolutions and scores, addressing the subjectivity of severity assessment due to unclear clinical measurements. The model shows significant improvement in pneumonia severity classification for both COVID-19 and heterogeneous pneumonia datasets. Its adaptable structure allows seamless integration with various backbone models, leading to continuous performance improvement and potential clinical applications, particularly in intensive care units.
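
A minimal, hypothetical PyTorch sketch of the region-shared severity head described above, assuming per-region pooled features plus a learned position embedding; the module name, shapes, and defaults are illustrative and not taken from the authors' released code.

    import torch
    import torch.nn as nn

    class RegionSharedHead(nn.Module):
        """Illustrative region-shared MLP head (not the authors' implementation).

        Each lung region's pooled feature is made position-aware by adding a
        learned per-region embedding; a single MLP shared across regions then
        predicts that region's severity score.
        """
        def __init__(self, feat_dim=256, num_regions=6, num_classes=4):
            super().__init__()
            self.pos_embed = nn.Parameter(torch.zeros(num_regions, feat_dim))
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim, feat_dim),
                nn.GELU(),
                nn.Linear(feat_dim, num_classes),
            )

        def forward(self, region_feats):
            # region_feats: (batch, num_regions, feat_dim), pooled from a
            # CNN-Transformer backbone over each lung region.
            x = region_feats + self.pos_embed   # position-aware features
            return self.mlp(x)                  # (batch, num_regions, num_classes)

    # Example: 6 BrixIA lung regions, severity scores 0-3.
    head = RegionSharedHead(feat_dim=256, num_regions=6, num_classes=4)
    logits = head(torch.randn(2, 6, 256))       # -> shape (2, 6, 4)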

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1495_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/blind4635/Multi-Region-Lung-Severity-PAFE

Link to the Dataset(s)

https://brixia.github.io/#dataset

BibTex

@InProceedings{Lee_COVID19_MICCAI2024,
        author = { Lee, Jong Bub and Kim, Jung Soo and Lee, Hyun Gyu},
        title = { { COVID19 to Pneumonia: Multi Region Lung Severity Classification using CNN Transformer Position-Aware Feature Encoding Network } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a CNN-transformer model for COVID-19 chest x-ray severity prediction and uses the same model for pneumonia severity prediction in a different dataset. The paper uses a region-based embedding method: “position-aware features” and “Region Shared MLPs”.

    Specifically, in the BrixIA COVID-19 dataset, there are 6 regions with the severity score marked (0 to 3), whereas, in the *** pneumonia dataset, there are only 4 regions with severity (0 to 4) marked. The method performs this adaptation and presents results; a sketch of such an adaptation is given below.
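
    A minimal PyTorch sketch of this adaptation, under the assumption of a frozen pretrained backbone with a shared severity head (variable names are hypothetical): because the head is shared across regions, switching from 6 regions with scores 0-3 to 4 regions with scores 0-4 only requires new position embeddings and a new output layer, trained by linear probing while the rest stays frozen.

        import torch
        import torch.nn as nn

        feat_dim = 256

        # Stand-in for the pretrained backbone + shared MLP; kept frozen.
        shared_hidden = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.GELU())
        for p in shared_hidden.parameters():
            p.requires_grad = False

        # BrixIA source task: 6 regions x 4 severity classes (0-3).
        pos_src = nn.Parameter(torch.zeros(6, feat_dim))
        out_src = nn.Linear(feat_dim, 4)

        # Pneumonia target task: 4 regions x 5 severity classes (0-4); only these are trained.
        pos_tgt = nn.Parameter(torch.zeros(4, feat_dim))
        out_tgt = nn.Linear(feat_dim, 5)

        def predict(region_feats, pos, out):
            # region_feats: (batch, num_regions, feat_dim) pooled per lung region.
            return out(shared_hidden(region_feats + pos))

        logits = predict(torch.randn(2, 4, feat_dim), pos_tgt, out_tgt)  # -> (2, 4, 5)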

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The anonymously provided code made understanding easier.
    2. Results that surpass the baseline BS-Net (cited as [3], 2021).
    3. The t-SNE plot shows separation, although not complete, of severity classes in the BrixIA consensus test set.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It is not clear what the technical contributions are, given the current advances in the field; e.g., “integrat[ing] lung region information”, in my opinion, is not a contribution. The second contribution, using COVID-19 chest X-ray data for pneumonia severity prediction, is also not a substantial contribution.
    2. It was difficult for me to compare results with the BS-Net paper (cited as [3], 2021): Is it possible to also report the Pearson correlation coefficient, as done in Slika et al. 2024? Slika et al. 2024 show better results. I agree that it is unfair to compare results with Slika et al. 2024, given the MICCAI submission deadline. However, there may be papers after BS-Net that show superior results.
    3. The results of testing the module on the pneumonia (**) dataset (Table 2) are not conclusive, although BrixIA pretraining has helped to improve the results with ResNet18 and ResNet34.
    4. It may not be possible to reproduce the paper without the author-provided code.

    Slika, Bouthaina, Fadi Dornaika, Hamid Merdji, and Karim Hammoudi. “Lung pneumonia severity scoring in chest X-ray images using transformers.” Medical & Biological Engineering & Computing (2024): 1-19.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Please see my comments above.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see my comments above.

    Minor:
    1. “leads to label noise” -> “lead to label noise”
    2. “Goal In this study” -> “Goal: In this study” (correct similar cases)
    3. “lung severity classification” -> “pneumonia severity classification”
    4. “consists of Segmentation, Transform, and Augmentation” -> “consists of segmentation, transforming, and augmentation”
    5. “when the proposed 00 method” (?)
    6. Consider using “with” instead of “w/”.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Limited novelty; the baseline is a 2021 paper. The downstream test comparison is not reproducible due to the unavailability of the dataset, and testing on another dataset used by other papers would be good. The writing needs improvement, and comparing results with existing papers is difficult.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my comments partially. In my opinion, result comparisons with more recent papers are still required, as another reviewer has noted. However, based on the other reviews, particularly noting that they appreciated this solution given the problem context (COVID-19 to pneumonia), I decided to upgrade my rating to “Weak Accept”.



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors present a position-aware CNN-Transformer hybrid network architecture for assessing regional pneumonia severity using chest X-ray scans from COVID-19 patients. The authors tackle the problem in three steps and propose a position-aware feature encoding (PAFE) module in the second stage of their framework. Incorporation of these modules has been demonstrated to improve the overall performance of the proposed model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper tackles an important problem that could potentially have further clinical relevance.

    2. The paper uses large, extensively annotated datasets for training and evaluating their models.

    3. The qualitative study shown in Figure 3 demonstrates the effectiveness of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In the first module, the authors propose to use an STN. It is unclear what the reference and moving images processed by the STN were.

    2. It is also unclear how the STN was pre-trained.

    3. In the second module, the authors add position-awareness on a local CNN processed output. How do the authors expect to learn position awareness at an earlier resolution?

    4. Even if the method is weakly supervised, how would a vision transformer architecture perform in place of the PAFE module?

    5. The model comparisons with state-of-the-art are limited.

    6. The authors did not conduct statistical tests to confirm the significance of their results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    None.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. We suggest that the authors look into some recent literature on position-aware embeddings and regional classification in medical images.

    2. It is important to compare the method with other position/locality aware methods, such as:

    Zheng, Y., Jiang, Z., Shi, J., Xie, F., Zhang, H., Luo, W., Hu, D., Sun, S., Jiang, Z. and Xue, C., 2022. Encoding histopathology whole slide images with location-aware graphs for diagnostically relevant regions retrieval. Medical image analysis, 76, p.102308.

    Yu, X., Qin, Y., Zhang, F. and Zhang, Z., 2024. A recurrent positional encoding circular attention mechanism network for biomedical image segmentation. Computer Methods and Programs in Biomedicine, 246, p.108054.

    3. We suggest the authors conduct statistical tests for ascertaining the performance gains of their method.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper stands at the borderline of the required technical contribution. While the technical component of this paper is well-written and presented clearly, some aspects need to be clarified or maybe improved.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The proposed hybrid approach utilizes the strengths of both CNNs and transformers for local and global feature extraction. Moreover, the study focuses on extracting positional features, which is a sound approach to classifying pneumonia severity. The authors have used transfer learning to compare the results on publicly available datasets. Overall, the study is well-designed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The primary strength lies in the severity classification even in the absence of pixel-level labels. This approach is interesting as it is somewhat less reliant on annotated data and can be applied to other scenarios or problems.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The approach is good; however, how can this technique be beneficial in practice, given that it does not match the performance of other approaches, such as fully supervised techniques?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) What are the contributions of the authors regarding datasets? How are labels assigned to different regions of an image based on severity? What are the criteria for that?

    2) What specifically reduces haziness in images using the proposed technique mentioned in Figure 3? It seems to be a data preprocessing issue that can be addressed using appropriate preprocessing techniques. Please justify this statement along with details of the specific component.

    3) How have the authors addressed the issue of images having different contrast? Variable contrast in images can lead to misclassification of severity.

    4) As mentioned, CLS tokens have not been used because of the self-supervised setting. However, CLS tokens usually contain global features and can be beneficial in finding correlations between different regions of the image, thus improving the performance of the model. Please justify why the authors did not consider CLS tokens.

    5) The reported model only achieves a 58.5% accuracy rate in test mode, which is not significantly better than random guessing, where accuracy would hover around 50%. This implies that the model’s capacity to make precise predictions is only marginally superior to chance. Please justify how such a model could be employed in intensive care units, where even minor errors can have life-or-death consequences. I agree that AI-based systems are there to aid medical staff and should not be used standalone. Even as an assistive tool, the potential for chance predictions can lead to vulnerable situations.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is a well-designed study that attempts to solve a genuine problem. Additionally, the techniques and results are presented appropriately. Moreover, the model has generalization capabilities as it produced good results on different datasets.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

(R1) “The model comparisons with state-of-the-art are limited” (R3) “difficulty in comparing results with existing papers”: When designing our study, we reviewed papers that used the BrixIA dataset referenced in this study (see BrixIA on Papers with Code). Most existing studies did not use BrixIA scores as we did and focused on other research areas. Only BS-Net and Slika et al. (2024) used BrixIA scores, but Slika et al. followed a different evaluation methodology, making direct comparisons difficult: they predict a global score rather than local scores. If an image has both positive and negative errors in different regions, the global score, which is the sum of the local scores, will experience error cancellation (see the toy example following this feedback). Consequently, the global-score MAE will tend to be lower than the sum of the local MAEs and closer to the mean of the global-score distribution. In addition, the global scores in the BrixIA dataset are close to normally distributed, so they are likely to be highly correlated with a predicted global score that sits close to the mean. Therefore, an approach that uses the average of the local MAEs may be more reliable. Unfortunately, Slika's recently published study did not directly compare its performance to BS-Net. However, in both papers, when tested on the Cohen COVID-19 dataset, BS-Net had a higher Pearson correlation than Slika's method, so we cannot conclude that Slika's method is more accurate than BS-Net. In that sense, BS-Net is best suited for comparing performance on local MAEs. We also chose CXR-CLIP (MICCAI 2023) for comparison with a more recent study.

(R3) “Unclear technical contributions: 1) integrating lung region information and 2) using COVID-19 chest X-ray data for pneumonia severity prediction”: 1) There are studies on segmentation labels (Slika et al.) and localization (Frid-Adar et al. 2021), but none have applied our proposed lung region information to severity prediction. Without our proposed integration, localization information is not learned from datasets with different label positions during transfer learning, resulting in an inability to distinguish features from visually similar but differently labelled regions. Although there is still much room for improvement, the integration of lung region information, consisting of spatial normalization and PAFE, has demonstrated clear performance improvements in local severity assessment. 2) Past research focused on single diseases, either COVID-19 or pneumonia, using multiple datasets only for comparison between multi-center data or for data augmentation, not for cross-performance between COVID-19 and pneumonia. We demonstrated generalization by cross-validating from the COVID-19 dataset to the pneumonia dataset using multi-region score extraction and shared-MLP methods. Our model maintained strong performance on different datasets with altered label positions using linear probing.

(R1) “Unclear information about the STN”: STN training uses affine augmentation with the CXR segmentation map as input. The target image is the CXR lung segmentation, and the moving image is its augmented version. The transformation matrix is estimated, and the affine transform is then applied to the CXR image.

(R4) “Benefits despite lower performance?”: The BrixIA dataset has 4 severity classes (0-3) per region, so random-guessing accuracy is 25%. Our method achieves 67.1% accuracy, significantly outperforming random guessing. Furthermore, BS-Net reports an MAE of 0.528 between radiologists, while our method's MAE is 0.35, indicating that inter-radiologist disagreement exceeds our model's error. Therefore, we believe our method will provide valuable information for doctors' judgment.

(R1) “Statistical tests”: At submission, we had repeated the experiment three times, confirming consistent results. ResNet18 and ResNet34 showed no overfitting and significant performance gains of up to 10% over BS-Net. No statistical tests were performed as the gains and losses were clear. We will add details of the repeated experiments to the updated paper.
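
A toy numeric illustration (invented numbers, Python assumed) of the error-cancellation argument above: when per-region errors have opposite signs, the global-score MAE can be far smaller than the mean of the local MAEs.

    # Two images, two regions each; values are invented purely for illustration.
    true_local = [[3, 0], [2, 1]]
    pred_local = [[2, 1], [3, 0]]   # every region is off by 1, in opposite directions

    local_mae = sum(abs(t - p)
                    for ti, pi in zip(true_local, pred_local)
                    for t, p in zip(ti, pi)) / 4                      # = 1.0

    global_mae = sum(abs(sum(ti) - sum(pi))
                     for ti, pi in zip(true_local, pred_local)) / 2   # = 0.0

    print(local_mae, global_mae)    # 1.0 0.0 -> the global MAE hides per-region errors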




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I have checked the reviews of this paper and there are no issues.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    I have checked the reviews of this paper and there are no issues.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


