Abstract

Medical image interpretation often encompasses diverse tasks, yet prevailing AI approaches predominantly favor end-to-end image-to-text models for automatic chest X-ray reading and analysis, often overlooking critical components of radiology reports. At the same time, employing separate models for related but distinct tasks leads to computational overhead and the inability to harness the benefits of shared data abstractions. In this work, we introduce a framework for chest X-ray interpretation, utilizing a Transformer-based object detection model trained on abundant data to learn localized representations. Our model achieves a mean average precision of ∼94% in identifying semantically meaningful anatomical regions, facilitating downstream tasks, namely localized disease detection and localized progression monitoring. Our approach yields competitive results in localized disease detection, with an average ROC-AUC of 89.1% over 9 diseases. In addition, to the best of our knowledge, our work is the first to tackle localized disease progression monitoring, with the proposed model able to track changes in specific regions of interest (RoIs) with an average accuracy of ∼67% and an average F1 score of ∼71%. Code is available at https://github.com/McMasterAIHLab/CheXDetector.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3269_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3269_supp.pdf

Link to the Code Repository

https://github.com/McMasterAIHLab/CheXDetector

Link to the Dataset(s)

https://physionet.org/content/chest-imagenome/1.0.0/

BibTex

@InProceedings{Esh_Representation_MICCAI2024,
        author = { Eshraghi Dehaghani, Mehrdad and Sabour, Amirhossein and Madu, Amarachi B. and Lourentzou, Ismini and Moradi, Mehdi},
        title = { { Representation Learning with a Transformer-Based Detection Model for Localized Chest X-Ray Disease and Progression Detection } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an approach for localized disease detection and localized progression monitoring on chest X-rays. They propose to train a DETR model to localize anatomical regions and extract region feature vectors for them. For disease detection they apply a simple MLP on top of those features, while for progression monitoring they propose the use of a self-attention module on top of region feature differences between several images. They train and evaluate their method on the Chest ImaGenome dataset, which provides the necessary targets.
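    The per-region disease head described above can be sketched roughly as follows. This is a hypothetical NumPy illustration with random stand-in weights, not the authors' implementation: the region count (12) and disease count (9) follow the paper, but the feature and hidden dimensions are made up, and real DETR decoder features and trained weights would replace the random arrays.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    N_REGIONS, FEAT_DIM, N_DISEASES = 12, 256, 9  # 12 anatomical regions, 9 findings (per the paper)

    # Stand-in for DETR decoder outputs: one feature vector per detected anatomical region.
    region_feats = rng.standard_normal((N_REGIONS, FEAT_DIM))

    def mlp_disease_head(feats, w1, b1, w2, b2):
        """Per-region multi-label disease classifier: a small MLP applied to each
        region feature vector independently, with one sigmoid output per disease."""
        h = np.maximum(feats @ w1 + b1, 0.0)      # ReLU hidden layer
        logits = h @ w2 + b2
        return 1.0 / (1.0 + np.exp(-logits))      # (N_REGIONS, N_DISEASES) probabilities

    # Hypothetical randomly initialized weights (a trained model would supply these).
    w1 = rng.standard_normal((FEAT_DIM, 64)) * 0.05
    b1 = np.zeros(64)
    w2 = rng.standard_normal((64, N_DISEASES)) * 0.05
    b2 = np.zeros(N_DISEASES)

    probs = mlp_disease_head(region_feats, w1, b1, w2, b2)
    print(probs.shape)  # one disease-probability vector per anatomical region
    ```

    The key design point the reviewer highlights is that the same region features feed both downstream heads, so localization and classification share one representation.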

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A simple but effective approach to tackling the understudied problem of localized progression monitoring
    2. The proposed method predicts bounding boxes, which helps with interpretability
    3. Ablation study on the attention component in the localized progression monitoring task
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Results on the localized disease detection task are notably worse than baseline results. The best baseline, AnaXNet, outperforms the proposed method by 4%. The proposed method is only on par with a simple global DenseNet121 model (see the AnaXNet paper). Therefore, the proposed method does not provide any improvement over the existing literature, limiting the relevance of the work.
    2. Missing baselines for the localized progression monitoring task. The authors only compare against a single baseline which uses only global image information. They did not include the baseline proposed in Example Task 1 of the Chest ImaGenome dataset paper (the same task and dataset the authors are using). While the baseline was only evaluated on a subset of regions and diseases, it could have been extended to provide a strong baseline. At least, a comparison of the subset of regions and diseases would be easily possible.
    3. Limited to no novelty in the Localized Disease Detection task. The proposed approach closely follows the Faster R-CNN approach already proposed in Example Task 2 of the Chest ImaGenome dataset paper (the same task and dataset the authors are using), only the Faster R-CNN detector is replaced by a DETR model. Also, the model ADPD (citation [9] in the paper) follows a very similar approach, using a DETR model to detect anatomical regions and classifying diseases for these regions using an MLP, the main difference being that ADPD targets a different task.
    4. The authors missed an important related work on (global) disease progression monitoring in chest X-rays:
      • Bannur, Shruthi, et al. “Learning to exploit temporal structure for biomedical vision-language processing.” CVPR 2023
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Baselines on the Localized Disease Detection task cannot be easily compared, as they are only summarized within the text. I recommend comparing the proposed method directly with the 4 baselines presented in the AnaXNet paper (Faster R-CNN, Global View i.e. DenseNet121, CheXGCN, and AnaXNet). For this I recommend a table, either (i) a small table showing only the AVG AUC scores, which could be inserted to the right of Fig. 2, or (ii) a table showing the AUC scores for all diseases and the AVG AUC score, replacing Fig. 2. I would recommend option (ii), as this gives a better overview of specific diseases.
    • The baseline on the localized progression monitoring task is not presented in any table and only an average score is provided in the text. I recommend adding it to Tab. 4 to allow an easy comparison.
    • In Tab. 4, it is not clear which of the three methods is the “main” method proposed in the paper. This should be clearly indicated, e.g. in the caption. Additionally, the difference between Global and Region-focused attention should be explained. Also, for the MLP version, it is not clear whether it uses feature differences or concatenation, this should be explained as well. I recommend adding a short paragraph explaining the methods studied in the ablation studies.
    • Which IoU threshold was used for computing the mAP values for region detection in Tab. 2?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • No/limited novelty with results subpar to baselines on the localized disease detection, limiting the relevance of this work.
    • Simple but effective approach on the progression monitoring task could be interesting for the community.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Most of my points have been clarified and addressed. The method remains interesting, but due to the limited novelty and limited baselines, I would consider it barely above the acceptance threshold. For the camera-ready version, I highly recommend:

    1. Focus the contribution on the second (progression) task, clearly marking the first task as an "additional" supplementary task.
    2. Present the main results of both tasks in tables to simplify comparison.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a novel approach that uses a DETR model for localized disease detection and localized progression monitoring in chest X-ray interpretation. It also introduces the task of disease progression monitoring at a localized level.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper utilizes a detection transformer for localized disease detection in chest X-rays. It also introduces a detailed localized disease progression monitoring task, which is more clinically important, and the proposed method is well suited to this detailed task.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks details on how the anatomical regions are partitioned.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper includes a detailed parameter setting and dataset introduction.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The paper lacks details on how the anatomical areas are partitioned.
    2. The highlighting of "diff" is not consistent between Section 2.3 and Eq. (2).
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a task (localized disease progression monitoring) that is clinically important but has been ignored in AI research. It also contributes to solving the task by adapting the detection transformer with a different loss calculation, supported by sufficient experiments.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a model for localized chest X-ray disease classification and localized disease progression monitoring. The model is based on DETR, a well-known transformer-based object detection model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Simplicity: the authors reuse a well-known DETR model and perform pretraining using a standard object detection setup. Further, an additional MLP is fine-tuned for either disease localization or disease progression monitoring. Extendability: the method can be transferred to similar 2D and 3D tasks, requiring only an adaptation of the encoder. Validation: the authors extensively validated the work and provided relevant metrics. Novelty: the authors proposed a benchmark for the localized disease progression monitoring task.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Weak ablation study. "The results for localized progression labeling are presented in Table 4 for the three model variations. It is clear that the introduction of the attention layer in this classifier has improved the results compared to the baseline of MLP." The average results were indeed improved. However, we can notice exceptionally strong MLP results for the Right Upper Lung Zone (96.17/94.30), the Left Upper Lung Zone, etc. So there is a chance that the MLP appeared inferior due to large variance.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Ablation study: I would rewrite the paragraph about the importance of attention (Table 4). In my opinion, the MLP results cannot be directly compared to the attention results due to the high variance of the MLP.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty of the localized disease progression monitoring task, creation of a new benchmark.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have partially addressed my comment about Table 4. However, I do not understand such a high difference between attention and MLP for relatively large zones (e.g. Lung Zones), so I maintain my original score.




Author Feedback

We thank reviewers for their feedback, highlighting several strengths: novelty (R1), extensive validation (R1), clinical importance (R2), studying an understudied problem (R2), simple effective/extendable (R1, R3), interpretability (R3).

R1 MLP baseline: For smaller, anatomically complex regions (hilar structures, costophrenic angles, etc.), the MLP consistently showed poor performance, since it does not explicitly account for spatial relationships and contextual details. The integration of attention has led to noticeable improvements in classifying these difficult regions. Attention exhibits more balanced performance across different regions and a reduction in variance.

R1, R3, R4 Code/data availability: Our data is already public. We’ll open-source all code.

R3 Anatomical region partitioning: The anatomical regions used are those defined and provided in the Chest ImaGenome dataset, explained in publications [12,13]. We did not alter these.

R3 "diff" highlighting inconsistency: A typo; apologies, it will be fixed.

R4 Direct comparisons with AnaXNet baselines: Our main contributions are introducing the localized progression monitoring task and demonstrating that the proposed DETR region representations work well for various downstream tasks such as localized progression monitoring. The addition of the localized disease detection problem here was to show the potential of the approach in producing features that can be used in a variety of tasks. Since we don't use the exact same dataset, it is fair to say that we are close to state-of-the-art with a representation learning method that was not optimized for this task.

R4 Comparison with original ImaGenome paper on localized disease progression: The Chest ImaGenome dataset paper [13] reported a simple version of the disease monitoring problem only in the lungs, with 2 diseases (heart failure and hazy opacity), and a binary classification problem (Improved/Worsened). The addition of the 'No Change' label significantly increases task complexity, which motivated the authors in [5,8] to work on this task. However, these works don't tackle localized disease detection and, as shown in Fig. 3, fail to predict different progression labels for different anatomical locations of a given instance. Also, similar to [5,8], we include 9 different findings, which means our data is different from [13]. And we classify 12 anatomical regions, which is a distinction from both [13] and [5,8]. Nevertheless, we now trained a CNN Siamese network equivalent to the one in [13] with 3 classes, on 9 diseases and 12 anatomical regions, maintaining our original train/test splits. This simple model delivered a weighted average accuracy of only ~34% and an F1 score of ~32% on our dataset.

R4 Limited novelty in localized disease detection: Although our method performs comparably well with more sophisticated end-to-end baselines on localized disease detection, this task is not central to our claim of novelty. Also, there are distinctions such as our use of DETR as a source of features and not just a localization mechanism.

R4 CheXRelFormer disease progression model not presented in Table 4: As the reviewer pointed out, we do mention the closest match (work in [8]) in our paper. We didn’t include this number in Table 4 to avoid the impression of a direct comparison since previous work in this area doesn’t tackle localized disease progression.

R4 Table 4 + more details about attention variants: In the region-based self-attention method, we use the output row associated with a specific ROI from the self-attention mechanism. In global attention (our main method), we average all output rows from self-attention to create a single “global” vector representing all ROIs in the CXR image. For each specific ROI, the global vector is concatenated with the ROI’s difference vector. Global attention performs better in detecting progression.
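To make the two attention variants described above concrete, here is a hypothetical NumPy sketch under stated assumptions: single-head scaled dot-product self-attention, random stand-in weights and difference vectors, and made-up dimensions (12 regions as in the paper, 64-dimensional features chosen for illustration). The actual model's layer sizes and training details are not specified here.

```python
import numpy as np

rng = np.random.default_rng(1)
N_REGIONS, D = 12, 64

# Per-region difference vectors (current-study region feature minus
# prior-study region feature), the input to the progression classifier.
diffs = rng.standard_normal((N_REGIONS, D))

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the region axis."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v                               # (N_REGIONS, D)

# Hypothetical randomly initialized projection weights.
wq, wk, wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
out = self_attention(diffs, wq, wk, wv)

roi = 3  # illustrative RoI index

# Region-focused variant: classify the RoI from its own attention output row.
region_focused_input = out[roi]                   # shape (D,)

# Global variant (the main method per the rebuttal): average all output rows
# into one "global" vector, then concatenate it with the RoI's difference vector.
global_vec = out.mean(axis=0)                     # shape (D,)
global_input = np.concatenate([global_vec, diffs[roi]])  # shape (2*D,)
```

The design difference is that the region-focused input sees other regions only through attention weights, while the global input explicitly pairs a whole-image summary with the RoI's own change signal, which the authors report works better.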

R4 IoU threshold for mAP + BioViL-T citation: The threshold is 0.5. We will add this detail, as well as the missing reference.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    There is a clear consensus among reviewers

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    There is a clear consensus among reviewers



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


