Abstract

Anomaly detection in chest X-rays is a critical task. Most methods model the distribution of normal images and then treat significant deviations from that distribution as anomalies. Recently, CLIP-based methods pre-trained on large numbers of medical images have shown impressive performance on zero- and few-shot downstream tasks. In this paper, we explore the potential of CLIP-based methods for anomaly detection in chest X-rays. To address the discrepancy between the CLIP pre-training data and the task-specific data, we propose a position-guided prompt learning method. Specifically, inspired by the way experts diagnose chest X-rays by carefully examining distinct lung regions, we propose learnable position-guided text and image prompts that adapt the task data to the frozen pre-trained CLIP-based model. To enhance the model’s discriminative capability, we also propose a novel structure-preserving anomaly synthesis method for chest X-rays that is used during training. Extensive experiments on three datasets demonstrate that our method outperforms several state-of-the-art methods. The code is available at https://github.com/sunzc-sunny/PPAD.
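The abstract builds on CLIP-style zero-shot scoring, in which an image embedding is compared with embeddings of a "normal" and an "abnormal" text prompt. The following is a minimal sketch of that general idea, not of PPAD itself: the embeddings are assumed to come from some frozen CLIP-like encoder, and the function names and the `temperature` value are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def clip_anomaly_score(image_emb, normal_text_emb, abnormal_text_emb, temperature=0.01):
    """Return the probability that an image is abnormal, CLIP zero-shot style.

    The inputs are hypothetical embeddings from a frozen CLIP-like encoder.
    The score is the softmax weight of the 'abnormal' prompt over the two
    temperature-scaled cosine similarities.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    texts = l2_normalize(np.stack([normal_text_emb, abnormal_text_emb]).astype(float))
    logits = texts @ img / temperature  # cosine similarities scaled by 1/temperature
    logits = logits - logits.max()      # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[1])              # index 1 is the 'abnormal' prompt
```

With unit-normalized embeddings, the score is a two-way softmax over cosine similarities, so an image embedding equally close to both prompts scores 0.5.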

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0578_paper.pdf

SharedIt Link: https://rdcu.be/dVZiR

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72378-0_53

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0578_supp.pdf

Link to the Code Repository

https://github.com/sunzc-sunny/PPAD

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Sun_PositionGuided_MICCAI2024,
        author = { Sun, Zhichao and Gu, Yuliang and Liu, Yepeng and Zhang, Zerui and Zhao, Zhou and Xu, Yongchao},
        title = { { Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {567--577}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a position-guided prompt learning method for anomaly detection in chest X-rays, which bridges the gap between training and pre-training data through learnable text and image prompts. The position-guided prompts are also introduced to mimic the expert diagnostic process, allowing the model to focus on each region. In addition, a Structure-preserving Anomaly Synthesis method (SAS) is added to generate seamless pseudo-data through gamma correction, producing more realistic anomalies.
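As an illustration only of the kind of mask-restricted gamma correction described above (the paper's actual synthesis formula and mask construction may differ), the sketch below applies gamma correction inside a mask M and leaves every pixel outside it untouched, so the surrounding anatomy is preserved. The elliptical mask shape, the `gamma` value, and both function names are hypothetical.

```python
import numpy as np


def ellipse_mask(shape, center, radii):
    """Binary elliptical mask (a hypothetical choice of anomaly shape)."""
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    cy, cx = center
    ry, rx = radii
    inside = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
    return inside.astype(float)


def synthesize_anomaly(image, mask, gamma=0.5):
    """Gamma-correct intensities inside the mask only.

    image: float array with values in [0, 1]; mask: same shape, values in [0, 1].
    Output = mask * image**gamma + (1 - mask) * image, so pixels where mask == 0
    are returned unchanged.
    """
    image = np.clip(np.asarray(image, dtype=float), 0.0, 1.0)
    mask = np.asarray(mask, dtype=float)
    corrected = image ** gamma  # gamma < 1 brightens, gamma > 1 darkens
    return mask * corrected + (1.0 - mask) * image
```

Because the blend is written as M·x^γ + (1 − M)·x, passing a soft mask with values between 0 and 1 (for example, a blurred ellipse) would give a smoother boundary without changing the code; whether SAS does this is not stated in this review.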

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    To bridge the gap between task-specific data and pre-training data, the paper reduces the data distribution difference through prompt learning instead of fine-tuning the model as in previous work, which is noteworthy. In addition, the paper provides an anomaly synthesis method that generates more realistic anomaly samples, addressing the problem that previously synthesized anomalies were too crude. The ablation experiments are thorough, and the results are analyzed both quantitatively and qualitatively.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Some parameters in the formulas lack explanation; for example, what M represents in formula (2) is not explained.
    2. Are the results of this paper's method identical in Tables 3 and 4 because SAS was used by default in Table 3? If not, shouldn't the results differ?
    3. How are the results of the proposed region-focused method compared with those of other methods? Are all the regional results mapped to a single image? Please clarify.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the weaknesses listed above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Its novelty.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a comprehensive investigation into anomaly detection in chest X-ray images, proposing a novel position-guided prompt learning approach within the CLIP framework (referred to as Position-Guided Prompt Learning, or PPAD). The approach aims to mitigate the disparity between CLIP’s pre-training data and the task-specific data. Experiments on the ZhangLab, CheXpert, and VinDr-CXR datasets show that PPAD achieves state-of-the-art anomaly detection performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The researchers have ingeniously crafted adjustable position-guided textual and visual prompts to adapt the pre-trained CLIP model to task-specific requirements.
    2. The study introduces an innovative Structure-preserving Anomaly Synthesis (SAS) strategy that significantly improves the model’s proficiency in identifying anomalous instances during the training process.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. This manuscript does not include a comparative analysis with contemporary diffusion model-based approaches for anomaly detection, which is a significant oversight for establishing its comparative efficacy.
    2. The ablation studies in this manuscript were limited to a single dataset, which limits the demonstration of the method’s robustness and effectiveness across a broader range of datasets.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Overall, the findings presented within this paper appear to be largely reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors have expressly articulated their ambition to further the development of the PPAD method for anomaly localization and to improve the SAS so as to include a wider array of anomalies in their prospective research endeavors. To advance the academic rigor and depth of future work, it is imperative that the authors augment their discourse concerning several key dimensions:

    1. A more exhaustive exposition of the experimental framework is warranted, specifically detailing the types of position-guided prompts employed and explaining how these prompts influence model performance.

    2. A thorough delineation of the SAS methodology’s algorithmic intricacies should be provided. This includes an in-depth account of how anomalies are synthesized using this technique, ensuring that the artificial anomalies exhibit pathological discrepancies without compromising the anatomical fidelity of the chest radiographs.

    3. It is advisable to broaden the scope of experimentation to gauge PPAD’s adeptness at tackling infrequent diseases and other intricate cases, thereby evaluating the model’s transferability and resilience across diverse medical settings.

    4. The discourse would benefit from an exploration of the influence of parameter selections highlighted in the study—for example, the quantity of training iterations and the calibration of thresholds—on model performance, coupled with an analysis of optimizing training efficacy vis-à-vis predictive accuracy under limited resources.

    5. In light of the commendable outcomes garnered by the PPAD method across various datasets, it is prudent for the authors to delve into research on cross-domain transfer learning, thereby probing the model’s adaptability to novel data sources.

    In closing, while the PPAD method, with its position-guided prompt learning and the SAS approach for anomaly synthesis, has shown substantial promise and applicability for chest X-ray anomaly detection, consolidating and refining the research in the areas outlined above is essential for a more robust and convincing scientific contribution.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The research question posed in this paper is of considerable importance, and the methodology is advanced. Nevertheless, the empirical validation is notably limited in scope: the study lacks comparisons with diffusion model-based anomaly detection methods and ablation experiments on a wider range of datasets to robustly demonstrate the effectiveness of the proposed approach.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel position-guided prompt learning method (PPAD) for anomaly detection in chest X-rays, leveraging CLIP-based methods and introducing a unique structure-preserving anomaly synthesis (SAS) technique during training. This approach allows the model to focus on specific lung regions and enhances its ability to distinguish between normal and pathological images by generating more realistic synthetic anomalies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The integration of position-guided prompts into CLIP-based models for enhanced anomaly detection. This method reflects how radiologists view and assess X-rays, focusing on specific lung regions, thus tailoring the pre-trained model more effectively to task-specific data.
    2) The method was thoroughly evaluated on three datasets, showing superior performance over state-of-the-art methods. This robust evaluation demonstrates the practical applicability and effectiveness of the proposed technique.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main innovation of this paper is to propose position-guided prompt learning and apply it to a frozen pre-trained CLIP-based model. I do not think this idea is innovative enough to support the research work of the paper. I have the following concerns:
    1. The text prompt and image prompt proposed in the position-guided prompt are not clearly reflected in the paper; can you give a specific explanation?
    2. The paper simply concatenates different embeddings; please explain how the noise introduced by the concatenation operation is handled.
    3. Please explain specifically how the learnable prompts adapt the training data to the frozen pre-trained CLIP-based model.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The potential computational and implementation complexities should be discussed, providing guidelines or strategies for practical deployment in typical clinical workflows.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The recommendation is based on the approach of using position-guided prompts combined with a structure-preserving anomaly synthesis method, which shows potential for significant improvements in anomaly detection in chest X-rays. The method’s ability to outperform existing techniques and its alignment with clinical diagnostic processes are the primary factors influencing the positive evaluation of this paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank the reviewers for their highly insightful comments, which will help us improve the quality of our paper. Below we address the issues raised by the three reviewers. Because the “Rebuttal Guide” does not allow new experimental results in the rebuttal, we will publish additional results for the suggested experiments when we release the source code on GitHub.

Q1: Comparison with diffusion-based methods and an additional ablation study on another dataset. (R3) We appreciate the suggestion to compare our results with diffusion-based methods. We will conduct these comparisons and make the results available on GitHub. In addition, we have already performed ablation studies on another dataset, which yielded conclusions consistent with those in the paper.

Q2: Explanation of learnable prompts. (R3, R4) Prompt learning is a proven method for fine-tuning pretrained models. In our approach, we concatenate data embeddings with learnable prompts to create a new input, thus adapting task-specific data to the pretraining data.
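The concatenation described above can be sketched roughly as follows, in the spirit of CoOp-style learnable context tokens. This is a hypothetical illustration rather than the authors' code: the function name, the order of the pieces, and the optional per-region `position_prompt` argument are assumptions. Only the prompt vectors would be trained; the encoder that consumes the concatenated sequence stays frozen.

```python
import numpy as np


def build_prompted_input(token_embs, learnable_prompts, position_prompt=None):
    """Prepend prompt vectors to a sequence of input embeddings.

    token_embs: (seq_len, dim) embeddings of the data (text tokens or image patches).
    learnable_prompts: (n_ctx, dim) trainable vectors shared across inputs.
    position_prompt: optional (n_pos, dim) vectors for one lung region (hypothetical).
    Returns an array of shape (n_ctx [+ n_pos] + seq_len, dim).
    """
    parts = [np.asarray(learnable_prompts, dtype=float)]
    if position_prompt is not None:
        parts.append(np.asarray(position_prompt, dtype=float))
    parts.append(np.asarray(token_embs, dtype=float))

    dims = {p.shape[1] for p in parts}
    if len(dims) != 1:
        raise ValueError(f"embedding dimensions differ: {sorted(dims)}")
    # Concatenate along the sequence axis; the frozen encoder sees one longer sequence.
    return np.concatenate(parts, axis=0)
```

In training, gradients would flow only into the prompt vectors, which is what allows a frozen encoder to be adapted to new data without changing its weights.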

Q3: How is the position-guided prompt reflected in the text prompt? (R4) It is reflected through the position prompt embedding (E_t^{pos}). Details are provided in sec. 2.1.

Q4: Noise due to concatenation operation. (R4) The use of learnable prompts and embedding concatenation is a common practice, as seen in methods such as CoOp [25] and MaPLe [10]. Additionally, concatenation of different modality embeddings is widely applied in large vision-language models like Flamingo, PaLM-E, and LLaVA, known as the fully autoregressive architecture. Based on these precedents, we think that embedding concatenation does not suffer significantly from noise issues.

Q5: What does the “M” in formula (2) represent? (R1) We have explained it in sec. 2.1 line 6 as “a binary mask.” A clearer explanation will be added near formula (2) in the final version.

Q6: Are the results the same in Tables 3 and 4 because SAS was used by default in Table 3? (R1) Yes, Table 3 only analyzes position-guided prompts with SAS as the default anomaly synthesis method.

Q7: How are the results from different positions aggregated during inference? (R1) The inference process is detailed in sec. 2.3. We will provide a clearer version in the revised paper.
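Sec. 2.3 of the paper is not reproduced here, so the rule below is only a hypothetical example of how per-region anomaly scores could be combined into one image-level score. The region names and both aggregation modes are assumptions, not the authors' inference procedure.

```python
def aggregate_region_scores(region_scores, mode="max"):
    """Combine per-region anomaly scores into one image-level score.

    region_scores: mapping from a (hypothetical) region name to an anomaly
    probability in [0, 1].
    mode: 'max' flags the image if any single region looks abnormal;
          'mean' averages evidence across regions.
    """
    if not region_scores:
        raise ValueError("need at least one region score")
    values = list(region_scores.values())
    if mode == "max":
        return max(values)
    if mode == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown mode: {mode!r}")


scores = {"left_upper": 0.1, "left_lower": 0.8, "right_upper": 0.2, "right_lower": 0.3}
```

A max rule is sensitive to a single localized lesion, while a mean rule dilutes it across regions; which behaviour is preferable depends on how the region prompts are defined.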




Meta-Review

Meta-review not available, early accepted paper.


