Abstract

In recent years, pseudo-label (PL) based semi-supervised (SS) methods have been proposed for disease localization in medical images for tasks with limited labeled data. However, these models are not curated for chest X-rays containing anomalies of different shapes and sizes. As a result, existing methods suffer from biased attentiveness towards minor classes and from PL inconsistency. Soft-labeling-based methods filter out PLs with higher uncertainty, but this leads to a loss of fine-grained features of minor articulates, resulting in sparse predictions. To address these challenges, we propose AnoMed, an uncertainty-aware SS framework with a novel scale-invariant bottleneck (SIB) and a confidence-guided pseudo-label optimizer (PLO). SIB leverages the base feature (Fb) obtained from any encoder to capture multi-granular anatomical structures and underlying representations. On top of that, PLO refines hesitant PLs and guides them separately into the unsupervised loss, reducing inconsistency. Our extensive experiments on cardiac datasets and out-of-distribution (OOD) fine-tuning demonstrate that AnoMed outperforms other state-of-the-art (SOTA) methods such as Efficient Teacher and Mean Teacher, with improvements of 4.9 and 5.9 in AP50:95 on VinDr-CXR data. Code for our architecture is available at https://github.com/aj-das-research/AnoMed.
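The abstract describes PLO only at a high level. As an illustration, the minimal sketch below assumes that PLO partitions teacher detections by confidence score using two thresholds (named `t_low` and `t_high` here; hypothetical names and default values). The result is three groups: confident pseudo-labels, hesitant pseudo-labels that feed a separately weighted unsupervised loss term, and discarded detections. The `Box` record, `split_pseudo_labels`, `combine_losses`, and the `hes_weight` parameter are all hypothetical. This is not the authors' implementation, which is in the linked repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    # Hypothetical pseudo-label record: [x1, y1, x2, y2] box, class id, teacher confidence.
    coords: Tuple[float, float, float, float]
    label: int
    score: float


def split_pseudo_labels(boxes: List[Box], t_low: float = 0.3, t_high: float = 0.7):
    """Partition teacher predictions into confident, hesitant, and discarded sets.

    t_low and t_high are assumed names and values; the paper's own thresholds
    (t1, t2) may be defined differently.
    """
    if not 0.0 <= t_low < t_high <= 1.0:
        raise ValueError("expected 0 <= t_low < t_high <= 1")
    confident, hesitant, discarded = [], [], []
    for b in boxes:
        if b.score >= t_high:
            confident.append(b)   # used directly as hard pseudo-labels
        elif b.score >= t_low:
            hesitant.append(b)    # kept, but routed to a separate loss term
        else:
            discarded.append(b)   # ignored (treated as background)
    return confident, hesitant, discarded


def combine_losses(conf_losses: List[float], hes_losses: List[float],
                   hes_weight: float = 0.5) -> float:
    """Sum per-box losses, down-weighting hesitant pseudo-labels instead of dropping them."""
    return sum(conf_losses) + hes_weight * sum(hes_losses)


if __name__ == "__main__":
    preds = [Box((0, 0, 10, 10), 1, 0.9), Box((5, 5, 20, 20), 2, 0.5), Box((1, 1, 2, 2), 0, 0.1)]
    c, h, d = split_pseudo_labels(preds)
    print(len(c), len(h), len(d))  # one box in each group
```

Keeping hesitant boxes in their own bucket, rather than discarding everything below a single cut-off, mirrors the abstract's contrast with soft-labeling filters that lose fine-grained features of minor findings.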

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3485_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3485_supp.pdf

Link to the Code Repository

https://github.com/aj-das-research/AnoMed

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Das_Confidenceguided_MICCAI2024,
        author = { Das, Abhijit and Gorade, Vandan and Kumar, Komal and Chakraborty, Snehashis and Mahapatra, Dwarikanath and Roy, Sudipta},
        title = { { Confidence-guided Semi-supervised Learning for Generalized Lesion Localization in X-ray Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces AnoMed, a semi-supervised framework that improves disease localization in chest X-rays by addressing the issues of biased attentiveness and pseudo-label inconsistency. AnoMed employs a scale-invariant bottleneck (SIB) for capturing detailed anatomical features and a confidence-guided pseudo-label optimizer (PLO) to refine uncertain labels. The method is tested extensively on cardiac datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A unique layered encoding-decoding bottleneck, SIB, is proposed.
    2. PLO refines the PLs and boosts confidence in mutual learning. The paper also integrates a distribution alignment (DA) objective to stabilize training.
    3. AnoMed is evaluated on thin hairline fracture images besides CXRs.
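The DA objective is mentioned without detail. For orientation, the sketch below implements the ReMixMatch-style form of distribution alignment, in which class probabilities predicted on unlabeled data are rescaled by the ratio of the labeled-data class marginal to a running average of predictions and then renormalized. Whether AnoMed's DA objective takes this form is an assumption; the class name, the `momentum` value, and the NumPy interface are illustrative.

```python
import numpy as np


class DistributionAlignment:
    """ReMixMatch-style distribution alignment (a stand-in; AnoMed's DA may differ)."""

    def __init__(self, labeled_prior: np.ndarray, momentum: float = 0.999):
        # Class marginal estimated from labeled data, normalized to sum to 1.
        self.prior = np.asarray(labeled_prior, dtype=float)
        self.prior = self.prior / self.prior.sum()
        # Running mean of predicted class probabilities, initialized to uniform.
        self.running = np.full_like(self.prior, 1.0 / len(self.prior))
        self.momentum = momentum

    def __call__(self, probs: np.ndarray) -> np.ndarray:
        # probs: (N, C) class probabilities predicted on unlabeled samples.
        self.running = self.momentum * self.running + (1 - self.momentum) * probs.mean(axis=0)
        aligned = probs * (self.prior / self.running)
        return aligned / aligned.sum(axis=1, keepdims=True)  # renormalize each row


if __name__ == "__main__":
    da = DistributionAlignment(np.array([3.0, 1.0]))
    print(da(np.array([[0.5, 0.5]])))  # pushed toward the labeled prior [0.75, 0.25]
```

The rescaling nudges pseudo-labels toward the class balance seen in labeled data, which is one way a DA term can stabilize training when the teacher over-predicts frequent classes.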
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Although the ablation studies and extensive experiments in the paper show promising results, the reviewer still thinks the general idea of semi-supervised learning has a limited effect, especially from a clinical and medical point of view. In the reviewer's view, this paper is not significant enough for MICCAI.
    2. There is no evaluation comparing the proposed method with supervised learning methods.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors may consider rewriting the method section to make it more concise. Could the authors also discuss why their method performs better than the comparative methods?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clinical impact and the novelty of the methods.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This manuscript presents a confidence-guided semi-supervised learning method for lesion detection in X-ray images. The authors proposed a scale-invariant block (SIB) to extract scale-invariant features in the detector. In addition, the authors introduced a confidence-guided pseudo-label optimizer for all three tasks (ROI, object, class) in the Teacher-Student framework.

    The method achieved SOTA performance on 2 public datasets.
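The review refers to a Teacher-Student framework. In Mean Teacher, one of the baselines named in the abstract, the teacher's weights are an exponential moving average (EMA) of the student's. Below is a minimal sketch of that update, with parameters held as plain dictionaries of NumPy arrays (a simplification of a real model) and a hypothetical default `alpha`. Whether AnoMed updates its teacher in exactly this way is an assumption.

```python
import numpy as np


def ema_update(teacher: dict, student: dict, alpha: float = 0.999) -> dict:
    """Return teacher parameters as an EMA of student parameters.

    teacher_new = alpha * teacher + (1 - alpha) * student, applied per tensor.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name] for name in teacher}


if __name__ == "__main__":
    teacher = {"w": np.zeros(3)}
    student = {"w": np.ones(3)}
    for _ in range(2):
        teacher = ema_update(teacher, student, alpha=0.9)
    print(teacher["w"])  # teacher lags behind the student: about 0.19 after two steps
```

Because the teacher changes slowly, its pseudo-labels are smoother than the student's raw predictions, which is the property confidence-based filtering such as PLO builds on.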

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors proposed the scale-invariant feature extraction module, which is suitable for CXR vision problems.
    2. The authors introduced the hesitant pseudo label, which is interesting and effective for semi-supervised learning.
    3. The experiment is sound, and the analysis is detailed. The authors presented both qualitative and quantitative analyses to show the effectiveness of modules in the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Poor clarity and organization. There are too many abbreviations in the manuscript, which make it extremely hard to read.
    2. The authors overstated their contribution to “cardiac”-related problems. The proposed method was tested on a dataset with cardiac disease tasks but was not tailored for cardiac-related problems.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors proposed an interesting method for SSL-based detection in X-ray images. However, the writing and organization are poor. A thorough rewrite is highly recommended.

    Minor: (1) t1 and t2 are reversed in the PLO section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I would say borderline rather than weak reject.

    The method is good, but the writing is poor.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The authors' response addressed one of my questions. However, I don't know how to check whether their final version is clear.



Review #3

  • Please describe the contribution of the paper

    This paper introduces AnoMed, a pioneering effort in pseudo-label based semi-supervised bounding box regression for cardiac disease detection. AnoMed features a novel scale-invariant bottleneck (SIB) and an uncertainty-guided optimizer (PLO) to reduce pseudo-label inconsistency. Additionally, the paper presents distribution alignment (DA) to enhance training stability. The proposed framework demonstrates superior performance on cardiac datasets and out-of-distribution fine-tuning, showcasing its advanced capabilities compared to state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novel Formulations: The paper "AnoMed" introduces several innovative methods and formulations that contribute to its strength. These include the Scale-Invariant Bottleneck (SIB), which captures multi-level anatomical information and intrinsic relationships, enabling accurate detection of anomalies across different scales. Additionally, the Uncertainty-Guided Optimizer (PLO) refines pseudo-labels, guides them towards consistent learning, and reduces inconsistency, leading to boosted confidence scores.

    Strong Evaluation: AnoMed employs several evaluation metrics such as Average Precision (AP) and its variations, including AP50 and AP50:95, which are standard metrics for bounding box detection. The results indicate that AnoMed outperforms existing methods by a considerable margin, excelling in learning large anomalies and fine-grained anatomical structures.
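In the COCO convention, AP50 counts a detection as correct when its IoU with a ground-truth box is at least 0.50, while AP50:95 averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, so it rewards tighter boxes. The sketch below shows only this threshold sweep and the final averaging; computing each per-threshold AP from a precision-recall curve is left out, and the helper names are illustrative.

```python
import numpy as np

# COCO-style IoU thresholds for AP50:95: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = np.linspace(0.50, 0.95, 10)


def iou(a, b) -> float:
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def threshold_hits(iou_value: float) -> int:
    """Number of the ten thresholds at which a detection with this IoU counts as a match."""
    return int(np.sum(iou_value >= IOU_THRESHOLDS))


def ap50_95(ap_per_threshold) -> float:
    """Mean of ten per-threshold AP values (each from its own precision-recall curve)."""
    values = np.asarray(ap_per_threshold, dtype=float)
    if values.shape != IOU_THRESHOLDS.shape:
        raise ValueError("expected one AP value per IoU threshold")
    return float(values.mean())
```

For example, a predicted box [0, 0, 10, 6.8] against a ground truth [0, 0, 10, 10] has IoU 0.68, so it is a match at only 4 of the 10 thresholds: it helps AP50 but contributes much less to AP50:95.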

    Comprehensive Experimental Analysis: The paper provides an in-depth experimental analysis, including an analysis of individual components such as the Scale-Invariant Bottleneck (SIB), the Uncertainty-Guided Optimizer (PLO), and Distribution Alignment (DA).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper could benefit from a more detailed discussion on the specifics of the datasets used, particularly in highlighting the shortcomings and strengths of the VinDr-CXR dataset in cardiac disease localization. Additionally, a thorough benchmarking against other relevant datasets and associated methodologies would have provided a holistic view of the proposed method’s performance and generalizability across different datasets. This absence undermines the comprehensive evaluation and validation of AnoMed in varied data settings, limiting the paper’s robustness and applicability beyond specific datasets.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The author is advised to include a discussion on the clinical credibility. Additionally, it is recommended that the author provide citations for other comparative methods during the comparative experiments. The author should also review the formatting of the references, as some are inconsistently formatted. Lastly, it would be beneficial for the author to test the network’s feasibility on other datasets in the future.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    AnoMed, the proposed framework, addresses the limitations of pseudo label based semi-supervised methods in the following ways:

    Scale-Invariant Bottleneck (SIB): AnoMed introduces a novel scale-invariant bottleneck (SIB) which leverages base features obtained from any encoder to capture multi-granular anatomical structures and underlying representations. This enables the framework to capture local to global features of small to large diseases, addressing the biased attentiveness issue.

    Confidence-Guided Pseudo-Label Optimizer (PLO): AnoMed incorporates a confidence-guided pseudo-label optimizer (PLO) which refines hesitant pseudo labels and guides them separately for unsupervised loss, reducing inconsistency. This effectively addresses the PL inconsistency issue and leads to improved accuracy in disease localization, particularly for minor anomalies.

    Extensive Experiments and Out-of-Distribution Fine-Tuning: AnoMed demonstrates its effectiveness through extensive experiments on cardia

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The author’s rebuttal has addressed some of my questions; however, the organization and structure of the paper still lack clarity.




Author Feedback

We thank all the reviewers for their valuable feedback. We appreciate the recognition of our novel contributions, including our unique approach to learning better scale-invariant representations and our innovative pseudo-label consistency regularization. We also thank the reviewers for acknowledging the value of incorporating distribution alignment during training for enhanced stability. We have addressed the concerns below.

Reviewer #4 (1) Discussion on the clinical credibility: The clinical credibility of AnoMed is twofold. Firstly, our studies demonstrate that AnoMed, utilizing a semi-supervised framework, is remarkably efficient. With only 20% labeled training data, it achieves an AP50 of 76.8% (see Table 1), surpassing existing supervised methods trained with 100% labeled data (ViTb16 - AP50 72.4%, ResNet101 - AP50 58.2%, VGG16 - AP50 34.6%). This highlights AnoMed's potential to reduce the cost of manual labeling in clinical settings. Secondly, the concept of learning scale-invariant features for anomaly detection is also applicable to multi-organ segmentation tasks. Furthermore, we envision AnoMed's potential in generative models, where we aim to scale existing models with synthetic data and minimal manual annotation in future work.

Reviewer #5 (1) Overstating 'cardiac' tailored framework: In our paper, we define the problem as learning scale-invariant features and refining hesitant pseudo-labels for smaller and minor target ROIs. To the best of our knowledge, chest X-rays are one of the most suitable benchmarks for this purpose. Although our proposed method generalizes well to out-of-distribution datasets (see Table 1), we plan to include additional datasets for OOD evaluation to enhance our assessment. We are particularly excited about the potential applicability of this approach to abdominal CT images, which involve both small tumours and large organs to segment or detect. We will tone down our claims to present AnoMed as a generalized method for scale-invariant learning in both semi-supervised and supervised tasks, with a focus on pseudo-label optimization and refinement. (3) Typo (t1 and t2 are reversed in the PLO section) and clarity of paper: Thank you for pointing out the typo in the PLO section, where t1 and t2 are reversed. We will correct this in the revised manuscript. Additionally, as per your suggestion, we will ensure greater clarity throughout the revised manuscript.

Reviewer #6 (1) Evaluation between supervised and proposed semi-supervised method: We have compared AnoMed with supervised methods in Table 1, where shaded rows represent supervised methods. Evaluations cover all three categories: 2-stage anchor-based (Faster RCNN), 1-stage anchor-free (DETR), and 1-stage anchor-based (supervised counterpart of AnoMed). Due to space constraints in the MICCAI format, we only included the best-performing supervised method in Table 1. For the final manuscript, we will update with at least two baseline comparisons for each category in supervised training results. For reference, Fast-RCNN (2-stage anchor-based), YOLO-v8 (1-stage anchor-free), and RetinaNet (1-stage anchor-based) achieved AP50 scores of 40.6%, 45.2%, and 56.1% and AP50:95 scores of 26.4%, 30.2%, and 40.5%, respectively. With only 20% labeled training data, AnoMed achieves an AP50 of 76.8%, surpassing existing supervised methods trained with 100% labeled data (ViTb16 - AP50 72.4%, ResNet101 - AP50 58.2%, VGG16 - AP50 34.6%). Therefore, we have focused more on semi-supervised frameworks to ensure a fair evaluation, countering the claim that our method does not meet MICCAI's standards. (2) Effect of semi-supervised method: Table 1 clearly shows that the semi-supervised method outperforms supervised methods by significant margins. We have detailed the impact of AnoMed in both supervised (AP50 61.2%) and semi-supervised (AP50 76.2%) settings, with comprehensive quantitative and qualitative comparisons (see Table 2 and Figures 1-5).




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposes a pseudo-label generator (SS-FPLGen) for semantic segmentation, which uses a student-teacher framework to generate more reliable pseudo-labels. The reviewers list the interesting idea and comprehensive evaluation as strengths. However, the reviewers express concerns about the insignificant improvement and the overall quality of the paper. The reviewers agree to reject this paper after the rebuttal, despite the authors addressing some of the concerns. The paper cannot be accepted in its current format.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Although the authors have addressed some issues during the rebuttal, the reviewers have major concerns about the writing of the current version of the paper.




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper initially received mixed reviews, and the criticism relates to issues with the presentation and incremental improvements. This meta-reviewer argues that the paper still makes a valuable contribution despite its limitations. In particular, the reviewers and ACs highlighted the novel formulation and strong evaluation. The authors should improve the writing with respect to clarity, as requested in the reviews and AC comments. While the results suggest the method is not greatly outperforming other approaches, the novel formulation could spark new ideas for further research.



