Abstract

In this paper, we propose a Test-time Self-guided Bounding-box Propagation (TSBP) method, leveraging Earth Mover’s Distance (EMD) to enhance object detection in histology images. TSBP utilizes bounding boxes with high confidence to influence those with low confidence, exploiting the visual similarities between them. This propagation mechanism enables bounding boxes to be selected in a controllable, explainable, and robust manner, surpassing the effectiveness of simple thresholds and uncertainty-calibration methods. Importantly, unlike calibration methods, TSBP requires no additional labeled samples for model training or parameter estimation. We conduct experiments on gland detection and cell detection tasks in histology images. The results show that our proposed TSBP significantly improves detection outcomes when working in conjunction with state-of-the-art deep learning-based detection networks. Compared to other methods such as uncertainty calibration, TSBP yields more robust and accurate object detection predictions while using no additional labeled samples.
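A minimal illustrative sketch of the core matching idea (not the authors’ implementation; see the code repository below): computing an EMD transport plan between the features of high- and low-confidence boxes, so that each low-confidence box is influenced by the high-confidence boxes that best explain it. It assumes the third-party POT library (`pip install pot`) and stand-in random features; all names are hypothetical.

```python
# Hedged sketch, not the authors' code: EMD between two sets of box features.
# In practice the features would come from the detection network's backbone.
import numpy as np
import ot  # Python Optimal Transport (POT)

hc_feats = np.random.rand(5, 16)   # stand-in features of 5 high-confidence boxes
lc_feats = np.random.rand(8, 16)   # stand-in features of 8 low-confidence boxes

M = ot.dist(hc_feats, lc_feats)    # pairwise squared-Euclidean cost matrix
a = np.full(5, 1 / 5)              # uniform mass on the high-confidence set
b = np.full(8, 1 / 8)              # uniform mass on the low-confidence set

plan = ot.emd(a, b, M)             # (5, 8) transport plan: plan[i, j] is how
                                   # much HC box i "vouches for" LC box j
best_hc = plan.argmax(axis=0)      # most supportive HC box per LC box
```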

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1242_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1242_supp.pdf

Link to the Code Repository

https://github.com/jwhgdeu/TSBP

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yan_TSBP_MICCAI2024,
        author = { Yang, Tingting and Xiao, Liang and Zhang, Yizhe},
        title = { { TSBP: Improving Object Detection in Histology Images via Test-time Self-guided Bounding-box Propagation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this work, the authors propose a new method to improve object detection results in histology images. The aim of this work is interesting, and its relevance extends beyond clinical significance alone. However, the presentation of the manuscript needs substantial revision, particularly the description of the proposed method in Section 3, where details and the motivation for some constraints are missing. I also found the experimental results in Section 4 not convincing enough: a single performance metric was used, and no statistical significance analysis was performed. Furthermore, the proposed method was compared only with rather old methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The topic fits the conference aims and the objective is even relevant for other topics
    • The manuscript organisation is adequate and favours readability
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The proposed method could be considered novel, but because its presentation is not completely clear and the evaluation is not adequate to assess its quality, its contribution cannot be judged adequate
    • Even though the presentation is mostly adequate, some sections, particularly the most important ones, Sections 2 and 3, do not accurately describe the existing methods and the proposed one, respectively
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    0) The abstract mainly summarises the proposed method, while it should motivate the need for such a method, highlighting the shortcomings of existing methods and how the proposed one addresses them.

    1) Section 1 correctly presents the issues present in the addressed task, but many sentences should be reinforced by adequate references.

    2) Section 2 presents only a few existing methods, most of them not recent, including an arXiv preprint. arXiv citations should be limited in number and restricted to recent articles.

    3) Section 3 describes the proposed method, but even though it appears very simple, its description lacks clarity, which does not allow it to be reproduced. Moreover, the motivation behind some choices and constraints is missing. In particular:

    • the proposed method is still based on a confidence threshold to find the HC b-boxes; how is it selected?
    • why is the K value for the K-means set to 25? It is not clear what the contribution of K-means is
    • why constrain the matching of confirmed b-boxes to a single candidate b-box?
    • it is said that the distance constraints are relaxed after the first round to allow more b-boxes to be added to the confirmed sets, but it is not said how
    • the termination condition of the b-box confirmation procedure occurs when the list of candidate b-boxes is empty, and thus the list of confirmed b-boxes is full. Where does the selection of b-boxes take place? Is something missing?

    4) In Section 4, the experiments are incomplete and do not show clear evidence of an advantage of the proposed method over other methods. The reported accuracy improvements over other (very old) methods appear very small, and, in any case, no statistical significance analysis has been reported

    • comparisons are made with old approaches, and the improvements are not very evident, in particular when compared to [15], which dates from 1999
    • Fig. 3 should also present the baseline, perhaps at different confidence thresholds, so the improvements can be assessed. Indeed, there are many missing b-boxes, but it is not clear whether the baseline fails to detect them even at low confidence or whether the method cannot recover them. Furthermore, different arrows should be included to mark the FN and FP b-boxes with respect to the GT. Indeed, the proposed method produces several FNs (mostly in the right part of the bottom image) and some FPs
    • given the issues just mentioned, further evaluation metrics should be used
    • Table 2 is quite confusing, and BC is not included. In particular, the evaluation uses equal threshold values for the baseline and for s^c1, whereas you previously mentioned that the DiffusionDet model was used without applying the threshold. Moreover, s^c2 has not been mentioned or evaluated.
    • given the issue mentioned above regarding K-means, the results reported in Table 3 are not clear
    • Do the results in Table 4 indicate that the design of the proposed method is wrong? Why is the F-score not used in this case? More comments are needed

    5) Conclusions are limited and do not contain an accurate summary of the findings and the lessons learned. Even the discussion on future directions is limited, and no discussion on failure cases or limitations of the proposed method is present.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My main criticism is that even though the method appears interesting and very simple, its description lacks clarity, and the authors failed to highlight its quality and to demonstrate its performance on the addressed task.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    After reading the rebuttal, I still do not have a clearer picture of the methodology; many things that were unclear before remain unclear, and the changes the authors plan to make do not seem sufficient to improve the overall quality of the article.



Review #2

  • Please describe the contribution of the paper

    The paper introduces Test-time Self-guided Bounding-box Propagation to enhance object detection in histology images by using high-confidence detections to improve low-confidence ones, based on the Earth Mover’s Distance. The method is evaluated on the GlaS and MoNuSeg datasets and shows improved performance over traditional confidence-calibration methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. An interesting test-time optimization approach for dynamically calibrating the confidence score of each bounding-box prediction, to enhance object detection in histology images.
    2. No additional parameters.
    3. State-of-the-art performance over past confidence calibration methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method appears to be a general framework for enhancing object detection performance. What specific effect does it have on pathology images?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper will be enhanced by including a comparison with state-of-the-art test-time adaptation of object detection and a detailed explanation of the connection between the method and pathological images.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see the weaknesses.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for the rebuttal. My concerns are addressed. The authors are encouraged to include test-time adaptation methods in the experiments, and I agree with R1 that the description of the paper lacks clarity.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a Test-time Self-guided Bounding Box Propagation method, utilizing Earth Mover’s Distance to enhance object detection in histology images. Theoretical and experimental approaches were detailed, with results demonstrating the effectiveness of the proposed method compared to current state-of-the-art techniques.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is thoroughly structured and detailed. The authors tackle the challenging task of histological nuclei detection with their novel approach, TSBP. The theoretical and experimental setups are well-described, facilitating reproducibility within the community. Evaluation of the method on two widely recognized public datasets, ‘MoNuSeg’ and ‘GlaS’, enables researchers to conduct a sanity check before applying the method to their own datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The primary weakness of the paper is the absence of an explanation for the choice of the K-means model and of how it is suited to the problem. Although the results demonstrate improvement, the improvement figures are not included in the abstract, introduction, or conclusion.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper is reproducible upon release of the source code by the authors.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The paper demonstrates a clear structure, and the proposed approach is articulated effectively.
    2. My primary concern pertains to the absence of justification for employing the K-means model in their experiment and its relevance to the problem at hand.
    3. The authors should incorporate the percentage of improvement in the abstract, introduction, or conclusion.
    4. A future perspective could involve considering circles instead of boxes for precisely segmenting the nucleus.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty and clarity of the paper were the key factors that led me to accept it. The authors took into account all the key elements to make their proposal and paper easy to read and follow.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank all the reviewers for their insightful comments and suggestions. Code will be publicly released when the paper is published.

To R1: On the initial condition of TSBP: TSBP, as an iterative algorithm, requires a pre-determined initial condition, namely the initial set of high-confidence bounding boxes (b-boxes). A relatively high confidence threshold is needed to form the initial HC (high-confidence) b-box set. We use 0.7 as the default threshold and found that it works well on the GlaS and MoNuSeg datasets. We also tested the case where it is set to 0.6 (see Table 2). In both cases, TSBP yields better results than the competing methods (with the same thresholds).
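A minimal sketch of this initialization, assuming detections arrive as an (n, 4) NumPy array with per-box confidence scores; names are illustrative, not from the released code:

```python
import numpy as np

def split_by_confidence(boxes, scores, tau=0.7):
    """Split detections into the initial HC set and the candidate pool.

    boxes:  (n, 4) array of predicted bounding boxes
    scores: (n,) array of confidence scores in [0, 1]
    tau:    confidence threshold (0.7 by default, per the rebuttal)
    """
    scores = np.asarray(scores)
    hc_mask = scores >= tau
    return boxes[hc_mask], boxes[~hc_mask]   # (HC set, candidate set)
```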

For more efficient and effective matching, we use K-means on top of the initial HC b-boxes to select representative b-boxes. By default, K is set to 25, and the performance for other values of K is reported in Table 3. Future work will focus on developing a more automatic and robust procedure for forming the initial condition of TSBP (we will mention this in the Conclusions).
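A sketch of this representative-selection step under the stated default K = 25, assuming scikit-learn’s KMeans and an (n, d) array of HC box features (the feature extractor itself is not shown; names are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(hc_feats, k=25):
    """Pick up to k representative HC box features via K-means (default K=25)."""
    k = min(k, len(hc_feats))                          # guard small HC sets
    km = KMeans(n_clusters=k, n_init=10).fit(hc_feats)
    # For each cluster, keep the real feature vector nearest to the centroid,
    # so the representatives correspond to actual predicted boxes.
    d = np.linalg.norm(hc_feats[:, None] - km.cluster_centers_[None], axis=-1)
    return hc_feats[d.argmin(axis=0)]
```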

On the one-to-one matching: one-to-one correspondence is enforced so that the most reliable (most likely correct) b-box is chosen at each round of matching, ensuring the reliability of the multi-round propagation.
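A hedged sketch of the multi-round propagation as we read it from the paper and rebuttal: Hungarian assignment stands in for the one-to-one EMD matching, and a simple multiplicative factor stands in for the unspecified distance-constraint relaxation (one of the details R1 asked about). All names and the relaxation rule are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def tsbp_rounds(hc_feats, cand_feats, max_dist, relax=1.5):
    """Iteratively confirm candidates via one-to-one matching to the HC set.

    Returns indices (into the original candidate array) of confirmed boxes.
    """
    orig_idx = np.arange(len(cand_feats))
    confirmed, cap = [], max_dist
    while len(cand_feats) > 0:
        # Pairwise feature distances between confirmed and candidate boxes.
        cost = np.linalg.norm(hc_feats[:, None] - cand_feats[None], axis=-1)
        rows, cols = linear_sum_assignment(cost)    # one-to-one matching
        keep = cols[cost[rows, cols] <= cap]        # enforce the distance cap
        if keep.size == 0:
            if cap > max_dist:                      # already relaxed once: stop
                break
            cap *= relax                            # second stage: relax the cap
            continue
        confirmed.extend(orig_idx[keep].tolist())
        # Confirmed candidates join the HC set and guide later rounds.
        hc_feats = np.concatenate([hc_feats, cand_feats[keep]])
        cand_feats = np.delete(cand_feats, keep, axis=0)
        orig_idx = np.delete(orig_idx, keep)
    return confirmed
```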

On Table 4: the error rates presented in Table 4 show that the b-boxes admitted in the first stage (earlier rounds of matching) exhibit a lower error rate than those admitted in the second stage (later rounds of matching). The table illustrates the dynamics of the propagation, giving insight into how TSBP works. Note that the candidate b-boxes are uncertain predictions by the detection model and are mostly hard cases. TSBP cannot resolve all the hard cases; only some of them (which would be misclassified by the baseline) are rectified by TSBP. We will add F-scores to Table 4.

On [15] being too old: we actually use an improved version of the method proposed in [15], detailed in a 2020 paper, with the code hosted on GitHub as listed at the bottom of page 6. We will include a reference to this 2020 paper in our revision.

We’ll revise Fig. 3 by including the baseline results at different threshold levels. In Table 2, we’ll include the results of BC.

Following your comments, we’ll further improve the writing and presentation in the revision by:

  1. Including a short description of the motivation in the Abstract.
  2. Including additional references in Sections 1 and 2.
  3. Providing more details in Section 3, for example, on how to relax the distance constraint during the second stage.
  4. Making the tables easier to read.

To R3: Specific effect on pathology images: We appreciate R3 for asking this question! Pathology images often contain objects (possibly of the same type) that occur repeatedly (e.g., cell nuclei), and our test-time bounding-box propagation suits this property well, as TSBP encourages objects to influence one another in producing the detection results. TSBP is more effective when the images are large (e.g., whole slide) and contain multiple object instances for each class type.

Comparing to test-time adaptation methods: Methodology-wise, compared to test-time adaptation methods (e.g., Tent, TestFit), our method requires no additional model training and runs very efficiently. An experimental comparison of detection performance will be carried out in future work. Combining TSBP and TTA methods could also be a promising direction.

To R4: K-means is used to select the representative high-confidence bounding boxes that initiate the EMD matching. It improves both the efficiency and the accuracy of the matching procedure, and it is very fast to compute and easy to use in practice. We will include more description of the K-means step in the revision. We will mention percentages of improvement in the Abstract. We agree that utilizing circles and features in polar space would be a very promising direction.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The decision was split: Rejection (R), Weak Accept (WA), and Accept (A). All reviewers agreed on the novelty and interesting method; specifically, the idea of leveraging high-confidence bounding boxes to refine low-confidence ones by exploiting their visual similarities during test time for object detection in pathology. The main concern is the clarity of the paper. The rebuttal addressed some of the issues, but Reviewer 1 considers it insufficient. The meta-reviewer recommends accepting this paper. In the final version, the authors should clarify the points requested by Reviewer 1.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The final version should include all the technical clarifications requested by R1.



