Abstract

The detection of tumor budding on histopathological images provides vital information for treatment planning and prognosis prediction. As manual identification of tumor budding is labor-intensive, automated tumor budding detection is desired. However, tumor budding comprises small clusters of tumor cells that resemble other cell clusters, which makes it difficult for existing detection methods to distinguish tumor budding from other cells. Additionally, the lack of public datasets for tumor budding detection hinders the development of accurate detection methods. To address these challenges, we introduce, to the best of our knowledge, the first publicly available benchmark dataset for tumor budding detection. The dataset consists of 410 H&E-stained images with expert-annotated bounding boxes for 3,968 cases of tumor budding. Moreover, based on this dataset, we propose the Tumor Budding Detection Network (TBDNet), a dedicated approach for tumor budding detection with improved performance. On top of standard object detection backbones, we develop two major components in TBDNet: Iteratively Distilled Annotation Relocation (IDAR) and Rotational Feature Decoupling And Recoupling (RFDAR). First, we introduce the IDAR module to standardize annotations, addressing the inconsistency caused by varying expert standards during model training. IDAR relocates the annotations via iterative model distillation so that the relocated annotations are consistent for training the detection model. Second, to reduce the interference of cells with similar features (negative samples) with tumor budding (positive samples), we develop the RFDAR module. RFDAR enhances feature extraction via positive-negative feature coupling regularized by prior feature distributions, so that the model is better able to distinguish tumor budding. The results on the benchmark show that our approach outperforms state-of-the-art detection methods by a noticeable margin.
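
To make the positive-negative feature coupling idea concrete, the following is a minimal, generic sketch of an InfoNCE-style contrastive objective computed over features of tumor budding instances (positives) and visually similar non-budding instances (negatives). The function name, the temperature value, and the InfoNCE formulation are illustrative assumptions rather than the paper's exact RFDAR loss.

```python
import torch
import torch.nn.functional as F


def budding_contrastive_loss(pos_feats, neg_feats, temperature=0.1):
    """InfoNCE-style loss (illustrative, not the paper's RFDAR formulation):
    pulls tumor-budding embeddings together and pushes them away from
    embeddings of similar-looking non-budding cells.

    pos_feats: (P, D) features of tumor budding instances, P >= 2.
    neg_feats: (N, D) features of non-budding instances.
    """
    pos = F.normalize(pos_feats, dim=1)
    neg = F.normalize(neg_feats, dim=1)

    pos_sim = pos @ pos.T / temperature   # (P, P) positive-positive similarities
    neg_sim = pos @ neg.T / temperature   # (P, N) positive-negative similarities

    # Exclude each anchor's similarity with itself.
    self_mask = torch.eye(pos.size(0), dtype=torch.bool, device=pos.device)
    pos_sim = pos_sim.masked_fill(self_mask, float("-inf"))

    logits = torch.cat([pos_sim, neg_sim], dim=1)   # (P, P + N)
    log_prob = F.log_softmax(logits, dim=1)

    # For each anchor, average the log-probability assigned to the other positives.
    pos_log_prob = log_prob[:, : pos.size(0)].masked_fill(self_mask, 0.0)
    return -(pos_log_prob.sum(dim=1) / (pos.size(0) - 1)).mean()
```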

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2609_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/J-F-AN/TumorBuddingDetection

Link to the Dataset(s)

N/A

BibTex

@InProceedings{SunRui_Towards_MICCAI2025,
        author = { Sun, Rui-Qing and Fan, Zeng and Dai, Boyang and Su, Yiyan and Hao, Qun and Ye, Chuyang and Zhang, Shaohui},
        title = { { Towards Accurate Tumor Budding Detection: A Benchmark Dataset and A Detection Approach Based on Implicit Annotation Standardization and Positive-Negative Feature Coupling } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        page = {671 -- 681}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    1. Released the first publicly available dataset for tumor budding detection
    2. Designed the IDAR and RFDAR modules to better detect small tumor budding
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Curated and planned to release a histopathological image dataset for tumor budding detection
    2. Proposed a new method and a benchmark, with a fair comparison to existing methods
    3. Conducted ablation studies to demonstrate the effectiveness of each module
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The method details are a little confusing and not clearly presented. It seems that there are multiple stages of training: 1) IDAR to train the detection backbone, 2) then RFDAR for feature decoupling and recoupling (frozen later), and 3) afterwards the detection head and contrastive head.
    2. The overall architecture seems very sophisticated, which would result in high computational cost. It would be better to compare the parameters and FLOPs with those of existing methods.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Clarify the method details as mentioned in the weaknesses part.
    2. Please clarify how the teacher and student models are used in the framework. In Table 2, the results from 2 iterations outperform those from 1 iteration; does this mean that the student model is better than the teacher model as the backbone feature extractor in Fig. 1?
    3. Another interesting result is that the REC loss and the KL loss each decrease the model performance when used alone; it would be better to discuss the possible reasons behind this.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This manuscript proposes a new method for tumor budding detection and would release a new dataset as a public benchmark. The overall paper is well written, and the evaluation is fair and relatively comprehensive.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper makes two key contributions to tumor budding detection in histopathological images. First, it introduces TBDD, the first publicly available benchmark dataset containing 410 H&E-stained images with 3,968 expert-annotated tumor budding cases. Second, it proposes TBDNet, a novel detection framework that incorporates two components: Iteratively Distilled Annotation Relocation (IDAR) to standardize inconsistent annotations through iterative model distillation, and Rotational Feature Decoupling and Recoupling (RFDAR) to enhance feature discrimination between tumor budding and visually similar non-budding cells via positive-negative feature coupling. These innovations significantly improve detection performance over existing methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper introduces the first publicly available dataset for tumor budding detection, consisting of 410 H&E-stained images with expert-annotated bounding boxes, which fills an important gap in computational pathology.
    2. It proposes a novel annotation refinement method, IDAR, which uses iterative teacher-student distillation to standardize inconsistent expert annotations, improving training consistency.
    3. The RFDAR module further enhances detection by decoupling and recoupling features of tumor budding and similar-looking cells, improving discriminative learning through contrastive loss and prior-based regularization.
    4. The results seem promising.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. How many pathologists were involved in the annotation process? Did they cross-review annotations for consistency? How was annotation accuracy and reliability ensured?
    2. The dataset consists of 410 image patches instead of whole-slide images (WSIs). Do the patches represent one per patient, or a subset of larger WSI data? Were additional images available?
    3. Why was YOLOv5 chosen over newer YOLO variants like YOLOv11 for both the baseline and proposed method?
    4. The IDAR module uses two iterations. Was this choice based on observed performance, and was a sensitivity analysis performed?
    5. In Table 2, using only L_rec or L_KL leads to lower performance. Why do these components degrade performance when isolated?
    6. The teacher and student models differ in the confidence threshold for generating pseudo-labels. How was this threshold chosen, and was a sensitivity analysis performed?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    no

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The contribution of the dataset and the experimental results seem promising.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces the first Tumor Budding Detection Dataset (TBDD) designed for tumor budding detection. The authors propose the Tumor Budding Detection Network (TBD-Net) to enhance detection performance. TBD-Net comprises two key components: (1) Iteratively Distilled Annotation Relocation (IDAR), which iteratively refines bounding box annotations to address inconsistencies in budding boundaries among different experts, and (2) Rotational Feature Decoupling and Recoupling (RFDAR), which decomposes and reassembles the distinctive and shared features of positive and negative samples to improve performance.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors release a dataset to the community to support related research.
    2. The proposed TBD-Net outperforms existing detection methods, achieving more true positive detections while reducing false positives in tumor budding detection.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. In Section 2.2, it is stated that the student model serves as the new teacher model. However, it is unclear how the roles of teacher and student are exchanged. Specifically, when the student model becomes the new teacher, what is the new student model in the next iteration?
    2. On page 5, the author does not specify the criteria for selecting negative non-tumor-budding cells. Are all normal cells included, potentially causing data imbalance, or is a subset chosen? Providing clarification on this selection process would be helpful.
    3. In Figure 2, a red rectangular box is not detected by any of the methods. Can the author analyze the potential causes of this issue?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (6) Strong Accept — must be accepted due to excellence

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors influencing the overall score are the release of a new dataset and the improved performance of the proposed network.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Reviewer 1

  1. Clarify whether multi-stage training is performed. We do employ a multi-stage training approach, and different background colors are used in Figure 1 to highlight distinct modules. We will better clarify this.

  2. The overall architecture seems sophisticated, which would result in high computational cost. We would like to clarify that the additional modules we introduce are used solely during training. Therefore, during inference, the speed and parameters remain the same as YOLOv5, and there is no increase in computational cost. This will be clarified.

Reviewer 2

  1. How many pathologists were involved in annotation? Did they cross-review annotations? Three experienced pathologists were involved. They collaboratively annotated the tumor budding regions based on international standards defined by the International Tumor Budding Consensus Conference. To ensure consistency and reliability, the annotations were cross-reviewed. This will be clarified.

  2. Does each patch represent one patient? Were additional images available? Each WSI is associated with a single patient. From each WSI, an experienced pathologist selected one representative region and extracted a patch for annotation and analysis. We plan to also make all original WSIs publicly available.

  3. Why was YOLOv5 chosen over newer variants like YOLOv11? We have observed that YOLOv5 is no worse than more recent variants. For example, see the comparison between YOLOv5 and YOLOv11 in Table 2.

  4. IDAR uses two iterations. Was this choice based on observed performance, and was a sensitivity analysis performed? The decision was based on a sensitivity analysis on the validation set. Further increasing the number of iterations beyond two did not improve performance; for example, the F1-score decreased from 60.5 to 59.5 after the third iteration.

  5. In Table 2, using only L_rec or L_KL leads to lower performance. Why do these components degrade performance when isolated? L_rec emphasizes the model’s ability to identify positive cases (higher recall), while L_KL improves prediction precision (higher precision). Using either loss alone is insufficient to balance sensitivity and specificity.

  6. The teacher and student models differ in the confidence threshold for generating pseudo-labels. How was this threshold chosen, and was a sensitivity analysis performed? The teacher model provides soft labels for the detected buds, aiming to achieve a recall rate close to 1. Hence, we have chosen a low threshold of 0.05. We will include more sensitivity analysis in the journal version.

Reviewer 3

  1. When the student model becomes the new teacher, what is the new student model in the next iteration? As described in the second paragraph of Section 2.2, we first train a teacher model with the manually annotated data. The bounding boxes generated by the teacher model are then used as soft labels to train the student model. In the next iteration, the student model serves as the new teacher model to train a new student model (a code sketch of this loop is given after this list). We will better clarify this.

  2. Specify the criteria for selecting negative non-tumor-budding cells. Are all normal cells included, potentially causing data imbalance, or is a subset chosen? Instances detected by YOLOv5 that do not belong to budding are negative samples, which can include both normal cells and other non-budding abnormal cells. Regarding data imbalance, YOLOv5 does not detect all normal cells but only a small subset of false positives above the confidence threshold. Thus, data imbalance is not pronounced here: 5,616 positive and 7,628 negative samples are used.

  3. In Figure 2, why is a red rectangular box not detected by any of the methods? This tumor budding is a challenging case. It appears relatively large, shows limited contrast with surrounding tissue cells, and lacks the typical isolated or small cluster arrangement seen in other tumor budding regions.
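
For concreteness, the iterative relocation loop described in the response to Reviewer 3 (point 1), together with the low pseudo-label threshold mentioned in the response to Reviewer 2 (point 6), can be sketched as follows. Here, train_detector and predict_boxes are hypothetical stand-ins for the YOLOv5 training and inference routines; this is an illustrative sketch, not the authors' exact implementation.

```python
def iteratively_relocate_annotations(images, manual_boxes, num_iterations=2,
                                     conf_threshold=0.05):
    """Illustrative sketch of the IDAR teacher-student iteration.

    images: training images; manual_boxes: the original expert annotations.
    num_iterations=2 and conf_threshold=0.05 follow the author feedback.
    """
    # Iteration 0: the teacher is trained on the manual expert annotations.
    teacher = train_detector(images, manual_boxes)        # hypothetical helper
    relocated_boxes = manual_boxes

    for _ in range(num_iterations):
        # The teacher's detections, kept at a low confidence threshold so that
        # recall stays close to 1, serve as soft labels for the student.
        relocated_boxes = [predict_boxes(teacher, img, conf_threshold)
                           for img in images]             # hypothetical helper
        student = train_detector(images, relocated_boxes)
        # The student then becomes the teacher for the next iteration.
        teacher = student

    return teacher, relocated_boxes
```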




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    All reviewers recommend acceptance. I hope the authors will incorporate the reviewers’ suggestions in their camera-ready version. Congratulations!


