Abstract

Anterior Segment Optical Coherence Tomography (AS-OCT) is an emerging imaging technique with great potential for diagnosing anterior uveitis, a vision-threatening condition characterized by the presence of inflammatory cells in the eye’s anterior chamber (AC). Automatic detection of these cells in AS-OCT images has attracted great attention. However, the task is challenging: each cell is minuscule, occupying less than 0.005% of the high-resolution image, and pixel-level noise introduced by OCT can be misclassified as cells, leading to false positive detections. These challenges make both traditional image processing algorithms and state-of-the-art (SOTA) deep learning object detection methods ineffective for this task. To address them, we propose a minuscule cell detection framework that progressively refines the field-of-view from the whole image to the AC region, and further to minuscule regions potentially containing individual cells. Our framework consists of: (1) a Field-of-Focus module that uses a vision foundation model to zero-shot segment the AC region, and (2) a Fine-grained Object Detection module that introduces Minuscule Region Proposal followed by our Cell Mamba to distinguish individual cells from noise. Experimental results demonstrate that our framework outperforms SOTA methods, improving F1 by around 7% over the best baseline and offering a more reliable alternative for cell detection. Our code is available at: https://github.com/joeybyc/MCD.
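
For orientation, here is a minimal, hypothetical Python sketch of the progressive focusing pipeline the abstract describes. All names (`detect_cells`, `sam_segmenter`, `region_proposer`, `cell_classifier`) are placeholders for exposition, not the API of the released repository.

```python
# Minimal, illustrative sketch of the progressive field-of-view focusing
# pipeline summarized above. All names are hypothetical placeholders;
# see https://github.com/joeybyc/MCD for the actual implementation.
import numpy as np

def detect_cells(image: np.ndarray, sam_segmenter, region_proposer,
                 cell_classifier):
    """Whole image -> AC region -> minuscule candidate regions -> cells."""
    # Stage 1 (Field-of-Focus): zero-shot segmentation of the anterior
    # chamber with a vision foundation model (SAM), no training required.
    ac_mask = sam_segmenter(image)
    # Stage 2a (Minuscule Region Proposal): tiny candidate boxes inside
    # the AC that may each contain one cell.
    candidates = region_proposer(image, ac_mask)
    # Stage 2b (Cell Mamba): keep candidates classified as cells, not noise.
    return [box for box in candidates if cell_classifier(image, box)]
```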

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2758_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/joeybyc/MCD

Link to the Dataset(s)

N/A

BibTex

@InProceedings{CheBoy_Minuscule_MICCAI2025,
        author = {Chen, Boyu and Solebo, Ameenat and Shi, Daqian and Wu, Jinge and Taylor, Paul},
        title = {{Minuscule Cell Detection in AS-OCT Images with Progressive Field-of-View Focusing}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {369--379}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces the Minuscule Cell Detection (MCD) framework, which employs a progressive field-of-view focusing strategy for accurately detecting minuscule cells in AS-OCT images. The framework consists of a Field-of-Focus module for zero-shot segmentation and a Fine-grained Object Detection module to differentiate cells from noise.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method proposed in the article achieves good performance on both cell counting and cell detection.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Choice of Hyperparameters: The method described in the paper involves numerous hyperparameters at various stages, such as the threshold for area ratio, adjustment factors, size constraints, and imbalanced sampling ratios. While I believe these parameters were chosen based on thorough experimentation, the paper does not discuss the impact of varying these hyperparameters. This raises concerns about the robustness of the method when applied to other datasets.
    2. The MCD framework sets S_min=1 to include all potential cell candidates, whereas the threshold-based methods discard candidates of size 1. I find this setting puzzling and would appreciate clarification on the rationale behind this choice.
    3. The paper does not include comparisons with more recent small object detection methods, such as YOLOv8. All the comparison methods mentioned in the paper were proposed before 2021.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Spelling Error: ‘RetinaNet’ rather than ‘RetianNet’.
    2. In Table 2, it is unclear why the MAE^c of the other methods decreases significantly compared to MAE^all, while the values for Otsu and Isodata remain relatively unchanged. Please provide an explanation for this observation.
    3. How would lowering the confidence threshold for the DL-based detection affect the experimental results?
    4. Is it feasible to convert the bounding boxes of cells into masks and then directly apply DL-based segmentation methods?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My main concern with the method presented in this paper is the excessive reliance on artificial hyperparameters. While conducting thorough ablation studies within limited space may not be feasible, it is essential to demonstrate the impact of varying some hyperparameters.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Given the excellent performance, I think the paper is worth publishing to advance the field. Nonetheless, I would like the authors to include comparisons with recent work in the final version.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a minuscule cell detection framework that progressively refines the field-of-view from the whole image to the target region, and further to regions potentially containing individual cells.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel problem formulation that directly addresses a significant clinical challenge.
    2. The paper is well-organized.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The baseline methods are dated; the paper lacks comparison with the latest detection approaches such as DETR and Deformable-DETR.
    2. Which visual foundation model is used in this work? What does the symbol ‘†’ mean in Tab.2 and Tab.3?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend a weak accept for this paper. The work presents a novel approach with potential value to the field, and the methodology appears technically sound. The experimental results demonstrate moderate improvements over existing methods, though not groundbreaking advances.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a novel methodological framework, Minuscule Cell Detection (MCD), for automatically detecting extremely small inflammatory cells (<0.005% of image area) in Anterior Segment Optical Coherence Tomography (AS-OCT) images for anterior uveitis assessment. The core contribution is a progressive field-of-view focusing strategy implemented via a two-module pipeline: (1) A Field-of-Focus (FoF) module that uses a vision foundation model (SAM) with automatically generated prompts (via I2ACP algorithm) for zero-shot segmentation of the anterior chamber (AC), eliminating the need for manual delineation or segmentation training data. (2) A Fine-grained Object Detection (FOD) module comprising a Minuscule Region Proposal (MiRP) component (using adjusted Otsu thresholding) to identify potential cell locations, followed by a Cell Mamba (using VSS blocks) classifier specifically designed to learn fine-grained features and distinguish minuscule cells from noise within the proposed regions. The framework is shown to outperform traditional thresholding and standard deep learning object detection baselines on this challenging task.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. Novelty of Approach: The core idea of a progressive field-of-view focusing strategy, moving from whole image to AC region to minuscule candidate boxes, is a novel and well-suited approach for detecting extremely small objects like inflammatory cells in high-resolution medical images.
    2. Effective AC Segmentation: The FoF module leverages a vision foundation model (SAM) for zero-shot segmentation of the AC region, guided by automatically generated anatomical prompts (I2ACP). This removes the need for laborious manual annotation or dedicated segmentation model training, representing an efficient and novel use of foundation models in this specific clinical context.
    3. Tailored Fine-grained Detection: The FOD module is specifically designed for minuscule objects. MiRP aims to capture low-intensity cells potentially missed by standard thresholding, while Cell Mamba focuses on learning fine-grained discriminative features within tiny proposed regions to effectively separate cells from noise, addressing limitations of standard detectors.
    4. Clinical Relevance & Critique: The work addresses a clinically relevant problem (quantifying anterior uveitis) and importantly critiques the potential inaccuracies of previous studies relying solely on unverified thresholding methods, highlighting the need for more accurate detection in future clinical research.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. Lack of Ablation/Justification for Key Parameters: Several crucial parameters are set “empirically” without sufficient justification or ablation studies presented in the text. This includes: (a) the specific offsets (e.g., (0, ±0.1×W)) used in the I2ACP algorithm for prompt generation; it is unclear why these specific values were chosen or how sensitive the FoF module is to these offsets. (b) The threshold adjustment factor λ=0.83 in the MiRP module; no analysis is provided on how this value was determined or how varying it impacts the trade-off between capturing more cells and introducing more noise for the subsequent classifier.
    2. Lack of Comparative Analysis for Architecture Choice: The Cell Mamba module uses VSS blocks. While justified as capturing fine-grained details, the paper does not compare this architectural choice against other potentially suitable architectures (e.g., specialized CNNs, Transformers) for the specific task of classifying minuscule image patches. It is unclear if VSS/Mamba offers a significant advantage over alternatives in this context.
    3. Potential Sensitivity: While addressing noise is a goal, the method’s sensitivity to varying image quality and artifacts could be explored further. Additionally, the performance reliance on the empirically set parameters (offsets, lambda) warrants investigation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The authors are encouraged to provide more justification and/or ablation studies for the empirically set parameters (I2ACP offsets, MiRP lambda factor) to demonstrate robustness and optimality of these choices. Furthermore, a brief discussion or comparison justifying the choice of VSS/Mamba architecture for Cell Mamba over potential alternatives would strengthen the methodological contribution. Addressing these points would significantly improve the paper’s rigor.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel and well-motivated progressive focusing framework (MCD) for the challenging and clinically relevant task of detecting minuscule cells in AS-OCT images. The zero-shot AC segmentation and the dedicated fine-grained detection module demonstrate strong empirical results, outperforming existing methods. However, the justification for key design choices, particularly the empirically set parameters in I2ACP and MiRP, and the selection of the VSS/Mamba architecture in Cell Mamba, lacks rigorous validation through ablation studies or comparative analysis within the paper. While the core idea is strong and the results promising, these methodological gaps need clarification.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely thank all reviewers (Ri) for their valuable feedback. We thank R2&4 for recognizing the novelty of our method MCD, R2 for highlighting the significance of the clinical challenge, and R3 for acknowledging our MCD’s performance. We address their concerns below.

R2&3: Lack of comparison with recent methods We tested DETR, Deformable-DETR, DINO, and YOLOv11 but excluded them because: 1) Our goal is to directly detect cells occupying <0.005% of the area in full-resolution (∼1500×1500) AS-OCT images. Unlike thresholding and our MCD, DL baselines (including recent ones) are unsuitable for this task: they fail under this setting and only work after cropping into 300×300 patches (Sec 3.2). 2) On cropped patches, the recent methods underperform the older baselines, with F1_{point} of 53.1%, 50.0%, 49.2%, and 54.1%, possibly because CNNs outperform Transformers when training data are limited (see “How do vision transformers work?”, N. Park, 2022). As anterior uveitis studies mainly use thresholding (Sec 1), we focus on challenging them. For conciseness, we report only DL baselines with F1 > 55%, as we believe adding more of them would not provide additional insight. However, if needed, we can add all results to Tables 2&3.

R3&4: Key parameter choices We omitted the details of hyperparameter tuning, which led reviewers to assume the selection was arbitrary. In fact, the key parameters are tuned on the validation set and can be re-tuned when applying the method to other datasets. I2ACP uses a grid search over offsets [0–0.5] and area ratio R [0.4–0.95] in 0.05 steps; offsets in [0.05–0.15] and R in [0.6–0.7] yield stable results and the highest IoU, and points from these ranges are used at test time, enabling SAM to segment the AC without training on new data. For MiRP, λ is searched from 0.70 to 1.00 in 0.01 steps. As λ grows, MCD’s precision rises and recall declines; interestingly, F1 first rises, reaches a peak, and then declines, and λ=0.83 corresponds to this F1 peak on the validation set. Other DL training settings follow widely used community practices. We can concisely add these details to Sec 3.2 to enhance clarity.
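
As a concrete illustration of the tuning procedure described above, the following sketch selects λ by grid search on a validation split, assuming MiRP scales the Otsu threshold by a factor λ < 1 to keep low-intensity cells. `propose_and_classify` and `f1_score_points` are hypothetical stand-ins for the framework’s detector and its point-level F1 metric.

```python
# Hedged sketch of the validation-set grid search for the MiRP threshold
# adjustment factor lambda, under the assumption (not confirmed by the
# rebuttal text alone) that MiRP multiplies the Otsu threshold by lambda.
import numpy as np
from skimage.filters import threshold_otsu

def mirp_threshold(image: np.ndarray, lam: float) -> float:
    # Lowering the Otsu threshold lets dim candidate cells survive
    # binarization; the downstream classifier then filters out noise.
    return lam * threshold_otsu(image)

def select_lambda(val_set, propose_and_classify, f1_score_points):
    best_lam, best_f1 = 0.70, -1.0
    for lam in np.arange(0.70, 1.001, 0.01):  # 0.70 to 1.00 in 0.01 steps
        preds = [propose_and_classify(img, mirp_threshold(img, lam))
                 for img, _ in val_set]
        f1 = f1_score_points(preds, [gt for _, gt in val_set])
        if f1 > best_f1:  # F1 rises to a peak, then declines; keep the peak
            best_lam, best_f1 = lam, f1
    return best_lam  # the rebuttal reports 0.83 on their validation split
```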

R2:

  1. Vision foundation model We use SAM (Sec 2).

  2. Symbol † As noted in Tables 2&3, † indicates DL methods only detect cells on 300×300 patches, not full images.

R3:

  1. No S_min=1 in thresholding We follow prior clinical studies in setting up the thresholding baselines, which remove candidates of size 1 [7, 15], just as we take S_max from those works. We also tested S_min = 1 for thresholding and observed that both precision and F1 drop. (A minimal sketch of this thresholding-with-size-constraints baseline appears after this list.)

  2. MAE^all vs MAE^c DL detectors tend to make more predictions (Fig. 1b), finding more cells and thus decreasing MAE^c. However, many of these predictions are incorrect, inflating MAE^all. Thresholding gives fewer but more accurate predictions, keeping its MAE^all and MAE^c close.

  3. Lower DL detectors’ confidence Lowering the confidence threshold increases recall but reduces both precision and F1. Thresholding methods do not produce confidence scores, so for fairness we compare DL detectors at their best F1.

  4. Convert to mask Some cells are only 2–4 pixels, making mask labeling impractical.
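
A minimal sketch of the threshold-with-size-constraints baseline referred to in point 1, assuming global Otsu thresholding followed by connected-component filtering on area. The numeric values shown are placeholders: prior clinical studies [7, 15] fix S_max and remove size-1 candidates (i.e., S_min > 1), whereas MCD keeps S_min = 1.

```python
# Hypothetical sketch of a threshold-based cell detector with size
# constraints; parameter values are illustrative, not from the paper.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def threshold_cell_candidates(ac_image: np.ndarray,
                              s_min: int = 2, s_max: int = 25):
    binary = ac_image > threshold_otsu(ac_image)
    components = regionprops(label(binary))
    # Each surviving component centroid is treated as one detected cell.
    return [c.centroid for c in components if s_min <= c.area <= s_max]
```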

R4:

  1. VSS choice We believe MCD’s performance mainly comes from our design: using MiRP to find candidate cells and a classifier to filter them. The classifier is flexible: besides our Cell Mamba, we also tested an ANN, a CNN, and a ViT (see the sketch after this list for how a patch classifier slots in). As ours outperformed the others in F1 by 1–2% and VSS is a recent, strong computer vision architecture, we report only our final model for brevity. If needed, we can add all results to Tables 2&3.

  2. Potential sensitivity Low-quality images are often clinically unusable [18] and are therefore excluded both from clinical studies and from our research.
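
To illustrate the plug-and-play classifier design mentioned in point 1, here is a hypothetical PyTorch patch classifier. The actual model (Cell Mamba) is built from VSS blocks; this small CNN stand-in only shows how any binary patch classifier plugs into the MiRP-then-classify design, and all layer sizes are assumptions.

```python
# Illustrative stand-in for the candidate-patch classifier; not the
# authors' Cell Mamba architecture.
import torch
import torch.nn as nn

class PatchCellClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # logits for {noise, cell}

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (N, 1, H, W) crops centered on MiRP candidate regions
        return self.head(self.features(patches).flatten(1))
```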

Thank you again for your time. Thresholding is widely used for cell detection in anterior uveitis studies, but to our knowledge, we are the first to reveal its limitations. Our MCD (open-sourced on GitHub) improves F1 by ~7% over the best baseline, offering a more reliable alternative. We hope the community benefits from this advance. Thank you for your consideration.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper introduces MCD, a two-stage framework for detecting tiny inflammatory cells (<0.005 % of the image area) in AS-OCT. A zero-shot Field-of-Focus module first segments the anterior chamber by combining SAM with automatically generated anatomical prompts; a Fine-grained Object Detection module then proposes low-intensity candidates with an adjusted-threshold MiRP and filters them with the dedicated Cell Mamba classifier. On a clinical AS-OCT cohort the approach raises F1 by roughly seven points over the best baseline, revealing the limitations of threshold-only practice in current uveitis studies.

    Initial reviews were 4 / 4 / 2 (two Weak Accept, one Reject). After the authors clarified hyper-parameter tuning (grid-search on a validation split), added results for recent detectors (DETR, Deformable-DETR, DINO, YOLOv8/11), explained the choice of the VSS/Mamba backbone, and promised public release of code and data, all three reviewers switched to Accept.

    Because the rebuttal resolves the outstanding concerns—parameter robustness, baseline coverage, figure quality—there are no substantive objections left and the AC recommends Accept. For the camera-ready version the AC asks the authors to incorporate the promised clarifications concisely in Section 3.2, correct the few typographical slips (for instance the “RetinaNet” spelling and undefined symbols in figures), and replace Figures 2 and 4 with higher-resolution renditions. Per the rebuttal policy no new experiments can be introduced at this stage; only clarifications and polishing are expected. With these small edits the manuscript will make a solid contribution to the MICCAI community.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have addressed the concerns raised by all three reviewers, each of whom now leans toward acceptance. It is recommended that the authors further refine the current version in accordance with the reviewers’ suggestions.


