Abstract

Deep learning holds significant promise for enhancing real-time ultrasound-based prostate biopsy guidance through precise and effective tissue characterization. Despite recent advancements, prostate cancer (PCa) detection using ultrasound imaging still faces two critical challenges: (i) limited sensitivity to subtle tissue variations essential for detecting clinically significant disease, and (ii) weak and noisy labeling resulting from reliance on coarse annotations in histopathological reports. To address these issues, we introduce ProTeUS, an innovative spatio-temporal framework that integrates clinical metadata with comprehensive spatial and temporal ultrasound features extracted by a foundation model. Our method includes a novel hybrid, cancer involvement-aware loss function designed to enhance resilience against label noise and effectively learn distinct PCa signatures. Furthermore, we employ a progressive training strategy that initially prioritizes high-involvement cases and gradually incorporates lower-involvement samples. These advancements significantly improve the model’s robustness to noise and mitigate the limitations posed by weak labels, achieving state-of-the-art PCa detection performance with an AUROC of 86.9%. Our code is publicly accessible at github.com/DeepRCL/ProTeUS.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4677_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/DeepRCL/ProTeUS

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ElgTar_ProTeUS_MICCAI2025,
        author = { Elghareb, Tarek and Harmanani, Mohamed and To, Minh Nguyen Nhat and Wilson, Paul and Jamzad, Amoon and Fooladgar, Fahimeh and Abdelsamad, Baraa and Dzikunu, Obed and Sojoudi, Samira and Reznik, Gabrielle and Leveridge, Michael and Siemens, Robert and Chang, Silvia and Black, Peter and Mousavi, Parvin and Abolmaesumi, Purang},
        title = {{ProTeUS: A Spatio-Temporal Enhanced Ultrasound-Based Framework for Prostate Cancer Detection}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
}


Reviews

Review #1

  • Please describe the contribution of the paper

    Summary: The paper presents ProTeUS, a novel deep learning framework to improve csPCa detection from TRUS. Temporal RF signals and clinical metadata were integrated with image features to address tissue heterogeneity issues, and an involvement-aware loss and progressive training strategy was introduced to address the noisy label issue. The approach outperforms state-of-the-art methods on a private dataset.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The framework to integrate time-series RF signals and patient metadata as prompts for the image decoder is novel and well-motivated.
    2. Instead of relying on core-level labeling from histopathological reports, the authors introduced a new approach to train with the cancer involvement score to avoid potential errors in labeling. The idea is inspiring as a workaround for noisy labels.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Some details about the approach are unclear; more details are needed for clarity, see the Details section.
    2. I suppose AUROC was computed from predicted involvement and true involvement, but the authors have not mentioned the criteria for true positive detections to compute sensitivity and specificity.
    3. Progressive Training Strategy: 1) The “0.95 error quantile criterion” means that during training, the cases with the largest 5% discrepancy between actual and predicted cancer involvement are flagged as ambiguous. Ambiguous cases with large true cancer involvement that the model failed to learn from are therefore more likely to be discarded (also evident from the blue dots in Fig. 2, which are all on the far right). I don’t quite see intuitively how this “prioritizes high-involvement cases before gradually introducing ambiguous samples”. 2) Can the authors plot the results on the test set in the same fashion as in Fig. 2? Predicted cancer involvement should be expected to approach true cancer involvement over time, but from Fig. 2 it is unclear whether the dots are approaching the diagonal line.

    Details: 1) The bottom left of the figure refers to extracting the time-series signal; connecting it to the time-series signal above with an arrow would be clearer. 2) Were the biopsy samples collected during a systematic or targeted biopsy? What is the reason for obtaining only 883 samples from 131 patients, i.e., on average ~7 cores per patient? In the case of multiple cores in the same targeted region, are all the samples used in training? 3) What does other-core information refer to? Does it refer to cancer involvement, or to location? More details on this would be helpful. 4) The authors have not mentioned the size of the test set.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper lacks some important details to make the approach and results more convincing.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The study focuses on prostate cancer detection in temporally enhanced ultrasound, using B-mode images, RF data, and clinical information. The model was trained on 131 patients with 883 biopsies, with weak labels at the biopsy level.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The study presents a solid approach, incorporating all available information. It proposes a progressive learning approach that first trains on cases where the weak labels are stronger, and progressively adds the rest of the samples.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The study has a few weaknesses:

    • It is not clear how the model will be used in clinical care, e.g., whether cancer information from other cores is part of the input at inference: that information is not available at the time of the biopsy.
    • Using RF data in the temporal sequence is great, but that information is not usually available; for instance, your approach will not work on the public ultrasound cohort from The Cancer Imaging Archive.
    • Color scheme (Fig. 4): red-green is a bad choice for color-blind readers.
    • Fig. 4 needs a color bar, both to convey what the colors represent and, more importantly, to indicate which part of the image the urologist should biopsy; there are many red spots (presumably all of which should be biopsied).
    • Finally, are your results statistically significant? Is it worth applying your method compared to the others, and what clinical impact does a 1–2% increase have?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Solid study with some weaknesses, especially regarding clinical utility and benefit.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose ProTeUS, a spatio-temporal deep learning framework for localized prostate cancer segmentation from ultrasound data. The model addresses two key challenges in PCa detection: limited robustness to heterogeneous tissue and weak, coarse labels derived from histopathology. The proposed framework integrates: (i) a segmentation backbone for extracting spatial features from B-mode images, (ii) a time-series encoder that processes temporal RF signals from pixels within biopsy regions to capture fine-grained dynamics, and (iii) a text encoder for patient metadata (e.g., age, PSA) and cancer involvement from other biopsy cores, with separate projection heads. The extracted multimodal embeddings are further merged through a prompt encoder and used by an image decoder to generate pixel-wise cancer segmentation maps. To handle label noise, the model uses an involvement-aware hybrid loss, adopting cancer involvement percentages to guide supervision. Additionally, a multi-stage training strategy is used to gradually introduce ambiguous samples. The method is evaluated on a private dataset of 883 biopsy cores from 131 patients, showing state-of-the-art performance (AUROC 86.9%, sensitivity 90.1% at 60% specificity), thus enabling more accurate and reproducible TRUS-guided biopsies.
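
    For concreteness, one plausible form of the involvement-aware supervision described here can be sketched in a few lines. This is an illustration under assumed inputs (a pixel-wise probability map and a needle-region mask), not the paper's exact hybrid loss:

```python
import torch
import torch.nn.functional as F

def involvement_aware_loss(prob_map, needle_mask, involvement, alpha=0.5):
    """Illustrative involvement-aware objective (not the paper's exact loss).

    prob_map:    (B, H, W) pixel-wise cancer probabilities from the decoder
    needle_mask: (B, H, W) binary float mask of the biopsy needle region
    involvement: (B,) pathology-reported cancer involvement in [0, 1]
    """
    # Predicted involvement: mean cancer probability inside the needle region.
    pred_inv = (prob_map * needle_mask).sum(dim=(1, 2)) \
        / needle_mask.sum(dim=(1, 2)).clamp(min=1)

    # Regression term: pull predicted involvement toward the reported fraction.
    reg = F.l1_loss(pred_inv, involvement)

    # Classification term on the core-level label (cancerous iff involvement > 0).
    core_label = (involvement > 0).float()
    cls = F.binary_cross_entropy(pred_inv.clamp(1e-6, 1 - 1e-6), core_label)

    return alpha * reg + (1 - alpha) * cls
```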

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Multi-Modal Data Integration: The proposed framework integrates temporal ultrasound signals, B-mode images, and clinical metadata (e.g., PSA levels, age) using dedicated encoders and a shared prompt-based fusion mechanism. This unified approach enhances prostate cancer detection accuracy, especially under weak supervision scenarios by using complementary information.
    • Benchmarking and SOTA Results: ProTeUS is evaluated against different recent baselines (e.g., Cinepro, ProstNFound, MedSAM), showing superior performance across different metrics. It achieves an AUROC of 86.9%, sensitivity of 90.1% at 60% specificity, and balanced accuracy of 77.9% (Table 1), establishing a new state-of-the-art in prostate cancer detection.
    • Comprehensive Ablation Studies: The paper presents different ablation experiments, illustrating the incremental contribution of each component. For instance, multi-stage training alone provides a +5.9% AUROC gain over the MedSAM baseline (Fig. 3a). Furthermore, qualitative comparisons show that ProTeUS produces more spatially precise and clinically plausible cancer heatmaps compared to MedSAM (Fig. 4).
    • Clinical Feasibility: The model is validated on a real-world dataset comprising 883 biopsy cores from 131 patients, collected using raw RF ultrasound and histopathology-derived involvement scores. This dataset and clinical setup help to show the translational potential of ProTeUS for improving TRUS-guided prostate biopsies.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Clarity in Prompt Encoder Design: While the paper highlights the role of a “prompt encoder” in integrating embeddings from temporal signals, patient metadata, and other-core cancer involvement, the architecture and implementation details of this component remain unclear. This weakens reproducibility.
    • Limited External Validation or Generalization Study: The proposed model is evaluated on a dataset from a single institution, which might limit conclusions about its robustness across different clinical environments. For instance, Cinepro, a relevant baseline, was evaluated on data from two separate hospitals (KHSC and VGH, Canada), providing stronger evidence of generalizability. Including even a small holdout set from an external center would significantly strengthen the clinical relevance of the findings.
    • Comparison to Multimodal or Fusion-Based Baselines: A more comprehensive evaluation would include analysis of different fusion strategies. While the value of each modality is evaluated, the design choice of how the modalities are combined (i.e., the fusion strategy) is neither described in detail nor benchmarked against alternatives.
    • Lack of Analysis on Multi-Stage Training Overhead: The paper introduces a novel multi-stage training strategy that incrementally incorporates increasingly ambiguous cancer cores over four stages. While this idea is conceptually appealing, the authors didn’t report, e.g. (i) total training time or computational cost compared to baselines, (ii) the overhead introduced by iterative re-evaluation and outlier filtering, or (iii) any impact on convergence stability.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors present a novel model named ProTeUS for ultrasound-based prostate cancer detection that integrates spatial, temporal, and clinical modalities. The authors address two key challenges, tissue heterogeneity and weak labels, by combining: (i) a segmentation backbone, (ii) a time-series encoder, (iii) a metadata-aware text encoder, and (iv) an involvement-aware hybrid loss. A multi-stage training strategy is used to gradually filter out ambiguous cancer samples.

    Comparisons against multiple baselines and detailed ablation studies show consistent gains from each component. The method achieves state-of-the-art performance across all reported metrics, and the use of a real-world dataset comprising 883 biopsy cores from 131 patients strengthens its clinical relevance.

    Despite its contributions, some aspects limit the clarity of the work:

    • The prompt encoder, a core fusion component, lacks details, making it difficult to assess or reproduce.
    • The multi-stage training is iterative and conceptually appealing, yet the computational overhead and convergence behaviour are not analyzed.
    • While the authors report modality-level contributions, the fusion strategy itself is not benchmarked against alternative fusion mechanisms.
    • Finally, the model is evaluated on a single-institution dataset only, and an external validation would be necessary to support broader claims of generalizability.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all reviewers for their constructive feedback. We are encouraged that all recognized the novelty of ProTeUS and its potential clinical value.

  1. Inference-Time Data Availability and Clinical Applicability (R1 and R3) The “cancer information in other cores” refers to each core’s text data describing the location and involvement score of all cores from the same patient; it is used only during training to guide supervision and is not used at inference. At inference, the model uses only inputs available at the time of biopsy: B-mode images, RF signals, and patient-level metadata. While RF data are not routinely available, such data are accessible on many ultrasound systems today through special research packages from manufacturers.

  2. Fusion Strategy and Prompt Encoder Design (R4) Each modality is encoded with a modality-specific MLP into a 256-dimensional vector. These embeddings are concatenated and passed through a shared linear layer to form the prompt representation for the image decoder. We also plan to compare our fusion method with additive and attention-based strategies in ablation studies. Our full implementation code will be released to support reproducibility.
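
  A minimal PyTorch sketch of this concatenate-then-project fusion follows; the modality names and input sizes are illustrative, as only the 256-d embedding width is stated in the rebuttal:

```python
import torch
import torch.nn as nn

class PromptFusion(nn.Module):
    """Sketch of the described fusion: modality-specific MLPs project each
    input to a 256-d embedding; the embeddings are concatenated and a shared
    linear layer forms the prompt passed to the image decoder."""

    def __init__(self, in_dims, d=256):
        super().__init__()
        # One small MLP per modality (in_dims maps modality name -> input size).
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, d), nn.ReLU(), nn.Linear(d, d))
            for name, dim in in_dims.items()
        })
        # Shared projection over the concatenated modality embeddings.
        self.proj = nn.Linear(len(in_dims) * d, d)

    def forward(self, inputs):
        # inputs: dict mapping modality name -> (batch, in_dims[name]) tensor
        embs = [self.encoders[name](inputs[name]) for name in self.encoders]
        return self.proj(torch.cat(embs, dim=-1))  # (batch, 256) prompt

# Example with made-up modality dimensions:
fusion = PromptFusion({"rf": 512, "metadata": 16})
prompt = fusion({"rf": torch.randn(4, 512), "metadata": torch.randn(4, 16)})
```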

  3. Progressive Training and Computational Overhead (R3 and R4) We agree with reviewer R3’s assessment that ambiguous cases—those with >5% error—and cases with high label noise, which are typically low-involvement cores, should be clearly distinguished. Our progressive strategy prioritizes high-involvement (cleaner) cases first and gradually introduces noisier (low-involvement) samples, not ambiguous ones. We will clarify this distinction in the revision. To implement this progressive filtering strategy efficiently, we compute per-sample loss through a forward pass at the end of each stage (i.e., every 4 epochs) and apply a simple quantile-based threshold to filter out high-error outliers. This outlier-filtering step is computationally lightweight and introduces only minimal overhead relative to the total training time.
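
  The described filtering step amounts to a quantile threshold on per-sample errors; a minimal numpy sketch, with array names and shapes assumed:

```python
import numpy as np

def select_next_stage(true_inv, pred_inv, quantile=0.95):
    """Flag the highest-error samples as outliers using the 0.95 error
    quantile and keep the rest for the next training stage."""
    errors = np.abs(true_inv - pred_inv)       # per-sample involvement error
    threshold = np.quantile(errors, quantile)  # top 5% of errors are outliers
    return errors <= threshold                 # boolean keep-mask

# Per the rebuttal, this runs on per-sample errors obtained from a forward
# pass at the end of each 4-epoch stage; the inputs here are illustrative.
keep = select_next_stage(np.random.rand(883), np.random.rand(883))
```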

  4. Evaluation Metrics Clarification (R3) AUROC is computed by comparing the predicted continuous involvement scores against ground truth. Sensitivity and specificity are derived from the ROC curve by thresholding these predictions. Specifically, we report sensitivity at 60% specificity, which corresponds to the true positive rate at the point on the ROC curve where specificity equals 60%.
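
  This metric can be read directly off the ROC curve; a minimal scikit-learn sketch, with illustrative variable names:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def sensitivity_at_specificity(y_true, y_score, target_spec=0.60):
    """Sensitivity (TPR) at the ROC operating point where specificity
    (1 - FPR) still meets the target, per the definition above."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    spec = 1.0 - fpr
    # roc_curve returns points with increasing FPR (decreasing specificity);
    # take the highest TPR among points that still meet the target.
    valid = spec >= target_spec
    return tpr[valid].max()

# auroc = roc_auc_score(core_labels, predicted_involvement)
# sens60 = sensitivity_at_specificity(core_labels, predicted_involvement)
```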

  5. Statistical and Clinical Significance (R1) We compared ProTeUS with Cinepro across 7 independent runs using a one-sided paired t-test, yielding a p-value of 0.003, indicating the improvement is statistically significant. Clinically, ProTeUS detected three cancerous cores missed by Cinepro. Two of these had Gleason scores of 9, which carry a poor prognosis and require early intervention. All three had <10% cancer involvement, underscoring ProTeUS’s ability to detect subtle yet critical findings.
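
  For reference, the reported test corresponds to the following SciPy call; the per-run scores below are placeholders, not the paper's actual numbers:

```python
import numpy as np
from scipy.stats import ttest_rel

# Per-run AUROCs for the two methods over 7 matched runs.
# Placeholder values only -- substitute the actual per-run scores.
proteus = np.array([0.871, 0.868, 0.872, 0.866, 0.870, 0.869, 0.867])
cinepro = np.array([0.853, 0.849, 0.856, 0.851, 0.850, 0.854, 0.848])

# One-sided paired t-test: H1 is that ProTeUS scores exceed Cinepro's.
t_stat, p_value = ttest_rel(proteus, cinepro, alternative="greater")
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}")
```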

  6. Dataset Scope and Generalization (R4 and R3) Our dataset is unique and clinically valuable. It was collected over three years in collaboration with clinicians, specifically to include raw RF ultrasound, which is rarely stored and not available in public datasets. After quality control, we retained 883 cores from 131 patients and used an 80/20 patient-wise split, with 191 cores from 28 patients in the test set. We are actively pursuing external validation.

  7. Visualizations and Accessibility (R1) To improve accessibility in Fig. 4, we will switch to a more accessible and perceptually uniform color palette, add a color bar, and annotate high-risk regions. Visualizations comparing predictions to ground truth across stages will also be included.

We will revise the manuscript and figures to clarify the fusion mechanism, progressive training, evaluation metric definitions, and visualization layout. If we have addressed your concerns, we kindly ask you to consider raising your score.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Sufficient reviewer interest to guarantee a rich discussion during the conference: a real problem, integration of multi-modal data, methodological contributions (loss, curriculum, strategy to deal with noisy labels), and comparison to the state of the art. The authors should take into account the clarity issues raised by R3 and discuss clinical usability (R1).



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


