Abstract

The Gleason groups serve as the primary histological grading system for prostate cancer, providing crucial insights into the cancer’s potential for growth and metastasis. In clinical practice, pathologists determine the Gleason groups based on specimens obtained from ultrasound-guided biopsies. In this study, we investigate the feasibility of directly estimating the Gleason groups from MRI scans to reduce otherwise required biopsies. We identify two characteristics of this task: ordinality, and the resulting dependent yet unknown variances between Gleason groups. In addition to the inter-/intra-observer variability in a multi-step Gleason scoring process based on the interpretation of Gleason patterns, our MR-based prediction is also subject to specimen sampling variance and, to a lesser degree, varying MR imaging protocols. To address this challenge, we propose a novel Poisson ordinal network (PON). PONs model the prediction using a Poisson distribution and leverage Poisson encoding and Poisson focal loss to capture a learnable dependency between ordinal classes (here, Gleason groups), rather than relying solely on the numerical ground-truth (e.g. Gleason Groups 1-5 or Gleason Scores 6-10). To improve this modelling efficacy, PONs also employ contrastive learning with a memory bank to regularise intra-class variance, decoupling the memory requirement of contrastive learning from the batch size. Experimental results based on the images labelled by saturation biopsies from 265 prior-biopsy-blind patients across two tasks demonstrate the superiority and effectiveness of our proposed method.
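
The Poisson-based prediction described in the abstract can be illustrated with a minimal sketch. It assumes the general recipe used in earlier Poisson-based ordinal classifiers: the network emits a single positive rate, and class probabilities come from a softmax over the log Poisson mass function truncated to the available classes. The names `poisson_class_probs`, `softplus`, and `tau` are hypothetical and are not taken from the PON repository; the actual implementation may differ.

```python
import math


def softplus(x):
    """Numerically stable softplus, used here to turn a raw score into a positive rate."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def poisson_class_probs(rate, num_classes=5, tau=1.0):
    """Map a positive rate to probabilities over ordinal classes 0..num_classes-1.

    ASSUMPTION: probabilities are a softmax (temperature tau) over the log
    Poisson mass function, truncated to num_classes classes.
    """
    if rate <= 0 or tau <= 0:
        raise ValueError("rate and tau must be positive")
    # log P(k; rate) = k*log(rate) - rate - log(k!)
    logits = [
        (k * math.log(rate) - rate - math.lgamma(k + 1)) / tau
        for k in range(num_classes)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Example: a raw network output of 2.0 becomes a rate, then five class probabilities.
example_probs = poisson_class_probs(softplus(2.0))
```

Because a single rate controls the whole distribution, probability mass can only rise towards one peak and fall away from it, which is what ties neighbouring Gleason groups together in this kind of model.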

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0193_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/Yinsongxu/PON.git

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Xu_Poisson_MICCAI2024,
        author = { Xu, Yinsong and Wang, Yipei and Shen, Ziyi and Gayo, Iani J. M. B. and Thorley, Natasha and Punwani, Shonit and Men, Aidong and Barratt, Dean C. and Chen, Qingchao and Hu, Yipeng},
        title = { { Poisson Ordinal Network for Gleason Group Estimation Using Bi-Parametric MRI } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose the Poisson ordinal network for classifying the severity of prostate cancer using MR images. They specifically utilize a Poisson distribution to model the prediction distribution. Additionally, they incorporate a memory-bank-based contrastive learning strategy to regulate the intra-class relations, enhancing the discriminative capability of the model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. This paper targets Gleason group estimation through analysis of MR imaging, reducing the reliance on biopsy, which holds significant clinical importance.
    2. The ordinal classification approach is well-suited to Gleason group estimation. The use of Poisson-based prediction and Poisson encoding effectively enforces unimodal constraints on class distribution.
    3. Experimental results across two distinct tasks demonstrate good performance of the proposed framework.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My major concerns come from the novelty of this proposed method.
    1. One major contribution the authors claim in the paper is the Poisson-based distribution modeling. However, one previous work [1] from 2017 (also cited in the paper) has already discussed applying the Poisson distribution to the ordinal classification problem. It is essential for the authors to clarify how their approach differs significantly from this prior study. Specific details on any innovative modifications or improvements in application would be particularly valuable.
    2. There are some works that apply deep learning approaches to fully automatic Gleason grade estimation of prostate cancer [2, 3]. The authors should expand their discussion on how their methodology contrasts with these existing approaches. Highlighting unique aspects such as improved accuracy, efficiency, or clinical relevance could help establish the distinctiveness of the proposed method.
    3. The suitability of the Poisson distribution in this context is questionable. The authors describe an advancement by one Gleason group as an “event,” with the number of such events corresponding to class indices (0-4). However, the Poisson distribution assumes that events occur randomly and independently, which may not be a valid assumption for Gleason group transitions. For instance, the transition rates between different Gleason scores (e.g., from 0 to 3+3 versus from 3+4 to 4+3) can be significantly different and are likely not independent. The authors need to justify the use of the Poisson distribution in this setting or consider alternative modeling approaches that more accurately reflect the dependencies and variability in Gleason score transitions.

    [1] Beckham, Christopher, and Christopher Pal. “Unimodal probability distributions for deep ordinal classification.” International Conference on Machine Learning. PMLR, 2017.
    [2] Pellicer-Valero, Oscar J., et al. “Deep learning for fully automatic detection, segmentation, and Gleason grade estimation of prostate cancer in multiparametric magnetic resonance images.” Scientific reports 12.1 (2022): 2975.
    [3] Bashkanov, Oleksii, et al. “Automatic detection of prostate cancer grades and chronic prostatitis in biparametric MRI.” Computer Methods and Programs in Biomedicine 239 (2023): 107624.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors provide the link to the source code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. More advanced methods for ordinal classification, such as [4], could add novelty to this work.
    2. Since the authors address the effectiveness of the memory-bank-based contrastive learning, exploring or discussing the influence of the memory size would benefit this work.
    3. In the future, the authors may consider comparing with more methods related to Gleason group estimation, such as [2] and [3]. Additionally, external experiments, such as on the online ProstateX grand challenge, may be included for further validation.

    [2] Pellicer-Valero, Oscar J., et al. “Deep learning for fully automatic detection, segmentation, and Gleason grade estimation of prostate cancer in multiparametric magnetic resonance images.” Scientific reports 12.1 (2022): 2975.
    [3] Bashkanov, Oleksii, et al. “Automatic detection of prostate cancer grades and chronic prostatitis in biparametric MRI.” Computer Methods and Programs in Biomedicine 239 (2023): 107624.
    [4] Dey, Prasenjit, Srujana Merugu, and Sivaramakrishnan R. Kaveri. “Conformal Prediction Sets for Ordinal Classification.” Advances in Neural Information Processing Systems 36 (2024).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper shows novelty in adopting Poisson-based prediction and contrastive learning for Gleason group estimation. However, more discussion of the differences from previous works should be included, especially for those works on Gleason group estimation via MRI and advanced approaches for ordinal classification.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors investigate the feasibility of directly estimating the Gleason groups from MRI scans to reduce biopsies. They propose a Poisson ordinal network that models the prediction using a Poisson distribution and leverages Poisson encoding and Poisson focal loss to capture a learnable dependency between ordinal classes, rather than relying solely on the numerical ground-truth.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and well organized.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper has some weaknesses technically that can well be addressed.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper has addressed an important problem; however, the approach has a few limitations that the authors need to address:
    1. The approach typically assumes a predefined functional form for the relationship between the ordinal categories, such as a linear or nonlinear function. However, real-world data may exhibit more complex dependencies that cannot be effectively captured by the chosen functional form.
    2. The approach probably assumes that the relationship between the ordinal categories is monotonic, meaning that as the input variables increase, the predicted ordinal class either stays the same or increases. This assumption may not hold true for all datasets, leading to suboptimal model performance.
    3. While the approach can capture nonlinear relationships to some extent, it may struggle to model highly nonlinear dependencies between ordinal classes. This limitation can result in reduced predictive accuracy, especially when the true relationship is complex or non-monotonic.
    4. The approach may face challenges when dealing with sparse data or imbalanced class distributions, as it relies on sufficient examples from each ordinal category to learn accurate representations of the underlying dependencies. Sparse data can lead to overfitting or poor generalization performance. The papers below can be referred to while addressing this point: “Re-routing drugs to blood brain barrier: A comprehensive analysis of Machine Learning approaches with fingerprint amalgamation and data balancing,” IEEE Access, vol. 11, pp. 9890-9906, 2023. “Dense-PSP-UNet: A Neural Network for Fast Inference Liver Ultrasound Segmentation,” Computers in Biology and Medicine, ScienceDirect, vol. 153, pp. 106478, 2023.
    5. The approach may be sensitive to outliers in the input data, especially if the chosen loss function (e.g., Poisson loss) is not robust to extreme values. Outliers can distort the learned dependencies between ordinal classes and degrade model performance.
    6. The approach may be sensitive to noise in the input data. Can the authors discuss whether stochastic resonance, which uses noise to enhance the signal, would be of any help? The references below could be cited while discussing this: “Toward Computing Cross-Modality Symmetric Non-Rigid Medical Image Registration,” IEEE Access, vol. 10, pp. 24528-24539, 2022. “Development of a Cerebral Aneurysm Segmentation Method to Prevent Sentinel Hemorrhage,” Network Modeling Analysis in Health Informatics and Bioinformatics, Springer, vol. 12, no. 18, pp. 1-14, 2023. “Pathological Liver Segmentation Using Stochastic Resonance and Cellular Automata,” Journal of Visual Communication and Image Representation, ScienceDirect (Elsevier), vol. 34, pp. 89-102, 2016. “Cellular Automata based Left Ventricle Reconstruction from Magnetic Resonance Images,” Computer Methods in Biomechanics and Biomedical Engineering: Visualization & Imaging, Taylor & Francis, vol. 5, no. 1, pp. 54-67, 2017.
    7. The approach is complex, especially when modeling high-dimensional data or incorporating multiple input features. Increased model complexity may lead to longer training times, overfitting, and difficulties in model interpretation.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written, well organized, and attempted to convince beautifully.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have improved the quality of the paper. Thus, should there be space to accommodate more papers, this can well be considered.



Review #3

  • Please describe the contribution of the paper

    The primary objective of this work is to estimate Gleason groups, indicative of prostate cancer grades, directly from MRI scans instead of using biopsies. A Poisson ordinal network is proposed to model the prediction using a Poisson distribution. The model is also tailored to capture the relationship between Gleason groups via Poisson encoding and Poisson focal loss. This method aims to reduce dependence on the ground truth by modeling the class interdependencies.

    Two tasks are evaluated: Gleason group prediction (five classes), and the detection of clinically significant cancer (primary and secondary definitions). Evaluation is conducted using a dataset consisting of 265 publicly available MRIs from 262 patients. The performance surpasses ordinal classification methods, although it does not exceed radiologist performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel approach to the problem using MRI instead of biopsies, potentially useful in risk stratification
    • Excellent communication of research. Clear and concise writing, well-presented figures, formulas, and tables
    • Well-structured experiment setups with good ablation study on the components of the proposed method
    • Introduction of simple yet effective modules for problem-solving, potentially applicable to other ordinal classification tasks
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper lacks a literature review section for the Poisson-based learning framework and contrastive learning methods.
    2. The proposed methods do not display significantly superior performance in Table 1. Some baseline methods outperform them, leading to doubts about the efficacy of the proposed method.
    3. In Table 2, the authors primarily compare their method with baselines that either use Focal loss or alternative label encoding. It would be fairer to compare these methods combined with contrastive learning regularization. This would help verify if the Poisson encoding or other components of the proposed methods are effective. Additionally, further experiments involving other contrastive learning-based methods are needed to examine the effect of contrastive learning regularization.
    4. The authors claim that the proposed methods can identify False Positives (FP) and False Negatives (FN) made by human experts, thereby assisting humans and improving predictions. However, it’s unclear how this is possible since the network’s performance is inferior to humans. Thus, while it might help detect human-made FP and FN, it could also generate more FP and FN given its subpar performance.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Your research presents valuable insights into the use of MRIs for cancer grading, and your unique approach shows promise. It would be beneficial to extend this approach to other ordinal classification tasks and test it against various datasets for improved validation. I’m unsure why a comparison with human evaluation was made. Before drawing meaningful comparisons with human experts, the work may need further advancements and performance validation on large scale datasets. Could some Gleason groups be combined to explore potential improvements in performance for practical applications?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper does have some limitations regarding the evaluation of findings, as it ambitiously compares them with human experts without meaningful outcome and uses just one dataset. However, to the best of my knowledge, the idea and proposed approach are novel and could have intriguing applications for various ordinal classification tasks, especially clinical grading for cancer and other pathologies. Despite the limitations that should be addressed, I don’t believe they should impact the paper’s consideration for publication.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The rebuttal could have been more detailed and specific, as it only provides a brief summary without addressing the specific queries raised in the initial review. However, the paper is well-structured and clear, presenting a significant novel contribution. Therefore, it should be accepted.




Author Feedback

To R1: We appreciate R1’s overall positive comments and, in particular, several insightful suggestions for improving our work, including further validation on larger data sets, more ablation studies and practical benefits from combining Gleason groups. We would also like to clarify that the comparison with human radiologist performance does not only demonstrate the diagnostic value of the proposed approach alone, but also provides interesting evidence to support combining ML models and radiologists, e.g. with the model acting as a second reader.

To R3: We thank R3 for her/his positive recommendation, with a number of suggested references to include. We would like to clarify that the proposed method assumes a functional form on the distribution of the change of ordinal categories. No further assumption is made between classes, therefore these categories are not necessarily positively correlated with or monotonic to input variables.

To R4: We thank R4 for valuable comments and address the issues below. We first clarify the difference from the prior Poisson-based study [1]. The key differences are the label encoding, the loss function and the incorporation of a contrastive learning strategy. First, [1] proposes one-hot encoding where all classes are encoded equally and orthogonally, while we introduce Poisson label encoding, which incorporates the order of the Gleason groups into the encoding. Second, [1] employs cross-entropy loss, which focuses solely on the numerical ground truth and penalizes mispredictions without considering the class order. As grading prostate MRIs is a challenging task with limited data, we propose Poisson Focal Loss, which encourages the model to adapt to the label distribution and focus on hard samples. Additionally, considering the diverse appearances of prostate lesions in MRIs within the same group, we introduce contrastive learning. In the ablation study (Sec 3.2), we denote [1] as Poisson-based Prediction, and the results in Tab 3 demonstrate the effectiveness of our approach over [1].
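
The contrast drawn here between one-hot targets with cross-entropy and order-aware targets with a focal-style loss can be sketched as follows. This is only one plausible reading of the description, not the definitions used in the paper: the soft target is assumed to be a truncated Poisson whose rate is the label plus a hypothetical `offset` of 0.5 (so that the peak sits on the true class), and the loss is assumed to be a focal-weighted cross-entropy against that soft target. All function names and parameters below are illustrative.

```python
import math


def truncated_poisson(rate, num_classes):
    """Poisson mass over classes 0..num_classes-1, renormalised to sum to one."""
    weights = [
        math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))
        for k in range(num_classes)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def one_hot(label, num_classes=5):
    """Order-agnostic target: every wrong class is treated as equally wrong."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def poisson_encode(label, num_classes=5, offset=0.5):
    """Order-aware soft target (ASSUMED form): rate = label + offset.

    With 0 < offset < 1 the peak is on `label`, and neighbouring classes
    receive more mass than distant ones.
    """
    return truncated_poisson(label + offset, num_classes)


def soft_focal_loss(target, probs, gamma=2.0, eps=1e-12):
    """Focal-weighted cross-entropy against a soft target (ASSUMED form).

    gamma = 0 recovers ordinary cross-entropy; larger gamma down-weights
    classes the model already predicts confidently.
    """
    return -sum(
        t * (1.0 - p) ** gamma * math.log(p + eps)
        for t, p in zip(target, probs)
    )


# Usage: compare a near miss and a far miss for a sample whose label is class 2.
target = poisson_encode(2)
near_miss = soft_focal_loss(target, truncated_poisson(3.5, 5))
far_miss = soft_focal_loss(target, truncated_poisson(0.5, 5))
```

Under these assumptions, a prediction peaked on an adjacent class incurs a smaller loss than one peaked far from the label, which a one-hot target with plain cross-entropy does not distinguish by design.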

We agree that summarising the key differences between [2,3] and our method would highlight the novelty. Our approach diverges from them in two key aspects. The first is the task and clinical relevance, with different data and ground truth. [2] and [3] segment clinically significant lesions based on Gleason grade and PI-RADS respectively, while our work estimates the Gleason groups from MRI scans. We formulate our problem as a classification task to reduce the otherwise required subject-level biopsies. Because the tasks and evaluation methods differ, the performance cannot be compared directly. The second is the motivation and method. [2] focuses on instance detection and segmentation evaluation, while [3] evaluates the multi-modal shift and data augmentation. Our work focuses on ordinal classification and class dependency with unknown inter- and intra-class distributions. In addition, our study was validated, for the first time, on a different type of ground truth based on saturation biopsy, which is arguably more representative of true disease status.

We agree with the reviewer that any potential dependency between these transitions may not be modelled by our approach, but its significance remains unknown and unobserved from existing clinical data. We model the prediction distribution using a Poisson distribution to encourage the model to learn the underlying order of classes, which seemed to provide a superior alternative to the existing i.i.d. classes assumption.
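
As a brief aside on why a Poisson form encodes class order, the ratio of consecutive probabilities makes the single-peak shape explicit. This is a standard property of the Poisson distribution rather than a result specific to the paper; here \(\lambda\) denotes the rate and \(K\) the number of classes, both introduced only for this illustration.

```latex
P(k;\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!},
\qquad
\frac{P(k+1;\lambda)}{P(k;\lambda)} = \frac{\lambda}{k+1}.
% The ratio exceeds 1 while k + 1 < \lambda and falls below 1 once k + 1 > \lambda,
% so the mass rises to a single peak at k = \lfloor \lambda \rfloor and then decays
% (with a tie between \lambda - 1 and \lambda when \lambda is an integer).
% Restricting to k = 0, \dots, K-1 and renormalising keeps this ordering:
\tilde{P}(k;\lambda) = \frac{P(k;\lambda)}{\sum_{j=0}^{K-1} P(j;\lambda)},
\qquad k = 0, \dots, K-1 .
```

The consequence is that classes adjacent to the most likely one always receive more probability than classes further away, which is the ordering constraint that independent per-class outputs lack.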

We would also like to thank R4 for other suggestions including testing other ordinal classification approaches, practical memory bank size (which is currently configured empirically to the size of available training samples) and validation on further data sets such as ProstateX.
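
The memory-bank idea mentioned above, with one slot per training sample, can be sketched in a few lines. This is a hedged illustration only: it assumes a supervised contrastive loss in which bank entries of the same Gleason group act as positives, and the class `MemoryBank`, the function `bank_contrastive_loss`, and the `temperature` value are hypothetical names and settings rather than the repository's API.

```python
import math


def l2_normalise(vector):
    """Scale a feature vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


class MemoryBank:
    """One slot per training sample, so memory use does not depend on batch size."""

    def __init__(self, size):
        self.features = [None] * size
        self.labels = [None] * size

    def update(self, index, feature, label):
        """Overwrite the slot of training sample `index` with its latest feature."""
        self.features[index] = l2_normalise(feature)
        self.labels[index] = label

    def entries(self, exclude=None):
        """Return filled (feature, label) slots, optionally skipping one index."""
        return [
            (f, y)
            for i, (f, y) in enumerate(zip(self.features, self.labels))
            if f is not None and i != exclude
        ]


def bank_contrastive_loss(feature, label, bank, index=None, temperature=0.1):
    """Supervised contrastive loss of one anchor against the bank (ASSUMED form)."""
    anchor = l2_normalise(feature)
    entries = bank.entries(exclude=index)
    sims = [sum(a * b for a, b in zip(anchor, f)) / temperature for f, _ in entries]
    positives = [s for s, (_, y) in zip(sims, entries) if y == label]
    if not positives:
        return 0.0  # no same-class entry stored yet
    # Log-sum-exp over all bank entries, computed stably.
    top = max(sims)
    log_denominator = top + math.log(sum(math.exp(s - top) for s in sims))
    return -sum(s - log_denominator for s in positives) / len(positives)


# Usage: four stored samples from two classes, then one anchor scored against them.
bank = MemoryBank(size=4)
bank.update(0, [1.0, 0.0], 0)
bank.update(1, [0.9, 0.1], 0)
bank.update(2, [0.0, 1.0], 1)
bank.update(3, [0.1, 0.9], 1)
loss_matching = bank_contrastive_loss([1.0, 0.05], 0, bank)
loss_mismatched = bank_contrastive_loss([1.0, 0.05], 1, bank)
```

Because each anchor is compared with stored features rather than with the other members of its mini-batch, the number of contrastive pairs is set by the bank size, which is the quantity a memory-size ablation would vary.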




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Overall, reviewers were positive, with the primary concerns being novelty and contributions. The rebuttal clearly delineates the key differences from previous work in terms of tasks, evaluation, and ground truth.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Overall, reviewers were positive, with the primary concerns being novelty and contributions. The rebuttal clearly delineates the key differences from previous work in terms of tasks, evaluation, and ground truth.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I agree with meta reviewer 1.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    I agree with meta reviewer 1.


