Abstract

The precise diagnosis of urinary stones is crucial for devising effective treatment strategies. The diagnostic process, however, is often complicated by the low contrast between stones and surrounding tissues, as well as the variability in stone locations across different patients. To address these issues, we propose a novel location embedding based pairwise distance learning network (LEPD-Net) that leverages low-dose abdominal X-ray imaging combined with location information for the fine-grained diagnosis of urinary stones. LEPD-Net enhances the representation of stone-related features through context-aware region enhancement, incorporates critical location knowledge via stone location embedding, and achieves fine-grained recognition through our innovative fine-grained pairwise distance learning. Additionally, we have established an in-house dataset of urinary tract stones to demonstrate the effectiveness of the proposed approach. Comprehensive experiments conducted on this dataset reveal that our framework significantly surpasses existing state-of-the-art methods.
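
To make the design concrete, below is a minimal, hypothetical PyTorch-style sketch of how the three modules could fit together. The stand-in backbone, channel sizes, multiplicative fusion, and classifier head are illustrative assumptions, not the authors' implementation (see the linked code repository for that); the FGPD loss would additionally act on the pooled embeddings during training.

    import torch
    import torch.nn as nn

    class LEPDSketch(nn.Module):
        # Illustrative skeleton only: a small convolutional backbone stands in
        # for context-aware region enhancement (CRE), and a Conv1d/BN/SiLU
        # stack stands in for stone location embedding (SLE).
        def __init__(self, c=64, num_classes=5):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(1, c, 3, stride=2, padding=1), nn.BatchNorm2d(c), nn.ReLU(),
                nn.Conv2d(c, c, 3, stride=2, padding=1), nn.BatchNorm2d(c), nn.ReLU(),
            )
            self.loc_embed = nn.Sequential(
                nn.Conv1d(6, c, kernel_size=1), nn.BatchNorm1d(c), nn.SiLU(),
            )
            self.head = nn.Linear(c, num_classes)

        def forward(self, image, location):
            x_ff = self.backbone(image)                 # region-enhanced features (B, c, h, w)
            y = self.loc_embed(location.unsqueeze(-1))  # 6-d location vector -> (B, c, 1)
            y_em = y.unsqueeze(-1).expand_as(x_ff)      # broadcast to (B, c, h, w)
            pooled = (x_ff * y_em).mean(dim=(2, 3))     # fused, globally pooled embedding
            return self.head(pooled)                    # fine-grained class logits

    # e.g., logits = LEPDSketch()(torch.randn(2, 1, 128, 128), torch.rand(2, 6))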

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1064_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/BioMedIA-repo/LEPD-Net.git

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Jin_Location_MICCAI2024,
        author = { Jin, Qiangguo and Huang, Jiapeng and Sun, Changming and Cui, Hui and Xuan, Ping and Su, Ran and Wei, Leyi and Wu, Yu-Jie and Wu, Chia-An and Duh, Henry B. L. and Lu, Yueh-Hsun},
        title = { { Location embedding based pairwise distance learning for fine-grained diagnosis of urinary stones } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a method for urinary stone classification from abdominal X-ray imaging called LEPD-Net. The model uses context-aware region enhancement (CRE), stone location embedding (SLE), and fine-grained pairwise distance learning (FGPD) modules to enhance urinary stone diagnosis on a private dataset of over 400 patients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The motivation, in principle, is effectively conveyed: the clinical application is interesting and relatively unique.
    • The organization and layout of the paper are effective, with well-constructed illustrations and tables of results.
    • The proposed method is interesting and seems well-suited to the application.
    • The experiments appear to be thorough, with an ablation study and adequate comparison to prior methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • While this is a unique application, I feel that some details about the clinical context could be clarified. For example, it is slightly misleading/confusing to the reader to refer to this problem as urinary stone “diagnosis” (i.e., does this patient present with urinary stones?). Rather, this paper concerns the discrimination of different types of urinary stones in images that are already known to contain stones. Given that the authors are classifying different types of urinary stones, please clarify the importance of distinguishing these types. Why is it important to do so? Does this impact treatment? Are some types worse than others? Are these types even pathological (e.g., are some harmless/benign)?
    • Related, given that all image patches contain stones (as far as I’m aware), I feel that the clinical applicability of such a model is heavily limited. How would this method be applied “in the wild” to a new patient for whom we do not know (a) whether they have urinary stones or (b) where those stones are located? Surely, in clinical practice, the vast majority of patients do not present with stones and an even vaster majority of “image patches” would contain no stones at all. How would the proposed model handle this imbalance given that it has encountered no negative examples during training?
    • The relation to prior work could be expanded. For each proposed component of LEPD-Net, I would encourage the authors to conduct a thorough literature review and cite methods that use the same or a similar approach. For example, what other medical image analysis papers use prior location information to enhance the learning process? How is the pairwise distance learning module related to, e.g., supervised contrastive learning [1]? (A minimal sketch of the latter follows the reference below.)

    [1] Khosla, Prannay, et al. “Supervised contrastive learning.” Advances in neural information processing systems 33 (2020): 18661-18673.
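
    For context, here is a minimal sketch of the supervised contrastive loss from [1]; the paper’s pairwise distance learning presumably differs in its details, so this is a reference point for the comparison, not the authors’ method.

        import torch
        import torch.nn.functional as F

        def supcon_loss(features, labels, temperature=0.07):
            # Supervised contrastive loss [1]: pull same-class embeddings
            # together and push different-class embeddings apart.
            features = F.normalize(features, dim=1)            # (B, d) unit embeddings
            sim = features @ features.T / temperature          # (B, B) similarity logits
            off_diag = ~torch.eye(len(labels), dtype=torch.bool)
            pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & off_diag
            sim = sim.masked_fill(~off_diag, float('-inf'))    # exclude self-pairs
            log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
            n_pos = pos.sum(1)
            valid = n_pos > 0                                  # anchors with >= 1 positive
            mean_log_prob = log_prob.masked_fill(~pos, 0.0).sum(1)[valid] / n_pos[valid]
            return -mean_log_prob.mean()

        # e.g., loss = supcon_loss(torch.randn(8, 32), torch.randint(0, 3, (8,)))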

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    I’m aware of the difficulties surrounding releasing private hospital data, but are there any plans to release the dataset used in this paper?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Comments/questions:

    • Please clarify that by urinary stone “diagnosis”, the authors mean that they are distinguishing different types of urinary stones. The authors should use this to bolster the clinical motivation: Why is it important to distinguish these types of stones? Are some worse than others?
    • Is every image patch already known to contain a stone? If so, then I feel this method is heavily limited in terms of clinical impact. When deploying on real-world patients, we do not know where stones are located or whether the patient has urinary stones at all. I would expect this model to fail on the large number of “negative” image patches it would encounter in practice at test time. A more useful version of this model might be one that includes patches with no stones and first classifies whether any stone is present, then provides a more fine-grained classification if necessary.
    • I urge the authors to find and cite relevant prior studies with similar approaches to the proposed method (e.g., supervised contrastive learning is quite similar to the proposed pairwise distance learning module).

    Minor comments/questions:

    • What exactly is meant by “fine-grained diagnosis”? This phrase is being used in this paper as if it is a standard term with a specific meaning – please define this or use another phrase.
    • Section 3.2 is slightly unclear: “The resulting 6-dimensional vector y ∈ R^6 is subsequently embedded through a sequence comprising a 1D convolutional layer, a batch normalization layer, and a sigmoid linear unit (SiLU) layer. This embedded feature is then expanded to match the dimensions of the region-enhanced features x_ff ∈ R^{c×h×w}, yielding the location-embedded features y_em ∈ R^{c×h×w}”… What exactly does “expanded” mean? Does this mean the embedding produces a c·h·w feature vector that is “reshaped” to c×h×w? Something else? (A plausible interpretation is sketched after this list.)
    • Also, why was a sigmoid linear unit (SiLU) used here when presumably ReLU is used elsewhere throughout the network?
    • “…we integrate the label smoothing strategy [17] with the distribution of stones”… I am aware of label smoothing, but what does “with the distribution of stones” mean?
    • “To ensure a fair comparison, all models are trained under identical settings.” The authors should acknowledge that using fixed hyperparameters across models may actually give an advantage to certain methods. E.g., if the chosen hyperparameters were selected for the proposed method, they may not be well-suited to the other baseline methods.
    • “Besides, the SLE module encodes…” Awkward – “besides” is not appropriate here
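
    Regarding the “expanded” question above, a plausible interpretation (an assumption on my part, not the authors’ confirmed implementation) is that the c-dimensional location embedding is broadcast spatially before being fused with the feature map:

        import torch

        c, h, w = 64, 16, 16
        y_embedded = torch.rand(c)      # location embedding after Conv1d + BN + SiLU
        x_ff = torch.rand(c, h, w)      # region-enhanced features
        y_em = y_embedded.view(c, 1, 1).expand(c, h, w)  # spatial broadcast, no c*h*w reshape
        fused = x_ff * y_em             # location-embedded features fused with x_ff
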
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clinical application and method proposed in this paper are unique and interesting. However, I have major concerns about the clinical applicability of this approach given that it has only encountered “positive examples” of image patches that are a priori known to contain urinary stones. The authors should either clarify the language surrounding the motivation to reflect this fact or consider expanding their approach to incorporate negative examples with no stones.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    One of my main concerns was that the proposed method seemed to only accommodate image patches with known stones, which would make real-world deployment infeasible (most patches – and patients – do not have stones). The authors’ rebuttal has clarified that “non-stone (NS) patches” were extracted and used to train the model. However, I find it strange that this information was absent from the original submission, and the rebuttal did not resolve my concern surrounding how the model would be used at inference time on a new patient: “our model first performs coarse segmentation using a sliding window strategy on a KUB image to identify high-confidence NS patches and patches containing suspicious stones.” Why was this information not present in the original submission? Is this a separately trained model to first flag suspicious vs. non-suspicious patches? The details of how this step works are important.

    Overall, the method is interesting, the results appear convincing, and the application is unique. Currently I am leaning toward rejection due to some fundamental confusion over methodology and real-world applicability but am open to acceptance if the aforementioned details are clarified.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a framework that applies deep learning methods to the diagnosis of urinary stones from medical images, incorporating location information into the image modality and strengthening the interactions between image pairs. In addition, this work constructs an in-house KUB dataset for stone diagnosis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This work utilizes the localization information of the stone region to improve prediction accuracy.
    2. This work leverages metric learning to push dissimilar samples apart and pull similar samples together.
    3. The authors construct an in-house dataset for urinary stone diagnosis, which could supplement existing datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors do not compare their model with existing methods that target stone detection or segmentation. These methods use larger datasets than the small in-house dataset constructed by the authors. For example, the CAD model (https://doi.org/10.1186/s12894-021-00874-9) is a 17-layer residual network trained and evaluated on plain X-ray images of 1017 patients with a radio-opaque upper urinary tract stone. Another U-Net model (https://doi.org/10.1109/ICSIP49896.2020.9339452) also provides a supplementary dataset for segmentation.
    2. The authors claim that CNN-based approaches may not effectively incorporate localization information. The authors should explain why the CNN-based models are better than the transformer-based models.
    3. There are many models that can segment stones. Why do the authors only use a coarse segmentation network?
    4. The writing needs to be improved.
    5. The authors do not provide the source code for reproducing the results, nor do they provide the detailed parameters of their model.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    There is no link of the source code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors should compare their model with existing methods that target stone detection or segmentation on different existing datasets.
    2. The authors should explain why the CNN-based models are better than the transformer-based models according to Table 1.
    3. The authors should explain why they do not use a fine-grained segmentation network.
    4. The writing needs to be improved.
    5. The authors should provide the corresponding materials for reproducing the results, such as a GitHub repository link.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors utilize interesting modules to consider the location and similarities of the stones.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The author(s) have addressed my comments.



Review #3

  • Please describe the contribution of the paper

    A novel model architecture is presented to classify urinary tract stones via low-dose KUB radiography (vs. the generally much more accurate NCCT approach), achieving an accuracy comparable to the NCCT approach (based on the performance on an in-house dataset vs. the generic accuracies of the methods reported).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The presented model architecture is novel and achieves good results compared to other, more generic architectures. The performance has been tested on a reasonably large in-house dataset.

    The proposed fine-grained pairwise distance learning module seems useful for handling class imbalance, potentially beyond this application.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method seems to require manual locations of the stone areas in order to classify them. How sensitive is the classification to the accuracy and precision of the manual locations? How critical is the manual localization of the stones, relative to their automated classification, in terms of the amount of manual work required and the improvement in diagnosis?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Presumably only the code, but not the in-house dataset, will be released, which could make reproducibility and comparison difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please clarify what input is needed for the model vs. what it does automatically. This is not quite clear from the paper and the architecture presented. I assume the location of the stone and its bounding box are an input, which may require substantial manual work, but classification may of course still save time / improve diagnosis. A brief discussion of the practical usefulness in this context would be helpful as well.

    The introduction claims a third contribution on consistent performance improvements over recent state-of-the-art approaches on medical and natural images. This is not really demonstrated, as the data is limited to urinary stones, so please amend or justify better.

    Indicating the strength of the online augmentations applied would be useful.

    Why has the encoder-decoder architecture not been replaced with a U-Net? Or what would a U-Net improve? (Sec. 4.1)

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Generally this is good work, but possibly limited by the manual annotation required, if I understood this correctly.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my comments and clarified my main concern about the annotation well. Overall, this is good work.




Author Feedback

We thank R1, R3, R4 for the appreciation of motivation, novelty, and clinical uniqueness. Below, we summarize and address the major concerns.

Q1: Applicability to patients without stones (R3), and clarifications on the model’s input and automation (R4)
A1: We clarify that our model is not limited to image patches already known to contain stones. The image patches with stones are extracted through a two-step process as given in Sec. 2, Par. 2. Non-stone (NS) patches are randomly extracted from each KUB image, with the location coordinates centered on the NS patch box. We extracted 410 NS patches in our 5-fold validation. Thus, the model can be deployed on new patients with or without stones. We will add this clarification upon acceptance. During clinical deployment, our model first performs coarse segmentation using a sliding window strategy on a KUB image to identify high-confidence NS patches and patches containing suspicious stones. For suspicious patches, the location information is generated automatically, without manual intervention, by calculating the center of the coarse stone region. Finally, the acquired location information and patches are used for automatic fine-grained classification. This process will be added to Sec. 4.1.
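
As a concrete illustration of this deployment flow, the hedged sketch below follows the description in A1. The coarse_segment and lepd_classify arguments are hypothetical stand-ins for the segmentation and classification models, and the window size, stride, threshold, and normalized location encoding are all assumptions rather than details from the rebuttal.

    import numpy as np

    def diagnose_kub_image(kub_image, coarse_segment, lepd_classify,
                           window=256, stride=128, threshold=0.5):
        # Slide a window over the KUB image, skip high-confidence non-stone
        # patches, derive the stone location automatically from the coarse
        # mask, then run fine-grained classification on suspicious patches.
        h, w = kub_image.shape
        results = []
        for y0 in range(0, h - window + 1, stride):
            for x0 in range(0, w - window + 1, stride):
                patch = kub_image[y0:y0 + window, x0:x0 + window]
                prob = coarse_segment(patch)             # coarse stone-probability map
                if prob.max() < threshold:
                    continue                             # confident non-stone patch
                ys, xs = np.nonzero(prob >= threshold)
                cy, cx = ys.mean() + y0, xs.mean() + x0  # center of the coarse stone region
                loc = np.array([cy / h, cx / w])         # assumed normalized location encoding
                results.append(((cy, cx), lepd_classify(patch, loc)))
        return results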

Q2: Importance of classifying different types of stones (R3)
A2: Reference [18] shows that current diagnostic accuracy is low. Identifying the exact type of urinary stone is critical for precision treatment planning. For example, renal stones can be treated with medical expulsion therapy or surgical intervention, while phleboliths are common benign pelvic calcifications that often require treatment of venous thrombosis to cure. Thus, accurate classification of each individual stone is essential. We will emphasize the motivation in Sec. 1, Par. 2.

Q3: Citing more related papers (R3)
A3: For CRE, we identify one paper [19] on medical knowledge-guided regions. For FGPD, we find a similar idea [23] in supervised contrastive learning. Although the concepts of each proposed component are related to these methods, LEPD-Net is a distinct approach.

[19] Wang, K., et al. “KGZNet….”, IEEE BIBM 2019.
[23] Khosla, P., et al. “Supervised contrastive learning”, NeurIPS 2020.

Q4: Existing methods for comparison (R1)
A4: We did not conduct these experiments for the following reasons. First, our model is designed for fine-grained diagnosis with auxiliary coarse segmentation to provide location information. Therefore, comparing it to fine-grained stone segmentation methods is not necessary. Second, the model [9] uses a simple 17-layer ResNet architecture, whereas we have already included ResNet18 (as in Table 1) for comparison.

Q5: CNN-based models outperformed Transformer-based methods (R1)
A5: Although transformer-based methods can capture long-range dependencies, their heavyweight architectures are susceptible to overfitting on our dataset. CNN-based methods, which have fewer parameters, tend to achieve better results.

Q6: Reproducibility (R1)
A6: We have uploaded the code to a GitHub repository, but concealed identifying information within the link due to double-blind review. The full link will be added upon acceptance.

Q7: Replacement of the coarse segmentation network (R1, R4)
A7: The network, which can be any segmentation network, is used to detect potential stone region(s) in the relatively large KUB image before precise downstream classification (explained in Sec. 3.1, Par. 1). For computational efficiency, we employ a lightweight model. We leave the investigation of different segmentation models to future work.

Q8: Why use SiLU (R3)
A8: We consider location as auxiliary information to enhance image features. Thus, SiLU is chosen to smoothly gate the location features via its sigmoid factor.
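
For reference, the SiLU activation is defined as

    \mathrm{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}},

where the sigmoid factor \sigma(x) lies in (0, 1); the SiLU output itself is smooth but not bounded to [0, 1].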

Q9: Strength of online augmentation (R4)
A9: Online augmentation is performed to mitigate the risk of overfitting. The justification will be added to Sec. 4.1.

Other minor comments will also be corrected.

Thank you!




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Satisfactory rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Satisfactory rebuttal.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


