Abstract

Improved thyroid nodule risk stratification from ultrasound (US) can mitigate overdiagnosis and unnecessary biopsies. Previous studies often train deep learning models using manually selected single US frames; these approaches deviate from clinical practice where physicians utilize multiple image views for diagnosis. This paper introduces ThyGraph, a novel graph-based approach that improves feature aggregation and correlates anatomically proximate images, by leveraging spatial information to model US image studies as patient-level graphs. Graph convolutional networks are trained on image-based and patch-based graphs generated from 505 US image studies to predict nodule malignancy. Self-attention graph pooling is introduced to produce a node-level interpretability metric that is visualized downstream to identify important inputs. Our best performing model demonstrated an AUROC of 0.866±0.019 and AUPRC of 0.749±0.043 across five-fold cross validation, significantly outperforming two previously published attention-based feature aggregation networks. These previous studies fail to account for spatial dependencies by modeling images within a study as independent, uncorrelated instances. In the proposed graph paradigm, ThyGraph can effectively aggregate information across views of a nodule and take advantage of inter-image dependencies to improve nodule risk stratification, leading to better patient triaging and reducing reliance on biopsies. Code is available at https://github.com/ashwath-radha/ThyGraph.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3759_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/ashwath-radha/ThyGraph

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Rad_ThyGraph_MICCAI2024,
        author = { Radhachandran, Ashwath and Vittalam, Alekhya and Ivezic, Vedrana and Sant, Vivek and Athreya, Shreeram and Moleta, Chace and Patel, Maitraya and Masamed, Rinat and Arnold, Corey and Speier, William},
        title = { { ThyGraph: A Graph-Based Approach for Thyroid Nodule Diagnosis from Ultrasound Studies } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This manuscript proposes a method called ThyGraph for the diagnosis of multiview thyroid ultrasound images. First, the method uses a CNN to extract image features for each frame, then separates each frame into patches and creates a two-layer undirected graph incorporating partition coordinates. The authors then introduce an image-based graph convolutional network (Image-GCN) and a patch-based graph convolutional network (Patch-GCN) to extract and analyze features from the two-layer undirected graph. Additionally, self-attention graph (SAG) pooling is utilized to allow consideration of a greater number of nodes during the GCN learning process. By utilizing the first SAG pooling layer, attention can be quantified for each coordinate. The authors conducted experiments on a private dataset of 505 samples, where the method outperformed mainstream methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This manuscript presents a new perspective on the diagnosis of multiview thyroid ultrasound images, viewing different views as different graph nodes, and using the power of graph neural networks to extract and integrate global and local information for each frame.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method introduced in this manuscript is concise and clear, but it lacks implementation detail. Readers may struggle to obtain sufficient specifics from the description. For example, why are the ranges for x and y set to [1,9] and [1,4] respectively in Image-Based Graph Construction? How is z encoded in Patch-Based Graph Construction? Why is the frame-level feature vector represented as 1x6016? How are ResNet101, DenseNet201, and ResNeXt101 utilized for feature extraction? What is the rationale behind the formula used for the adjacency matrix Aij? Have alternative definitions been explored? Additionally, the method is relatively simple overall and has already appeared in multiple articles; it lacks in-depth research and exploration of the thyroid from multiple perspectives. It is suggested to delve deeper into the difficulties of the thyroid problem and provide more experimental or theoretical evidence to validate the effectiveness and reliability of the method.

    Research with similar ideas:

    1. DeepGP: An Integrated Deep Learning Method for Endocrine Disease Gene Prediction Using Omics Data
    2. MLMSeg: A multi-view learning model for ultrasound thyroid nodule segmentation
    3. MV-GCN: Multi-View Graph Convolutional Networks for Link Prediction
    4. Multi-GCN: Graph Convolutional Networks for Multi-View Networks, with Applications to Global Poverty

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The writing and structure of the manuscript are concise and standardized. However, the proposed method is general and has already been demonstrated in other articles; it lacks a targeted design and discussion for multiview thyroid issues. It is suggested to refine the method for thyroid problems. There is little discussion of the effectiveness of the method and why it is effective. It is recommended to explore the reasons why this method works and to conduct more comparative experiments to illustrate its advantages over other methods.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This manuscript presents a new approach to addressing multiview thyroid diagnosis problems, but the description of the method implementation is not sufficiently detailed, and there is a lack of exploration into the effectiveness of the method. The proposed method framework has applications in multiple domains but lacks a targeted design for thyroid diagnosis issues.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents ThyGraph, a novel graph-based methodology for improving thyroid nodule risk stratification from ultrasound images. By modeling ultrasound studies as patient-level graphs, ThyGraph effectively aggregates spatial information across multiple images, enhancing diagnostic accuracy. This approach, which outperforms previous models in accuracy metrics, also introduces a self-attention graph pooling mechanism for interpretability, potentially reducing unnecessary biopsies and aligning with real-world clinical practices.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel Formulation Using Graph-Based Approach: ThyGraph introduces a graph-based model for analyzing ultrasound images of thyroid nodules, a significant shift from traditional approaches that treat each ultrasound frame independently. This novel formulation allows the model to leverage spatial relationships and inter-image dependencies, capturing a more holistic view akin to how clinicians assess multiple images for diagnosis. This approach is particularly innovative because it mirrors clinical practices more accurately than single-image models, potentially leading to better diagnostic tools.
    2. Original Use of Data Through Patient-Level Graphs: The use of patient-level graphs to model entire studies of ultrasound images is an original approach that stands out in the realm of medical imaging. By aggregating features across anatomically proximate images, the model effectively simulates a more realistic diagnostic scenario where all relevant data points are considered collectively, enhancing the predictive power and relevance of the analysis.
    3. Demonstration of Clinical Feasibility: The paper not only proposes a novel theoretical model but also demonstrates its clinical feasibility through significant improvements in diagnostic metrics (AUROC of 0.866±0.019 and AUPRC of 0.749±0.043). The performance superiority over existing attention-based models underlines the practical value and potential for real-world application, suggesting that this model could help reduce unnecessary biopsies and improve patient management.
    4. Introduction of Self-Attention Graph Pooling for Interpretability: Another innovative aspect of ThyGraph is the integration of self-attention graph pooling, which provides interpretability at the node level. This feature is crucial in clinical settings as it aids radiologists and physicians in understanding why certain images or patches are deemed significant by the model, thereby boosting the trust and transparency of automated diagnostic systems.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper exhibits specific weaknesses that could be addressed to strengthen its scientific contribution and applicability:

    1. Absence of Evaluation on Established Public Benchmarks: The paper does not utilize widely recognized public benchmarks like TN3K[1] and DDTI[2], which are critical for validating the effectiveness and generalizability of new diagnostic methods. These benchmarks are essential for comparison as they provide standardized datasets that have been used in prior research, allowing for consistent assessment and benchmarking against other methodologies.

    2. Uncertainty Regarding the Availability of Implementation: The paper does not specify whether the implementation code will be made publicly available. The availability of source code is crucial for the reproducibility of research results and for facilitating further research and verification by the scientific community. Transparency in sharing code allows other researchers to replicate the study’s findings, explore the model’s robustness, and extend the methodology, which is a cornerstone of progressive scientific discovery.

    [1] Multi-task Learning for Thyroid Nodule Segmentation with Thyroid Region Prior; Thyroid region prior guided attention for ultrasound segmentation of thyroid nodules
    [2] An open access thyroid ultrasound image database

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Benchmarking on Standard Datasets: Consider evaluating ThyGraph on established benchmarks like TN3K and DDTI. This evaluation would enhance the credibility of your findings by demonstrating performance consistency across different clinical datasets and allow for direct comparison with existing methods.

    Code Availability: Please clarify whether the implementation code for ThyGraph will be made publicly available. Sharing the code would greatly facilitate reproducibility and allow the research community to further validate and potentially build upon your work.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My recommendation of “Weak Reject” for this paper is based on the following major factors:

    Lack of Validation on Established Benchmarks: The paper does not utilize established public benchmarks such as TN3K and DDTI for thyroid nodule analysis. The absence of testing on these benchmarks is a significant omission because it restricts the ability to compare the proposed method directly with existing approaches, which are validated on these datasets. This limitation makes it difficult to gauge the true effectiveness and generalizability of the proposed model across varied clinical scenarios and different datasets.

    Unclear Code Availability: There is no mention of whether the implementation code will be made available. This lack of transparency impacts the reproducibility of the research and hinders the ability of the community to validate, critique, or build upon the findings. This matters in the current research environment, where reproducibility is a cornerstone of scientific credibility.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I agree with the other reviewers’ opinions about the weaknesses of this work, and I hold the view that testing the algorithms on publicly available benchmarks is important.



Review #3

  • Please describe the contribution of the paper

    This paper proposes ThyGraph, a novel graph-based deep learning model for predicting the malignancy of thyroid nodules. A series of experiments modeling graph nodes at the image level and at the patch level demonstrates the good performance of the method. In addition, a visualization method is proposed to associate attention weights with image orientation and anatomical location, giving the model better interpretability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) This article highlights a crucial issue in clinical diagnosis and alleviates the limitations of existing methods in classifying thyroid nodules. 2) This paper proposes a new visualization method that strongly demonstrates the rationality of ThyGraph. 3) The motivation behind the proposed model is well defined and robust, with a strong correlation to actual clinical diagnosis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The experimental part of the paper is too small. The article lacks the necessary ablation experiments, so it cannot confirm the effectiveness of each proposed component. 2) The idea is not especially new. Graph models are well explored in this area.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    None

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) Generally speaking, image-level models should perform better than patch-level models because dividing images into patches loses too much information. However, the experimental results in the paper show the opposite, which is counter-intuitive.
    2) The limitations of the paper should be discussed. For example, in actual clinical diagnosis, there are usually not a large number of labels available for training. In this case, could unsupervised or semi-supervised learning be a possible solution?
    3) In the experiments, SAG was integrated only with Patch-GCN and not with Image-GCN. This part should be improved.
    4) The experiments compare only against the Wang model and MS-AMIL, which may not fully illustrate the superiority of ThyGraph. More method comparisons should be added where possible.
    5) Transformers can also establish information associations on a global scale. What, then, are the advantages of, or the motivation for, using graph convolutional networks for feature modeling in this article?
    6) One of the contributions of this paper is the addition of anatomical context, but the effectiveness of this strategy is not shown in the experiments. Additional ablation experiments should be included.
    7) The paper develops additional OCR algorithms for text extraction, but if the extraction results are biased, will this affect subsequent results? If so, how can this problem be solved?
    8) Figure 3 claims to show thyroid ultrasound images of two patients, but the two images seem to be exactly the same. Please adjust this part.
    9) The paper does not give specific training configuration details; please improve this part.
    10) Please describe the calculation process of each module in more detail to improve reproducibility.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental part is not sufficient to prove the effectiveness and rationality of the method; the results presented by the experiment are not outstanding and there is room for improvement.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns about the article in detail and indicated that changes would be made to the article. I agree to its publication.




Author Feedback

We appreciate the reviewers’ valuable comments. If the paper is accepted, we will make the following revisions.

Availability of implementation (R1, R2, R3): The source code is now publicly available on GitHub and will be included for the camera-ready paper.

Method Novelty (R1, R3): We thank R1 for identifying previous work that appears to overlap with our research. We believe there may be some terminology differences that have led to confusion about the relationship between our model and these studies. In graph model literature, there is a concept of creating embeddings using multiple “views,” which entails using different types of node information (e.g., graph structure and feature content). In our case, we are looking at different images that give different “views” or perspectives of a nodule that are represented as unique graph nodes. Thus, the papers suggested by R1 do not attempt similar tasks to our model (i.e., aggregating information across an imaging study); three of the papers did not involve imaging and the fourth (MLMSeg) applies graphs for segmenting single thyroid ultrasound images. We understand the confusion of this terminology and will clarify the text in our amended manuscript.

Comparative/Ablation Experiments (R1, R3): This classification task across an imaging study requires two steps: feature extraction and aggregation across images. During development, several feature extractors were tested, but results were excluded due to space. In terms of aggregation strategies, the Wang and MS-AMIL results demonstrated alternative ways of combining features across images. We excluded the results of trivial aggregation strategies (e.g., feature averaging) as they predictably did not work since most images in a study do not include nodules. Space constraints prevented further ablations on anatomical context. One excluded ablation built graph edges based on feature similarity instead of anatomical adjacency, but it underperformed and was omitted to streamline the narrative.
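
The point about trivial aggregation can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the per-image scores and attention logits below are hand-picked, hypothetical values (in practice the logits would be learned), and `mean_pool` and `attention_pool` are illustrative names. It shows how plain averaging dilutes the signal when only one image in a study contains a nodule, while a weighted aggregation can keep it.

```python
import math

def mean_pool(scores):
    """Trivial aggregation: every image in the study counts equally."""
    return sum(scores) / len(scores)

def attention_pool(scores, logits):
    """Softmax-weighted aggregation: images with higher logits count more."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return sum((e / total) * s for e, s in zip(exps, scores))

# Hypothetical study: 10 images, only the first one shows the nodule.
scores = [0.9] + [0.0] * 9
# Hypothetical attention logits favouring the nodule image.
logits = [3.0] + [0.0] * 9

diluted = mean_pool(scores)
weighted = attention_pool(scores, logits)
print(f"mean pooling: {diluted:.3f}, attention pooling: {weighted:.3f}")
```

With these assumed values the averaged score is far lower than the attention-weighted one, which is consistent with the rebuttal's remark that averaging predictably fails when most images do not include nodules.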

Method Implementation Details (R1): The x, y, and z coordinates mapped the ordinal anatomical locations into Cartesian coordinates (e.g., right lateral -> x=1, etc.). This mapping was meant to help with clarity; if it is adding confusion, it can be removed. Other details were omitted for space, particularly those related to feature extraction, because that step’s architecture mirrored the already published ThyNet paper. More details will be included in the final version.
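
The coordinate mapping described above can be sketched as follows. This is a hedged illustration only: the rebuttal confirms just the "right lateral -> x=1" example, so the other entries in `X_POSITIONS` are hypothetical, and the edge rule (connect images whose grid coordinates differ by at most one step on every axis) is an assumption standing in for the paper's adjacency formula, which is not reproduced on this page.

```python
# Only "right lateral" -> 1 comes from the rebuttal; the other labels are hypothetical.
X_POSITIONS = {"right lateral": 1, "right mid": 2, "right medial": 3}

def build_adjacency(coords, radius=1):
    """Return a symmetric 0/1 adjacency matrix with self-loops.

    Two images are connected when their coordinates differ by at most
    `radius` on every axis (an assumed rule, not the paper's formula).
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if all(abs(a - b) <= radius for a, b in zip(coords[i], coords[j])):
                adj[i][j] = 1
    return adj

# Three images of one study placed on an (x, y) grid of anatomical locations.
coords = [
    (X_POSITIONS["right lateral"], 1),
    (X_POSITIONS["right mid"], 1),
    (4, 3),  # an image far from the other two
]
adj = build_adjacency(coords)
for row in adj:
    print(row)
```

Here the first two images are neighbours on the grid and share an edge, while the third is connected only to itself.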

Absence of Evaluation on Public Benchmarks (R2): We agree that public benchmarks are important to standardize results and reproducibility. However, current thyroid ultrasound benchmarks (including TN3K and DDTI) are restricted to single images hand-selected by experts, and not full imaging studies. Because ThyGraph’s use case is to spatially align an entire image study and aggregate information across images to make a malignancy risk assessment, there are no standardized benchmark datasets.

Limitations (R3): Limitations will be emphasized in the final manuscript. As for label availability, labels were automatically extracted from patient records using NLP, so no manual labels were required. While the NLP method was not the paper’s focus, we realize this step is valuable for future reproducibility, so the label extraction code is included in our public codebase.

Advantage of GCN over Transformer (R3): This study aims to aggregate input information from multiple ultrasounds, considering their spatial relationships and orientations, to make a diagnosis. Graph-based methods excel at modeling spatial dependencies and encoding anatomical information, helping a GCN capture physiological correlations between images. While Vision Transformers can use positional embeddings, they are not inherently suited to model spatial relationships, which are more critical in this medical context. Vision Transformers are still effective for feature extraction and can be explored in future work.
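
How a graph lets anatomically adjacent images inform each other can be sketched with a single propagation step. This is a simplified illustration under stated assumptions, not ThyGraph's layer: it uses row-normalized neighbourhood averaging (H' = D^-1 A H) and omits the learnable weights and nonlinearity of a real graph convolution; the adjacency matrix and feature values are made up.

```python
def propagate(adj, feats):
    """One neighbourhood-averaging step: each node takes the mean of the
    features of the nodes it is connected to (self-loops included)."""
    out = []
    for row in adj:
        degree = sum(row)
        agg = [0.0] * len(feats[0])
        for j, weight in enumerate(row):
            if weight:
                for k, value in enumerate(feats[j]):
                    agg[k] += weight * value
        # Isolated nodes without a self-loop keep a zero vector.
        out.append([a / degree for a in agg] if degree else agg)
    return out

# Hypothetical study graph: images 0 and 1 are anatomical neighbours, image 2 is isolated.
adj = [[1, 1, 0],
       [1, 1, 0],
       [0, 0, 1]]
# Hypothetical 2-dimensional image features.
feats = [[1.0, 0.0],
         [0.0, 1.0],
         [4.0, 4.0]]

smoothed = propagate(adj, feats)
for row in smoothed:
    print(row)
```

After one step the two neighbouring images share information, while the isolated image is unchanged; which images mix is decided entirely by the edges, which is where the anatomical prior enters.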




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I appreciate the authors submitting a rebuttal. During the initial review, novelty and evaluation on public datasets were raised as major concerns. Though the authors’ response on evaluation on public datasets is convincing, the response on novelty is not. However, considering the results and the responses from the reviewers, I recommend Accept.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



