Abstract

Current models for automatic gout diagnosis train convolutional neural networks (CNNs) on musculoskeletal ultrasound (MSKUS) images paired with classification labels annotated by skilled sonographers. However, this prevalent diagnostic model overlooks valuable supplementary information derived from sonographers’ annotations, such as the visual scan-path followed by sonographers. We observe that this annotation procedure offers valuable insight into human attention, helping the CNN model focus on crucial features in gouty MSKUS scans, including the double contour sign, tophus, and snowstorm, which play a crucial role in sonographers’ diagnostic decisions. To verify this, we create a gout MSKUS dataset enriched with a byproduct of sonographers’ annotation: the visual scan-path. Furthermore, we introduce a scan-path-based fine-tuning training mechanism (SFT) for gout diagnosis models, leveraging the annotation byproduct scan-paths for enhanced learning. The experimental results demonstrate the superiority of our SFT method over several SOTA CNNs.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1614_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1614_supp.zip

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Tan_Follow_MICCAI2024,
        author = { Tang, Xin and Cao, Zhi and Zhang, Weijing and Zhao, Di and Liao, Hongen and Zhang, Daoqiang and Chen, Fang},
        title = { { Follow Sonographers’ Visual Scan-path: Adjusting CNN Model for Diagnosing Gout from Musculoskeletal Ultrasound } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a method that incorporates sonographer eye gaze for gout diagnosis in musculoskeletal ultrasound scans. The method generates an attention map for each ultrasound image from the sonographer’s visual scan path, which is shown to improve classification of images with lesion areas.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper uses sonographers’ scan paths to help gout diagnosis. It shows improved diagnostic performance in musculoskeletal ultrasound when the proposed scan-path fine-tuning training mechanism is used.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The contribution is limited, and a literature review of closely related work is missing. The methodology is very similar to existing work that uses gaze-based attention maps to assist diagnosis of medical images (e.g. [8], [12]), and the paper gives no justification that clarifies its methodological novelty over these works. It also lacks clinical discussion of how necessary it is to apply the method in diagnosing gout, and of what the challenges or failure cases would be when using it.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    ‘We design a novel learnable kernel to recognize different sonographers’ eyes scanning pattern.’ This might be an over-claim, as the method regresses an attention map rather than modeling differences between sonographers.

    In practice, different sonographers may have distinct visual patterns, and it is very likely that many saccades in the recorded gaze are meaningless. How does the method generalize to this variability in gaze patterns?

    The writing needs improvement, and the grammar mistakes should be corrected.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend weak reject because the contribution is limited and the paper is missing some implementation details and clinical discussion. I look forward to the authors’ response.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This research proposes an innovative approach for the improvement of automatic diagnosis of gout using musculoskeletal ultrasound (MSKUS) images. The research focuses on the integration of sonographers’ visual scan path into CNN model training to improve the diagnostic capability for gout. In particular, the study introduces a novel framework called scan-path-based fine-tuning training mechanism (SFT), which allows the CNN model to learn gout diagnosis and capture the sonographer’s attention simultaneously. Moreover, the authors propose a joint optimization objective function of model prediction loss and scan-path interpretation loss to obtain a mutual adjustment between the generation of fixation graphs (obtained from the visual scan path analysis) and diagnostic classification (obtained by a CNN). Finally, the authors compare the approach with other state-of-the-art techniques considering various evaluation metrics.
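    The joint objective described above can be pictured as a weighted sum of a classification term and an attention-matching term. The following is only a minimal sketch under stated assumptions: the choice of binary cross-entropy and mean squared error, the function names, and the `weight` value are all hypothetical and are not taken from the paper, which may define both terms differently.

```python
import math

def binary_cross_entropy(p, y, eps=1e-7):
    """Prediction loss for one sample; p is the predicted gout probability."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def attention_mse(pred_map, fixation_map):
    """Assumed scan-path term: mean squared error between two 2D maps."""
    diffs = [
        (a - b) ** 2
        for pred_row, fix_row in zip(pred_map, fixation_map)
        for a, b in zip(pred_row, fix_row)
    ]
    return sum(diffs) / len(diffs)

def joint_loss(p, y, pred_map, fixation_map, weight=0.5):
    """Weighted sum of both terms; `weight` is a hypothetical hyperparameter."""
    return binary_cross_entropy(p, y) + weight * attention_mse(pred_map, fixation_map)
```

    In a formulation like this, gradients from both terms reach the shared CNN features, which is one way the "mutual adjustment" between attention and classification could arise.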

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A notable contribution of this paper is introducing an innovative method to incorporate visual scan path data into gout diagnosis performed by convolutional neural networks (CNNs). Unlike existing techniques, the approach presented by the authors necessitates visual scan path recordings solely during the training phase, a feature that enhances its compatibility with clinical practice. The possibility to seamlessly integrate this approach to different CNNs is another key strength of the work. Furthermore, the paper meticulously evaluates the proposed method, providing a comprehensive analysis that elucidates its efficacy.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper initially presents the use of visual scan information for enhancing diagnosis as a novel insight. However, this concept has been previously explored in the literature, as acknowledged by the authors themselves through references to prior works. As I understand, the paper’s novelty lies in the methodological approach to leveraging visual scan path information rather than the mere idea of its use, thus the presentation could be revised to emphasize this distinction.

    Although the paper discusses the integration of human attention information with CNN classification to emphasize relevant image features for gout diagnosis, it lacks clarity regarding how sonographer experience influences the automation of diagnosis arising from visual scan path information. Further elucidation on how sonographer experience is inferred from visual scan path data and its impact on the overall diagnostic process would enhance the paper’s comprehensibility.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper demonstrates good potential for reproducibility, as the proposed method is thoroughly described and supported by pseudo-code for the fine-tuning algorithm. This enhances the transparency and accessibility of the methodology, facilitating replication and verification of results by other researchers.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is well-written and provides a useful approach for ultrasound annotation improvement. I have some suggestions to highlight the paper’s strengths and clarify certain aspects that didn’t sound completely clear.

    • The paper would benefit from clearly articulating, in the results analysis text, which CNN yields the best results in integrating the visual scan path and under what contexts. Are certain networks more suitable than others depending on the circumstances of the examination?
    • Among the four learnable kernels, is there a preferable one? Providing brief insights into the performance differences or advantages of each kernel could enhance the understanding of the methodology.
    • A significant advantage of this method, in my opinion, is underemphasized in the paper: the ability to seamlessly integrate visual scan path information into various CNN architectures. This aspect should be highlighted more prominently, especially in the discussion of test results.
    • In the abstract, it initially seems like the focus of the work is on constructing a dataset to test the relevance of visual scan path information, which may mislead readers regarding the subsequent content. Clarifying the primary focus of the study would improve the abstract’s coherence.
    • It would be helpful to define the acronym “TLS” the first time it is mentioned in the introduction for clarity.
    • The paper mentions several potential byproducts resulting from annotation. Could you provide examples of what these might entail? Could further integrating these byproducts into future work be a fruitful avenue to explore?
    • There is a stray “——” at the beginning of page 3 that needs to be addressed.
    • In Figure 2, “DNN” is mentioned. Is this a typo or an unclarified acronym? The corresponding figure in the additional material video seems clearer.
    • In paragraph 3.2, “Table I” should be “Table 1” with a reference. Consistency in referencing is essential for clarity.
    • The method “CAM” is mentioned in paragraph 3.2 but written in lowercase. It should be capitalized for consistency.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The score I assigned to this paper is primarily based on the novelty of the approach proposed by the authors and its potential impacts in the field of ultrasonography. The ability to easily integrate the proposed method with existing ultrasound protocols and the flexibility to adapt it to different CNN architectures are particularly promising. Additionally, the fact that detecting the sonographer’s gaze is only necessary during the training phase represents a significant advantage in terms of practicality and integration with clinical practice.

    However, I suggested some changes and clarifications to improve the clarity and comprehensibility of the work.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposed a method to incorporate sonographer gaze-tracking data into the training of a DNN model for the task of predicting gout in MSKUS images. The novelty here lies in how the scan-path data is integrated into the training pipeline. Instead of simply convolving the scan-path with a fixed Gaussian kernel (the current SOTA), the authors propose using a learnable kernel, which can adapt to scanning patterns of different sonographers. This method not only improves classification accuracy overall, but also offers the potential for good explainability by showing the end-user the attention maps that led to the classification. Crucially, the proposed system does not require gaze-tracking input after the model is trained, offering a high degree of clinical suitability.
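    For readers unfamiliar with the fixed-kernel baseline mentioned above, the sketch below turns a list of fixation points into an attention map by placing a normalized Gaussian at each point, which is equivalent to convolving an impulse map with that kernel. It illustrates the baseline idea only: the kernel size, `sigma`, and function names are assumptions, and the paper's learnable kernel is not reproduced here. In a learnable variant, the kernel weights would be trainable parameters rather than values fixed by the Gaussian formula.

```python
import math

def gaussian_kernel(size, sigma):
    """Square Gaussian kernel (odd size) normalized to sum to 1."""
    c = size // 2
    k = [
        [math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2)) for j in range(size)]
        for i in range(size)
    ]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def attention_map(fixations, height, width, kernel):
    """Accumulate the kernel around each (row, col) fixation point."""
    size = len(kernel)
    c = size // 2
    out = [[0.0] * width for _ in range(height)]
    for r, col in fixations:
        for i in range(size):
            for j in range(size):
                rr, cc = r + i - c, col + j - c
                if 0 <= rr < height and 0 <= cc < width:  # drop out-of-image weight
                    out[rr][cc] += kernel[i][j]
    return out
```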

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper presents a novel approach to using gaze-tracking to enhance the accuracy of a difficult classification task. The use of a learnable attention kernel is clever and proves itself to be an effective way to increase performance over the current standard: a fixed Gaussian kernel.

    Not only does the proposed system improve classification accuracy, the attention maps can be shown to the end-user as justification for the model’s decision, allowing for good explainability.

    The authors perform a thorough literature review of existing methods that incorporate sonographer gaze-tracking into their training pipelines.

    The fact that the proposed method does not require gaze-tracking information at test time makes it highly attractive for real-world clinical application.

    While the dataset is not huge (1127 samples), the authors use 5-fold cross validation to divide their train-validation data, which is a good decision given the high degree of variance that can arise from splitting training and validation data from a smaller dataset.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Their dataset is not very large. The entire dataset is 1127 samples, which, when split 80/10/10, leaves only around 110 for the test set; this may not be enough to adequately test the performance of the system. It was also not clear whether images from any patient were included in both the train and test sets, which would be an example of data leakage.
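    One common way to rule out the leakage concern raised above is to split by patient rather than by image, so that all images from a patient land in the same subset. The sketch below assumes each sample carries a patient identifier; the `(sample_id, patient_id)` layout, the function name, and the seed are hypothetical, and nothing here reflects how the authors actually split their data.

```python
import random

def split_by_patient(samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (sample_id, patient_id) pairs so no patient spans two subsets."""
    patients = sorted({pid for _, pid in samples})
    random.Random(seed).shuffle(patients)  # deterministic shuffle of patients
    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),  # remainder goes to test
    }
    return {
        name: [s for s in samples if s[1] in pids]
        for name, pids in groups.items()
    }
```

    Because the fractions apply to patients, the image counts per subset will only approximate 80/10/10 when patients contribute different numbers of images.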

    Some details surrounding the exact implementation of the learnable kernel are missing or unclear.

    There are some minor typos in the paper. In Section 2.2: “model can better aware”, “Our” is capitalized in the conclusion, “differenrt” in caption for Figure A3.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors give a thorough description of the data collection process, including the details required to recreate their pre-processing of the eye tracking data. The experimental setup is also well detailed, as are all the hyperparameters used during training. The authors also provide pseudo-code for their custom training loop. With the exception of some small details around the learnable kernel, I would have no problem implementing this model myself.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    One of the conversations I would like to see added to this paper is one about how to translate the insights gained here to training setups that do not include any eye tracking information, as this will apply to most readers. Could the expert eye scan path be proxied somehow? Perhaps with a single or series of landmark points? Mouse cursor tracking?

    You also mention a list of “crucial features” for gout diagnosis in your abstract: double contour sign, tophus, and snow-storm. I am curious about whether the authors collected individual classification labels for such features. I would also like to see a discussion about how one would go about adding such labels to the training strategy.

    In the appendix you present 4 different potential structures for the learnable kernel, but you do not discuss which was selected and why. More information is needed as to which kernel structure was chosen and how it was actually implemented. You mention that there are multiple kernels, but it is not clear what that means exactly. Are they multiple channels of the same kernel? What is the size/shape of the kernel(s)? Some details here are missing from the paper.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think this paper is highly relevant, not only in the MSKUS space; the method could be easily extended to other types of diagnoses and even modalities. If gaze-tracking can be effectively proxied by mouse clicks, then this method could be further extended to even more projects. The proposed method outperforms existing human-attention-fused models without the need for additional gaze-tracking data (in test). This is not only impressive from a model accuracy point of view, but also hugely important for the system’s potential clinical translation.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

N/A




Meta-Review

Meta-review not available, early accepted paper.


