Abstract

Cine-MRI offers high spatial-temporal resolution without ionizing radiation, and it has become a new method for analyzing and assessing the swallowing function of patients with head and neck tumors. To reduce physicians' workload and improve the robustness of labeling cine-MRI images, we propose a new swallowing analysis method based on a revised cine-MRI segmentation model. This method aims to automate the calculation of tongue dorsum motion parameters in the oral and pharyngeal phases of swallowing, followed by a quantitative analysis. Firstly, based on manually annotated swallowing structures, we propose a method for calculating tongue dorsum motion parameters, which enables the quantitative analysis of swallowing capability. Secondly, a spatial-temporal hybrid model composed of convolution and a temporal transformer is proposed to extract the tongue dorsum mask sequence from the MRI sequence of a swallowing cycle. Finally, to fully exploit the advantages of cine-MRI, a Multi-head Temporal Self-Attention (MTSA) mechanism is introduced, which establishes connections among frames and enhances the segmentation results of individual frames. A Temporal Relative Positional Encoding (TRPE) is designed to incorporate the temporal information of different swallowing stages into the network, which enhances the network’s understanding of the swallowing process. Experimental results show that the proposed segmentation model achieves a 1.45% improvement in Dice Score compared to state-of-the-art methods, and that the interclass correlation coefficient (ICC) between the displacement data of swallowing feature points obtained from the model mask and from physician annotation exceeds 90%. Our code is available at: https://github.com/MinghaoSam/SwallowingFunctionAnalysis.
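For readers unfamiliar with the Dice Score quoted in the abstract, the following is a minimal sketch of how it is commonly computed for binary masks and averaged over the frames of one swallowing cycle. The function names `dice_score` and `mean_sequence_dice`, and the convention of returning 1.0 when both masks are empty, are assumptions made for illustration and are not taken from the authors' repository.

```python
import numpy as np


def dice_score(pred, target):
    """Dice overlap between two binary masks of the same shape.

    Dice = 2 * |pred AND target| / (|pred| + |target|).
    Two empty masks are treated as a perfect match (an assumed convention).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


def mean_sequence_dice(pred_seq, target_seq):
    """Average per-frame Dice over a sequence of masks (one swallowing cycle)."""
    scores = [dice_score(p, t) for p, t in zip(pred_seq, target_seq)]
    return float(np.mean(scores))


if __name__ == "__main__":
    # Toy 2x2 masks: one overlapping pixel, mask sizes 2 and 1.
    pred = np.array([[1, 1], [0, 0]])
    target = np.array([[1, 0], [0, 0]])
    print(dice_score(pred, target))  # 2 * 1 / (2 + 1)
    print(mean_sequence_dice([pred, target], [target, target]))
```

Averaging per frame, rather than pooling all pixels of the sequence, weights every frame equally; which of the two the paper reports is not stated on this page.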

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2715_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/MinghaoSam/SwallowingFunctionAnalysis

Link to the Dataset(s)

https://github.com/MinghaoSam/SwallowingFunctionAnalysis

BibTex

@InProceedings{Sun_ANew_MICCAI2024,
        author = {Sun, Minghao and Zhou, Tian and Jiang, Chenghui and Lv, Xiaodan and Yu, Han},
        title = {{A New Cine-MRI Segmentation Method of Tongue Dorsum for Postoperative Swallowing Function Analysis}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper aims to automate the estimation of tongue dorsum motion in the oral and pharyngeal phases of swallowing. A Temporal Relative Positional Encoding (TRPE) is designed to incorporate temporal information for better segmentation of the tongue dorsum.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The exploration of swallowing functionality through motion estimation derived from imaging is an intriguing aspect of the study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The registration algorithm used to align different frames is not clearly described. Since the displacement field is a critical feature for downstream analysis of swallowing function, it is essential to clarify this process.

    2. There is no discussion of earlier work on relative position embedding, such as Shaw, Peter, Jakob Uszkoreit, and Ashish Vaswani’s “Self-attention with relative position representations” (arXiv preprint arXiv:1803.02155, 2018) and Zheng, Wenfeng, et al.’s “Design of a modified transformer architecture based on relative position coding” (International Journal of Computational Intelligence Systems, 16.1, 2023). This omission makes it difficult to assess the novelty of the proposed temporal relative position embedding.

    3. Following the previous point, what rationale supports the effectiveness of your strategy for extracting feature points? Why is this approach considered effective, and why would deformable registration not provide accurate motion estimation? The discussion does not address how this research differs from prior studies on tongue motion estimation, such as “DRIMET: Deep Registration-based 3D Incompressible Motion Estimation in Tagged-MRI with Application to the Tongue” by Bian, et al. (Medical Imaging with Deep Learning, PMLR, 2024).

    4. The dataset size appears relatively small. Has cross-validation been employed to report performance metrics?

    5. In the experiments section, neither standard deviations nor statistical tests are reported, making it impossible to evaluate the significance of the findings.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See comments in “weakness” section

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Previous research is not discussed. The rationale and explanation behind the motion estimation strategy are lacking.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a quantification and segmentation method for measuring the motion of the tongue dorsum in cine-MRI to assess swallowing function. The proposed segmentation model combines spatial and frame (temporal) features using transformers. The model is validated and compared with prior methods using datasets involving patients with post-operative effects.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is clearly written, and the topic is important in the field. The authors propose a new method for quantifying tongue motion based on feature points, and a new segmentation method that uses both the spatial and temporal information of cine-MRI. The experiments seem to validate the effectiveness of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is unclear how the method in 2.1 improves upon traditional methods. How is the procedure in 2.1 different from existing methods for measuring tongue motion?

    • As a related question, Fig. 4 presents a comparison of the displacement of the 10 feature points proposed by this paper and claims their correlation is high. I think the experiment should be reversed: the measurement points used by traditional methods (e.g., landmarks), as estimated by the proposed method, should be compared against the same traditional measurement points detected by physicians. Again, it would be good to explain the main improvement gained by measuring 10 feature points over traditional methods, because readers may be unfamiliar with the process.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Please refer to weaknesses for concerns
    • The order of Fig. 4 and 5 should be switched.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper addresses an important problem and provides a solution in a clear and simple manner, backed up by experiments. However, some of the contributions and experiments are unclear, and it would be good if further details on the motivation behind the proposal and the experiments were provided.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    (1) The paper introduces a new method for analyzing swallowing function using cine-MRI by automating the calculation of tongue dorsum motion parameters through a revised segmentation model that incorporates a spatial-temporal hybrid model and Multi-head Temporal Self-Attention (MTSA) mechanism.

    (2) This approach significantly enhances the accuracy of labeling cine-MRI images, achieving a 1.45% improvement in Dice Score over state-of-the-art methods and demonstrating high reliability with an interclass correlation coefficient over 90% between model-derived and physician-annotated data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) This paper integrates a spatial-temporal hybrid model with a Multi-head Temporal Self-Attention (MTSA) mechanism.

    (2) The evaluation result shows that the proposed approach demonstrates significant quantitative improvements, achieving a 1.45% increase in Dice Score over existing methods and an inter-class correlation coefficient exceeding 90%.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors should improve the quality and readability of the figures to better demonstrate the novelty of this study.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors should provide at least some pilot / prototype source code for this study to help readers better understand the proposed approach.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) The authors should provide at least some pilot / prototype source code for this study. (2) The authors should redesign Fig. 1 and 2 to better demonstrate the novelty of this study.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    (1) This manuscript provides novel information for research on the swallowing function of patients with head and neck tumors. (2) However, the authors should improve the quality of this manuscript, especially the figure design, to better demonstrate the proposed approach.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors significantly enhanced the paper by revising the manuscript based on reviewers’ comments.



Review #4

  • Please describe the contribution of the paper

    This paper introduces a novel method for swallowing analysis using a tongue dorsum segmentation neural network. The study presents two main contributions: a technique for quantifying tongue motion parameters from segmented masks, and a segmentation model enhanced with a temporal transformer. Experimental results demonstrate the model’s robustness and the consistency of motion parameters with physician annotations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper presents quantitative research focusing on the motion trajectories of swallowing-related structures.
    • They present a novel temporal self-attention mechanism and temporal relative positional encoding to extract temporal features in cine-MRI data.
    • Their experiments demonstrate the effectiveness of the segmentation model as well as the consistency between the motion parameters calculated from their model predictions and physician annotation.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No main weakness.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It is better to summarize their main contribution more clearly in the introduction part.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is well-written and easy to understand. Their contributions are valuable to postoperative swallowing function analysis. Their experiments are solid and support their main claims.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Their contributions are valuable to postoperative swallowing function analysis. Their experiments are solid and support their main claims.




Author Feedback

The authors thank the four reviewers for their critical comments and overall ratings (5 4 4 3) of our paper. Our rebuttal focuses on the weaknesses they listed.

[R3; R4; R5] Comments on the clarification of the computation of the tongue dorsum motion. Our method, inspired by reference [11], is designed for better automation and robustness. We now provide a detailed explanation of step 1 in Section 2.1. There are two major stages. (1) We establish a coordinate system for frame alignment. (i) Since the mandible is acknowledged as a crucial and primary support structure for tongue movement, we set the lower margin of the attachment point of the genioglossus muscle on the inner side of the mandible as the origin [11]. (ii) The X-axis is determined by the second cervical intervertebral disc due to its fixed position [11], and the coordinate system is constructed. (2) We then mark the upper border of the tongue dorsum completely and accurately.

[R3; R5] Comments on prior studies of swallowing analysis, including deformable registration-based methods. In Section 1, paragraphs 2 and 3, we addressed the limitations of prior quantitative studies on swallowing analysis. Additional insights into prior research are as follows. (1) Prior studies on cine-MRI may be either inaccurate or less effective. For instance, (i) Young et al. (2023, DOI: 10.1002/hed.27309) studied four specific boundary points on the mid-sagittal plane of the tongue, but four points cannot adequately capture the complexity or describe the diverse features of tongue movement. (ii) Yang et al. (2020, DOI: 10.1371/journal.pone.0228652) studied the characteristics of the tongue root during swallowing with an improved deformable registration algorithm that tracks its motion in four directions using deformation vectors. However, they focused only on local tongue root movement without fully reflecting the motion of the entire tongue.
(2) Unlike tagged MRI, which is less accessible due to its requirement for specific scanning sequences, cine-MRI entails more manual intervention and complex algorithms for accurate deformable registration. While deformable registration methods can estimate overall tongue movement, post-registration error validation is indispensable. Appendix E of DRIMET (Bian et al., PMLR, 2024) states that errors resulting from border mismatches should be corrected manually; otherwise they may undermine downstream tasks.

[R3; R4; R5; R6] Method novelty and effectiveness. In Section 1, paragraph 4, we elucidated the motivation and the novelty of our method. Further elaboration on the novelty and effectiveness of our method is provided below. (1) Our method efficiently aligns 10 feature points with minimal annotation, enabling automated extraction. These 10 feature points (P1~P10) can provide a more complex and precise description of the tongue dorsum’s displacement field. (2) The accuracy of our approach depends solely on the ROI precision, eliminating further manual intervention. (3) Our latest clinical statistics from oral physicians reveal significant differences in motion amplitude between the squamous cell carcinoma (SCC) and normal control groups at points P8 and P9 (p<0.05). To follow the rebuttal policy, we will not provide our new experimental results (recently included in one author’s master’s thesis), which can illustrate the effectiveness of the proposed method.

[R2; R5; R6] Reproducibility. All code and datasets, including 32 swallowing cycles (with 8 images per cycle) labeled by oral surgeons, have been uploaded to GitHub for public access. The link will be made public later.

[R5] Novelty of our temporal relative position embedding (RPE). We have cited earlier work on RPE ([18] Wu et al.). In CV tasks, RPE is typically applied to 2D image patches, but our method adapts RPE to the temporal axis to explore correlations among image frames.
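To make the idea of relative position embedding along the temporal axis concrete, here is a minimal NumPy sketch of multi-head self-attention over per-frame feature vectors with a learned bias indexed by the frame offset i - j. It is an illustration under stated assumptions, not the authors' MTSA/TRPE implementation: the bias-table formulation, the function name `temporal_attention_with_relative_bias`, and all shapes and parameters are hypothetical choices.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention_with_relative_bias(x, w_q, w_k, w_v, rel_bias, num_heads):
    """Self-attention across frames with a relative temporal bias.

    x:        (T, D) one feature vector per frame
    w_q/k/v:  (D, D) projection matrices
    rel_bias: (num_heads, 2T - 1) bias per head and per frame offset i - j
    Returns the attended features (T, D) and attention maps (num_heads, T, T).
    """
    T, D = x.shape
    d_head = D // num_heads

    def split_heads(m):
        # (T, D) -> (num_heads, T, d_head)
        return m.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, T)

    # Offset i - j lies in [-(T-1), T-1]; shift it to index the bias table.
    offsets = np.arange(T)[:, None] - np.arange(T)[None, :] + (T - 1)
    scores = scores + rel_bias[:, offsets]  # broadcasts to (H, T, T)

    attn = softmax(scores, axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(T, D)
    return out, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, H = 8, 16, 4  # 8 frames per swallowing cycle, as in the dataset description
    x = rng.normal(size=(T, D))
    w_q, w_k, w_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
    rel_bias = rng.normal(size=(H, 2 * T - 1)) * 0.1
    out, attn = temporal_attention_with_relative_bias(x, w_q, w_k, w_v, rel_bias, H)
    print(out.shape, attn.shape)
```

Because the bias depends only on the offset between frames, the same table entry is shared by every frame pair that is equally far apart in time, which is the property that distinguishes relative from absolute positional encoding.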
The rebuttal and revision policy will be strictly followed, with NO substantial changes to this paper.
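The frame-alignment stage described in the rebuttal (an origin at the genioglossus attachment point and an X-axis determined by the second cervical intervertebral disc) can be sketched as a change of coordinates. The sketch below assumes that the X-axis runs from the origin through a single disc landmark, that the Y-axis is the X-axis rotated by 90 degrees, and that displacement is measured relative to the first frame; these conventions and the names `to_local_frame` and `displacement_from_first_frame` are illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def to_local_frame(points, origin, axis_point):
    """Express 2D image points in an anatomical frame.

    points:     (N, 2) feature points in image coordinates
    origin:     (2,) landmark used as the origin (assumed: genioglossus attachment)
    axis_point: (2,) landmark the X-axis passes through (assumed: C2 disc)
    """
    points = np.asarray(points, dtype=float)
    origin = np.asarray(origin, dtype=float)
    x_axis = np.asarray(axis_point, dtype=float) - origin
    norm = np.linalg.norm(x_axis)
    if norm == 0:
        raise ValueError("origin and axis_point must be distinct")
    x_axis = x_axis / norm
    y_axis = np.array([-x_axis[1], x_axis[0]])  # X-axis rotated by 90 degrees
    rel = points - origin
    return np.stack([rel @ x_axis, rel @ y_axis], axis=-1)


def displacement_from_first_frame(local_seq):
    """Per-point Euclidean displacement relative to frame 0.

    local_seq: (T, N, 2) aligned feature points for T frames
    Returns an array of shape (T, N).
    """
    local_seq = np.asarray(local_seq, dtype=float)
    return np.linalg.norm(local_seq - local_seq[0], axis=-1)


if __name__ == "__main__":
    origin, axis_point = (10.0, 10.0), (10.0, 20.0)
    frame0 = to_local_frame([[12.0, 13.0]], origin, axis_point)
    frame1 = to_local_frame([[12.0, 16.0]], origin, axis_point)
    print(frame0, frame1)
    print(displacement_from_first_frame(np.stack([frame0, frame1])))
```

Expressing every frame in a landmark-based frame removes rigid head motion between frames, so the remaining displacement of a feature point reflects tongue movement rather than patient movement.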




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents a novel method for analyzing swallowing function using cine-MRI by automating the calculation of tongue dorsum motion parameters through a revised segmentation model that incorporates a spatial-temporal hybrid model and Multi-head Temporal Self-Attention (MTSA) mechanism. Strengths include (1) the valuable contributions to postoperative swallowing function analysis; (2) the model’s effectiveness and consistency with physician annotations; and (3) significant accuracy improvements, with a 1.45% increase in Dice Score over state-of-the-art methods and high reliability, evidenced by an interclass correlation coefficient over 90%. Weaknesses include that the quality and readability of the figures can be improved, and it would strengthen the paper if some pilot or prototype source code were available to support reproducibility. Despite these minor weaknesses, the strengths, including methodological innovation and robust experimental validation, outweigh the weaknesses, leading to a recommendation for acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


