Abstract

Periodontal disease is a leading cause of tooth loss and is linked to systemic conditions such as endocarditis, diabetes, cardiovascular disease, and osteoporosis. Intraoral ultrasound (IUS) videos offer a non-invasive means of assessing periodontal structures, but existing segmentation methods rely on extensive manual annotations. We propose OralSAM, a one-shot video segmentation network inspired by the Segment Anything Model (SAM), which requires annotation of only a single frame. Our network integrates an adaptive feature correlation module to capture temporal dependencies and refine segmentation consistency across frames. Additionally, we introduce a self-prompting strategy based on optical flow, which dynamically adjusts point prompts using motion cues between consecutive frames to improve segmentation accuracy. To further enhance robustness, we incorporate a self-correction mechanism that adaptively refines mask embeddings, reducing propagation errors in intermediate frames. The combination of these components ensures effective generalization to unseen anatomical structures and improves temporal coherence in IUS videos. We evaluate OralSAM on both IUS and public datasets, demonstrating superior performance over state-of-the-art methods. Unlike conventional methods, our approach significantly reduces annotation effort while maintaining high segmentation accuracy. It provides a scalable solution for real-time clinical applications, enabling more efficient and accurate periodontal disease assessment. Code is available at https://github.com/BioMedCom/OralSAM.
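The optical-flow self-prompting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the OralSAM implementation: it assumes a single positive point prompt placed at the centroid of the previous mask and a dense flow field that has already been computed by some external method. The names `mask_centroid` and `propagate_point_prompt` are hypothetical, and the toy mask and flow values are made up.

```python
from typing import List, Tuple

Mask = List[List[int]]                  # mask[y][x] is 0 or 1
Flow = List[List[Tuple[float, float]]]  # flow[y][x] = (dx, dy) in pixels


def mask_centroid(mask: Mask) -> Tuple[float, float]:
    """Return the (x, y) centroid of the foreground pixels."""
    xs = ys = n = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                xs += x
                ys += y
                n += 1
    if n == 0:
        raise ValueError("mask has no foreground pixels")
    return xs / n, ys / n


def propagate_point_prompt(mask: Mask, flow: Flow) -> Tuple[int, int]:
    """Move the previous point prompt by the mean flow inside the mask.

    Assumption (not from the paper): one positive point at the mask
    centroid, shifted by the average motion and clamped to the image.
    """
    cx, cy = mask_centroid(mask)
    dx = dy = 0.0
    n = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                fx, fy = flow[y][x]
                dx += fx
                dy += fy
                n += 1
    h, w = len(mask), len(mask[0])
    nx = min(max(round(cx + dx / n), 0), w - 1)
    ny = min(max(round(cy + dy / n), 0), h - 1)
    return nx, ny


# Toy example: a 3x3 structure in a 6x6 frame moving 1 px right and 1 px up.
toy_mask = [[1 if 1 <= x <= 3 and 1 <= y <= 3 else 0 for x in range(6)]
            for y in range(6)]
toy_flow = [[(1.0, -1.0) for _ in range(6)] for _ in range(6)]
print(propagate_point_prompt(toy_mask, toy_flow))  # (3, 1)
```

The new point would then be passed as the prompt for the next frame. Averaging the flow over the whole mask is only one possible choice; a real system might sample the flow at the prompt location or use several points.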

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4605_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/BioMedCom/OralSAM

Link to the Dataset(s)

N/A

BibTex

@InProceedings{KumLog_OralSAM_MICCAI2025,
        author = {Kumaralingam, Logiraj and Sivaanpu, Anparasy and Hoang, Manh-Hai and Alavi, Javaneh and Nguyen, Kim-Cuong T. and Punithakumar, Kumaradevan and Lou, Edmond H. M. and Major, Paul and Le, Lawrence H.},
        title = {{OralSAM: One-shot Segmentation for Intraoral Ultrasound Videos with Adaptive Feature Correlation and Self-prompting Strategy}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15966},
        month = {September},
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a new zero-shot method for segmenting anatomical structures in ultrasound video sequences. The method is designed with the specific application of gum segmentation in intra-oral ultrasound images. The novelty of the proposed approach lies in the seamless and prompt-less integration of temporal constraints within existing SAM-like methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is innovative and, based on the results presented with the CAMUS dataset, probably generalizable to many contexts where ultrasound image sequences are acquired and used. Since ultrasound is inherently a real-time imaging modality and ultrasound probes also tend to move during acquisition (in addition to any physiological motion being observed), zero-shot video segmentation/tracking makes complete sense and is well-addressed in this paper.

    The paper is also very clearly written and easy to follow for the most part. I also truly appreciate seeing newer imaging modalities being introduced at MICCAI.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper claims that the new method offers improved performance over existing SAM-based methods. The quantitative improvements in terms of the Dice and IoU measures are quite marginal based on Table 1, and this is to some degree acknowledged in the paper. Statistical significance tests would bring more value to the conclusions as some results may be statistically significant and others not.
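One way to carry out the kind of significance test requested above is a paired sign-flip permutation test on per-video scores. The sketch below is an illustration only, not an analysis from the paper: the function name `paired_permutation_pvalue` is hypothetical, the scores are made-up toy values, and a Wilcoxon signed-rank test would be an equally reasonable choice.

```python
from itertools import product
from typing import Sequence


def paired_permutation_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact sign-flip permutation test on paired scores.

    Under the null hypothesis that the two methods are exchangeable, the
    sign of each paired difference is arbitrary. All 2**n sign assignments
    are enumerated, so this is only practical for small n (roughly n <= 20);
    larger samples would need random sampling of sign flips instead.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point round-off.
        if stat >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


# Toy per-video Dice scores for two methods (illustrative, not from Table 1).
method_a = [0.9, 0.9, 0.9, 0.9]
method_b = [0.8, 0.8, 0.8, 0.8]
print(paired_permutation_pvalue(method_a, method_b))  # 0.125
```

With only four paired videos, even a perfectly consistent improvement cannot reach p < 0.05, which shows why the number of test videos matters when interpreting marginal Dice or IoU gains.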

    Furthermore, Figure 2 is said to show qualitative results on “challenging cases”. While it is easy to see in this Figure that the original SAM model does not perform as well as the other models depicted (this is largely corroborated by the quantitative results in Table 1), the segmentations produced by the 4 other models look virtually identical to me. If they are significantly different, it would be helpful to annotate the images in order to point out where the differences are, or perhaps zoom in on portions of the image where differences can be seen.

    There is also no discussion of what the uncertainty around the manual segmentation used as ground truth looks like. An assessment of inter-rater and intra-rater variability in segmentation on a subset of the data used in the study might help better qualify the differences measured between the different methods.
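Inter-rater variability of the kind suggested above is commonly summarized by computing the same overlap metrics between two annotators' masks that are used to score the models. The following is a minimal sketch under that assumption; the function name `dice_iou` and the toy masks are hypothetical and not taken from the paper.

```python
from typing import Sequence, Tuple


def dice_iou(mask_a: Sequence[Sequence[int]],
             mask_b: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (Dice, IoU) between two binary masks of the same shape."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(1 for x, y in zip(flat_a, flat_b) if x and y)
    size_a = sum(flat_a)
    size_b = sum(flat_b)
    union = size_a + size_b - inter
    if union == 0:
        # Both masks empty: treated as perfect agreement by convention.
        return 1.0, 1.0
    return 2 * inter / (size_a + size_b), inter / union


# Toy masks from two hypothetical raters on the same frame.
rater_1 = [[1, 1, 0],
           [0, 1, 0]]
rater_2 = [[1, 0, 0],
           [0, 1, 1]]
dice, iou = dice_iou(rater_1, rater_2)
print(round(dice, 3), iou)  # 0.667 0.5
```

If the Dice between raters on a subset of frames is lower than the gap between two competing methods, that gap is hard to interpret as a real difference in segmentation quality.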

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    For readers unfamiliar with intra-oral ultrasound, it would be helpful to provide an annotated image where the main anatomical structures are labeled, as well as where the cross-section depicted in the image lies within a diagram of the mouth.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is an interesting paper with potential implications beyond the main application where the methods are tested. However, claims around the results seem rather weakly substantiated. A more careful and nuanced analysis would definitely improve the paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    Based on the Segment Anything Model (SAM), this study proposes a novel one-shot video segmentation framework for intraoral ultrasound videos. Experimental results and ablation studies show its feasibility.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) To improve SAM’s performance on ultrasound images, the authors design an adaptive feature correlation (AFC) module to leverage the intra-frame temporal information and spatial similarity with the support mask.
    2) Using SAM on video requires a proper prompt for each frame. This study proposes a self-prompting method to automatically generate a point prompt for each frame.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Major issues:
    1) The ablation study shows that the effect of SPP is not that significant.
    2) There is a notable previous work, MemSAM (https://github.com/dengxl0520/MemSAM), for echocardiography. MemSAM is designed for medical ultrasound and leverages temporal information. The lack of analysis and comparison with MemSAM makes this study less convincing. In addition, how does the method perform compared to SAM 2?

    Minor issues:
    1) The abbreviation CBCT is used before its full name is introduced.
    2) In Fig. 1, add S_max to make the AFC module clearer.
    3) In Section 2.1, how is the support mask upsampled, and what is its shape? In equation (3), what do the subscripts i, j refer to?
    4) This study addresses one-shot segmentation, so is the first-frame mask M0 the only annotation? Why does Fig. 1 show a support mask M(t-1)? Is it the predicted mask of the previous frame?
    5) What are q, p, and y in equation (6)?
    6) Section 2.3 is a bit confusing. Consider titling it “Loss Function” and making the self-correction mechanism a sub-section.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Have you considered replacing optical flow with a point tracking method? Only the contour points need to be tracked, whereas optical flow is computed for every pixel. Related work:
    https://link.springer.com/chapter/10.1007/978-3-031-72083-3_60
    https://link.springer.com/chapter/10.1007/978-3-031-73647-6_5
    https://arxiv.org/abs/2410.11831
    https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/htl2.12111
    https://ieeexplore.ieee.org/abstract/document/10944021

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This study misses a comparison with a notable state-of-the-art method. Otherwise, I do not have substantial concerns. I would recommend Weak Accept.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a one-shot self-prompting strategy for SAM to improve performance in intra-oral ultrasound videos. The method relies on only 1 labelled frame to classify the rest of the video, reducing labelling costs. The main components of the method are: (a) an adaptive feature correlation module to ensure features are correlated across time; (b) a novel optical flow self-prompting strategy to guide the model in subsequent frames; and (c) a self-correction mechanism in the loss function to ensure predictions improve over time.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - The motivation is clear and compelling, and the methodology is easy to understand and addresses the problems raised in the motivation. The components of the method are novel and work well together.
    - The method relies on only 1 labelled frame to classify the rest of the video, which reduces the cost of labelling and the impact of scarce/coarse labels.
    - The results (quantitative and qualitative) are well-presented and show clearly how OralSAM outperforms other SOTA foundation models (SAM, MedSAM, etc.).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    - There could be more comparisons against other models, and more rigorous experiments could be performed. For example, what happens if the authors combine this approach with different backbones and pre-trained weights (e.g. MedSAM instead of SAMUS)? What about additional loss functions?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clarity of the paper, novelty of the method, as well as the results presented all contribute to this paper’s score.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Thank you for approving our work. We are deeply grateful for your constructive feedback and will refine the article according to your insightful suggestions.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The consensus leans towards acceptance, with scores of 4 (Weak Accept), 5 (Accept), and 4 (Weak Accept). Reviewers generally find the work interesting and novel but highlight areas for improvement, particularly regarding evaluation rigor and comparison to relevant prior work.


