Abstract

Medical image segmentation typically relies on large, accurately annotated datasets. However, acquiring pixel-level annotations is labor-intensive and demands substantial effort from domain experts, making such annotations difficult to obtain in real-world clinical settings. To tackle this challenge, we present SA-Net, a framework that leverages cross-supervision between segment anything models (SAM) and 2D segmentation networks to learn from sparse annotations. Specifically, we design an interactive graph learning segmentation network whose bilateral graph convolution (BGC) module captures detailed features from multiple perspectives, facilitating the generation of high-quality pseudo-labels. These pseudo-labels serve as direct supervision for the semantic segmentation networks and SAM, enabling the synthesis of additional annotations to strengthen training. The multi-scale attention (MSA) module enables cross-layer interaction by partitioning channel label groups and capturing global information across layers, while the recovery module (RM) fuses deep and low-level features to integrate global context and reconstruct lesion boundary regions. Experimental results on LUNA16, AbdomenCT-1K, and a self-collected dataset demonstrate the effectiveness of SA-Net. Our code is available at https://github.com/CTSegPilot/SA-Net.git.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3529_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/CTSegPilot/SA-Net.git

Link to the Dataset(s)

N/A

BibTex

@InProceedings{SuHua_Sparsely_MICCAI2025,
        author = { Su, Huaqiang and Liu, Zaiyi and Yao, Lisha and Li, Sunyun and Lin, Hun and Chen, Guoliang and Chen, Xin and Lei, Haijun and Lei, Baiying},
        title = { { Sparsely Annotated Medical Image Segmentation via Cross-SAM of 3D and 2D Networks } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        pages = {534 -- 544}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes the SA-Net framework, which leverages cross-supervision from segment anything models (SAM) and 2D segmentation networks to learn from sparse annotations. Specifically, this paper designs the bilateral graph convolution (BGC) module, the multi-scale attention (MSA) module, and the recovery module (RM) to enhance representation and improve the segmentation accuracy under sparse annotations.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper is well-organized and well-written.
    • This paper proposes a cross-supervision strategy that combines SAM (SAM-MED3D) with conventional 2D segmentation networks to address medical image segmentation under sparse annotations.
    • The proposed method achieves impressive segmentation performance using only one-third of the full annotations.
    • The design of specific modules (BGC, MSA, and RM) enhances the network’s representational capacity and contributes to improved segmentation results.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The proposed framework contains several complex modules. The motivations behind each module should be further clarified, and experimental evidence (e.g., feature visualizations, not just ablation studies) should be provided to demonstrate their necessity and effectiveness.
    • The baseline methods should include more relevant and competitive state-of-the-art approaches suitable for sparse annotation settings (e.g., general methods like SAM2 with mask prompts or few-shot methods like UniverSeg) rather than relying only on traditional methods.
    • In addition to segmentation accuracy, computational efficiency metrics should be reported and compared to provide a more comprehensive evaluation of the proposed method.
    • More implementation details are needed, such as data splitting strategies, the process for generating sparse annotations, and how the baseline methods were implemented.
    • Releasing the source code is strongly recommended to enhance the reproducibility and credibility of the results.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a cross-supervision framework for sparse annotation in medical image segmentation, achieving promising performance. However, the proposed method involves several complex modules whose motivations and effectiveness lack sufficient explanation and supporting evidence. The baseline comparisons are limited, excluding relevant SOTA methods. Moreover, important implementation details and efficiency metrics are missing, and the source code is not available, affecting reproducibility. These limitations reduce the overall clarity and completeness of the work.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The author’s reply addressed most of my concerns.



Review #2

  • Please describe the contribution of the paper

    The paper introduces SA-Net, a framework for sparsely annotated medical image segmentation, leveraging cross-supervision between 3D and 2D networks alongside the Segment Anything Model (SAM) to generate pseudo-labels. The proposed method includes the 1) Bilateral Graph Convolution (BGC) module, which enforces semantic-boundary constraints via graph interactions; 2) the Multi-scale Attention (MSA) module, enabling cross-layer global context capture through channel/spatial partitioning; and 3) the Recovery Module (RM), which fuses deep and shallow features to refine lesion boundaries.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The manuscript is well organized. The experiments on multiple datasets are sufficient to demonstrate the effect of the proposed method.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Several main concerns are as follows.

    1. The paper’s methodology exhibits a “modular patchwork” issue, where components like SAM, BGC, MSA, CCA, and CSA operate as isolated “plug-and-play” modules rather than parts of a unified framework. Also, the modules seem to address different challenges (blurred boundaries, semantic gaps, scale variance) without anchoring to a central challenge.
    2. None of the compared approaches (Table 1) explicitly utilize SAM or SAM-derived frameworks. The ablation study (Table 2) includes SAM as part of the baseline, but the baseline (A2) already surpasses all other methods except Cross-teaching. SA-Net does not compare against other SAM-based methods, which weakens the empirical validation of SA-Net’s novelty and practical utility.
    3. Experimental details and hyperparameters are missing. Also, the SAM (e.g., fine-tuned SAM-Med3D?) and the backbone of the 2D networks are unspecified.
    4. The paper states that “only one-third of the slices are annotated” but fails to clarify the selection criteria. Are labeled slices chosen randomly, uniformly spaced, or based on anatomical landmarks?
    5. There is no ground truth in Figure 3.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the work demonstrates potential, several critical limitations prevent a stronger endorsement. I recommend weak accept.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The paper proposes a technically sound framework and demonstrates reasonable empirical performance through comprehensive experiments. The author explains the role of each module in Rebuttal, but I think the overall motivation and core objective are not sufficiently clear. The framework integrates multiple techniques (e.g., SAM, graph convolution, multi-frequency attention, cross-attention) addressing different challenges (low contrast, blurred boundaries, semantic gaps, etc.). These components, although empirically effective, limit the perceived novelty and clarity of the contribution.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel SA-Net segmentation framework that combines a graph neural network with SAM to address the limitations of existing sparsely annotated methods for medical image segmentation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper combines SAM with traditional 2D segmentation networks, allowing them to generate pseudo-labels for each other to address the problem of sparse annotations.

    2. The proposed BGC and MSA modules enhance the model’s ability to capture semantic and geometric information, as well as its capacity for contextual modeling.

    3. The proposed method achieves state-of-the-art performance on three benchmark datasets.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. In Section 2.1, the authors mention that a confidence threshold is used in the MSA module to control pseudo-label generation and reduce noise. This appears to be an important parameter in the model. What value was it set to, and how was this value determined?

    2. The authors should more clearly explain the motivation for incorporating a graph network into a 2D framework, as well as the underlying mechanism through which it contributes to performance.

    3. The proposed model appears to have a relatively large computational and parameter overhead. It would be helpful if the authors could provide more training and inference details to demonstrate the model’s efficiency.

    4. In Section 1, the authors state that the proposed model leverages the exceptional segmentation capability of SAM. However, none of the baseline methods in the comparison seem to use SAM or other large-scale models, which may result in an unfair comparison.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on #6 and #7.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have adequately addressed the concerns I raised, and they have committed to making the code publicly available upon acceptance of the paper.




Author Feedback

Thanks for the rebuttal invitation. We itemize our responses to the significant points as follows:

(1) Experiment details and source code (R1, R2, R3): We implemented SA-Net in PyTorch, calculated standard deviations using 10-fold cross-validation, and evaluated consistency across different dataset partitions. SA-Net and the comparison methods used the Adam optimizer with an initial learning rate of 0.0001 and a batch size of 2. Slice selection employed a “cross-annotation” strategy that enhances data diversity by selecting slices from multiple planes, while a uniform spacing strategy was applied for single-plane slices to ensure adequate representativeness. Upon acceptance, we will make the SA-Net implementation publicly available on GitHub.

(2) Ablation experiment analysis, design motivations, and the central challenge addressed by each module (R1, R2, R3): SA-Net integrates cross-supervision from SAM and a 2D segmentation network to address the challenges of lesion segmentation in CT images, including low contrast, ambiguous boundaries, and semantic gaps. The pre-trained SAM (SAM-Med3D) generates high-quality pseudo-labels, providing effective supervisory signals for network training; this is why the A2 method achieves enhanced segmentation performance. The BGC module constructs graph representations by aggregating pixels with similar features into nodes, then applies bilateral graph convolution to capture long-range dependencies and propagate information between tasks, facilitating mutual interaction between segmentation and boundary detection and improving segmentation accuracy. The MFCA module refines high-frequency feature extraction, using multi-frequency channel attention to reduce noise and improve segmentation accuracy. The BAA module applies cross-attention between boundary and internal regions to enhance boundary features, thereby addressing low contrast and ambiguous boundaries. The RM combines deep semantic features with shallow structural information, enhancing segmentation accuracy in low-contrast CT images. The CCA and CSA modules enable effective feature interaction and information fusion through cross-layer channel and spatial attention mechanisms, improving detection of small lesions and mitigating information loss and semantic gaps. As illustrated in Fig. 4 on the self-collected dataset, the ablation experiments show that each module yields statistically significant performance improvements (A1: P < 0.0001; A2: P < 0.01; A3: P < 0.01; A4: P < 0.05).

(3) Comparison with SAM-derived frameworks and parameter overhead (R1, R2, R3): SAM is combined with two 2D segmentation networks and used as the baseline (A2), where the 2D network backbone is a U-Net with a ResNet-50 encoder; as a result, SA-Net has more parameters than the comparison methods. To compare with SAM-based methods, we report the SAM-Med3D experiments in Table 2 (A1). Although extensive comparisons with other SAM-based methods (e.g., SAM2 and UniverSeg) have not yet been conducted, SA-Net’s primary contribution lies in leveraging the synergy between cross-supervised learning and SAM to improve segmentation under sparse annotation.

(4) The confidence threshold hyperparameter (R2, R3): We employ a pseudo accuracy (Pacc) threshold scheme with a sample-level confidence threshold t = 0.96. Volumes with Pacc > t apply a lower voxel-level threshold of 0.70 to include additional reliable predictions, whereas volumes with Pacc ≤ t apply a higher threshold of 0.85 to exclude noise. The thresholds were set via ablation experiments on the validation split of the self-collected dataset.

(5) Description of Fig. 3 (R2): Fig. 3 shows 3D visualizations of surface distances between the segmented surface and the ground truth; a larger green area indicates that the segmentation result is closer to the ground truth.
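The Pacc-gated thresholding described in the rebuttal can be sketched as follows. This is a minimal illustration using the reported values (t = 0.96, voxel thresholds 0.70 and 0.85); the function and variable names are hypothetical and not taken from the SA-Net codebase.

```python
import numpy as np

# Thresholds as reported in the author feedback; names are illustrative only.
SAMPLE_CONF_T = 0.96  # sample-level (per-volume) pseudo-accuracy threshold t
VOXEL_T_LOW = 0.70    # voxel threshold when the volume is trusted (Pacc > t)
VOXEL_T_HIGH = 0.85   # voxel threshold when the volume is not trusted (Pacc <= t)

def select_pseudo_labels(prob_volume: np.ndarray, pacc: float) -> np.ndarray:
    """Return a boolean mask of voxels kept as pseudo-labels.

    prob_volume: per-voxel foreground probabilities in [0, 1].
    pacc: sample-level pseudo accuracy estimated for this volume.
    """
    # Trusted volumes keep more voxels; untrusted volumes are filtered harder.
    voxel_t = VOXEL_T_LOW if pacc > SAMPLE_CONF_T else VOXEL_T_HIGH
    return prob_volume >= voxel_t

# Example: a reliable volume (Pacc = 0.98) keeps voxels with p >= 0.70,
# while the same probabilities under Pacc = 0.90 are filtered at 0.85.
probs = np.array([0.65, 0.72, 0.90])
mask_trusted = select_pseudo_labels(probs, pacc=0.98)
mask_untrusted = select_pseudo_labels(probs, pacc=0.90)
```

The sample-level gate means noisier volumes contribute fewer, higher-confidence voxels, which matches the rebuttal's stated goal of excluding noise from the pseudo-label pool.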




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces SA-Net, a novel framework for sparsely annotated medical image segmentation, leveraging cross-supervision from SAM and 2D networks. Although one reviewer maintained concerns about the clarity of the overall motivation and perceived novelty due to the integration of multiple techniques, the authors provided explanations for each module’s role, and other reviewers found the responses satisfactory. The framework is deemed technically sound, demonstrates strong empirical performance for a challenging problem, and the commitment to public code release further supports its value. Therefore, I recommend accepting it.


