Abstract

Medical image segmentation, a critical task in medical image analysis, plays a key role in assisting clinical diagnostic workflows. However, traditional fully supervised segmentation methods require large amounts of high-quality annotations from expert physicians, which is resource-intensive and time-consuming. To mitigate this, scribble-supervised segmentation approaches use simplified annotations to reduce annotation costs. Nevertheless, the simplistic nature of scribble annotations limits the model’s ability to accurately distinguish foreground anatomical structures from the background and to differentiate between anatomical classes. This limitation results in low accuracy in capturing foreground morphology and hinders the model’s generalization ability. To address this, we propose an Enhanced Foreground Feature Discrimination Network (EFFDNet) that better leverages the semantic information in scribble annotations to improve the network’s foreground discrimination ability. EFFDNet introduces an innovative Foreground-Background Separation Loss (FBSL), which enhances the model’s ability to distinguish foreground from background features and improves the morphological accuracy of foreground anatomical region recognition. Additionally, we propose a new Foreground Augmentation with Diverse Context (FADC) strategy to further enhance the network’s attention to the foreground and increase training sample diversity, mitigating overfitting and improving generalization. We validate our approach through systematic experiments on two publicly available datasets, demonstrating significant improvements over existing methods. The code is available at: https://github.com/Aurora-003-web/EFFDNet.
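To make the FBSL idea described above more concrete, the following is a minimal PyTorch sketch of a contrastive foreground-background separation loss driven by scribble annotations. It is an illustrative assumption of how such a loss could be implemented, not the paper's exact formulation; the per-image prototype aggregation, temperature, and weighting shown here are placeholders.

```python
# Hypothetical sketch of a scribble-driven foreground-background separation loss.
# Not the authors' implementation; aggregation and weighting are assumptions.
import torch
import torch.nn.functional as F

def fb_separation_loss(features, fg_scribble, bg_scribble, temperature=0.1):
    """features: (B, C, H, W) decoder feature map.
    fg_scribble / bg_scribble: (B, H, W) binary masks of scribbled pixels."""
    feats = F.normalize(features, dim=1)                       # unit-length pixel embeddings
    B, C, H, W = feats.shape
    feats = feats.permute(0, 2, 3, 1).reshape(B, H * W, C)
    fg = fg_scribble.reshape(B, H * W, 1).float()
    bg = bg_scribble.reshape(B, H * W, 1).float()

    # Aggregate scribbled pixels into per-image foreground / background prototypes.
    fg_proto = (feats * fg).sum(1) / fg.sum(1).clamp(min=1)    # (B, C)
    bg_proto = (feats * bg).sum(1) / bg.sum(1).clamp(min=1)    # (B, C)

    # Pull scribbled foreground pixels toward the foreground prototype and away
    # from the background prototype (and vice versa), InfoNCE-style with two "classes".
    pos = (feats @ fg_proto.unsqueeze(2)).squeeze(2) / temperature   # (B, HW)
    neg = (feats @ bg_proto.unsqueeze(2)).squeeze(2) / temperature   # (B, HW)
    log_prob = F.log_softmax(torch.stack([pos, neg], dim=2), dim=2)  # (B, HW, 2)

    loss_fg = -(log_prob[..., 0] * fg.squeeze(2)).sum() / fg.sum().clamp(min=1)
    loss_bg = -(log_prob[..., 1] * bg.squeeze(2)).sum() / bg.sum().clamp(min=1)
    return loss_fg + loss_bg
```

In a scribble-supervised setting, only the scribbled pixels contribute to the prototypes and to the loss terms, so a sketch of this kind uses exactly the sparse labels that are available.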

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0317_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Aurora-003-web/EFFDNet

Link to the Dataset(s)

ACDC dataset: https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html
NCI-ISBI dataset: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=21267207

BibTex

@InProceedings{LiuJin_EFFDNet_MICCAI2025,
        author = { Liu, Jinhua and Tan, Shu Yun and Yang, Xulei and Xu, Yanwu and Yeo, Si Yong},
        title = { { EFFDNet: A Scribble-Supervised Medical Image Segmentation Method with Enhanced Foreground Feature Discrimination } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {194--204}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors present a technique for scribble-based image segmentation. The technique’s main contributions consist of a “contrastive loss” named Foreground-Background Separation Loss (FBSL) and a Foreground Augmentation with Diverse Context (FADC) strategy. Experimental results on two datasets suggest that the proposed technique either outperforms or is statistically equivalent to other SOTA weakly-supervised segmentation techniques.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The methods proposed are quite interesting and sufficiently novel.
    2. The experimental set-up is sensible and sufficiently broad.
    3. The results are believable. Particularly appreciated is the statistical analysis!
    4. The paper is well-written and easy to follow overall.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Lack of clarity in some points: it is rather difficult to understand some parts of the Methods section. Additional explanations providing intuition behind the math would be very useful. Additionally, I find Fig. 2 quite confusing: a re-design (and a longer caption) would help the reader get a better overall understanding of the technique.
    2. Unclear experimental process: “In testing, all slices were reassembled to reconstruct 3D images for evaluation. A five-fold cross-validation approach was applied to assess segmentation accuracy.” Did the authors not use the provided test sets and instead create their own splits? If so, why? Also: how were the competing techniques implemented and tuned? Are these results taken from the literature or obtained directly by the authors? These are very important details that need clarification to properly assess the impact of the proposed technique.
    3. Reliance on a particular backbone: the authors state “We employ U-Net [24, 19] as the backbone, which can be replaced with other advanced models”. This is an important aspect: can the authors provide a discussion on how the proposed methods would be implemented in other segmentation architectures? What is the benefit of sticking to a U-Net instead?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. The sigma (summation) sign in the denominator of Eq. 1 doesn’t seem to sum over anything. I suggest using a different notation to improve readability.
    2. Please specify in the caption of Table 1 a) what the numbers represent (i.e., DSC) and b) what the asterisk means (so that the reader can interpret the table without having to dig into the main text). I would also clarify which structures belong to which dataset.
    3. The authors state “we design a new mechanism, FADC, which enhances the network’s sensitivity of foreground regions and mitigates overfitting, thereby improving generalization”. I am not sure the current experimental set-up supports such a statement, which would typically require the trained models to be tested on other datasets of the same imaging domain. If it cannot be supported by stronger evidence, I suggest rephrasing this statement throughout the paper.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The methodology is interesting and sufficiently novel, the results believable. A few more details are needed to make sure the experimental settings are fair, but I am strongly leaning towards “accept”.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces a scribble-supervised medical image segmentation method designed to enhance foreground feature discrimination by leveraging implicit foreground-background semantics in scribble annotations. The method is evaluated on two public datasets (ACDC and NCI-ISBI), demonstrating state-of-the-art performance, with improvements in Dice scores compared to existing weakly supervised methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel loss and its use of local region aggregation and contrastive learning to enforce foreground-background separation in feature space. It explicitly exploits the spatial semantics of scribbles (e.g., foreground regions tend to cluster) rather than treating them as sparse labels or seeds.
    • Creative data augmentation with foreground swapping to diversify foreground-background contexts. This addresses a key limitation of weak supervision (limited foreground variability) by synthetically creating new training samples.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The method is validated only on 2D slices reassembled into 3D volumes, raising concerns about its behavior for structures with varying spatial extent/slice presence (e.g., lungs in apical vs. mid slices) or true 3D contexts. The lack of evaluation on 3D datasets or multi-instance foregrounds limits its demonstrated applicability.
    • While FBSL improves foreground-background separation, it fails to address intra-foreground misclassification (e.g., RV vs. Myo confusion). This limitation is critical for multi-class scenarios with similar foreground structures (e.g., kidneys or lesions).
    • Additionally, all presented examples involve single foreground clusters, leaving the method’s performance on multi-instance single-class structures (e.g., bilateral kidneys or multiple nodules in a single image) untested.
    • The paper does not thoroughly justify why the mean teacher setup is superior to alternatives like adversarial training or transformer-based approaches, especially for scribble supervision.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Minor comments:

    • Figure 3: Adding a column showing input scribbles (or overlaying them on GT) would help readers assess how scribble shape/extent influences segmentation difficulty.
    • Figure 4: Using the same 2D slices in Figures 3 and 4 would clarify the incremental impact of FBSL and FADC by enabling direct visual comparison to baselines/competitors.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a technically sound approach with two novel contributions (FBSL and FADC) that improve scribble-supervised segmentation. The strengths include clinically relevant datasets and clear ablation studies. However, the lack of validation on multi-instance (either single-class or multi-class) foregrounds, 3D contexts, and lack of justification of the base components (mean teacher) limit its impact. A rebuttal addressing these gaps could solidify the contributions and justify acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a weakly supervised segmentation model, Enhanced Foreground Feature Discrimination Network (EFFDNet), that leverages foreground-background semantic information in scribble annotations. The model contains two novel designs compared to previous scribble-supervised methods:

    1. Foreground-Background Separation Loss (FBSL), a novel loss function that utilizes the idea of contrastive learning.
    2. Foreground Augmentation with Diverse Context (FADC), a data augmentation that mixes up the foregrounds and backgrounds of images in the dataset.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The model shows significant improvement compared to other weakly/scribble-supervised segmentation models. The model also shows comparable performance to a fully supervised baseline (U-Net).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. This paper does not clearly define how the bounding box is derived from the scribble annotations. Based on Figures 1 and 2, it appears that the bounding box does not tightly conform to the actual scribble.

    2. Novelty is just incremental. Contrastive loss has already been used in previous segmentation models, such as Wang, Jing, et al. “Positive–negative equal contrastive loss for semantic segmentation.” Neurocomputing 535 (2023): 13-24.

    The idea of mixing images has already been applied to segmentation training, such as Shen, Zhiqiang, et al. “Adaptive Mix for Semi-Supervised Medical Image Segmentation.” arXiv preprint arXiv:2407.21586 (2024).

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The reported performance improvements are significant compared to other weakly supervised models. However, the paper’s contributions appear incremental, as EFFDNet primarily builds on the established mean teacher framework while incorporating existing data augmentation techniques and loss functions.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the Reviewers and AC for their comments and positive feedback.

  1. Novelty (Reviewer 3) As Reviewers 1 and 2 point out, our method demonstrates innovation. In response to Reviewer 3’s concern that our approach lacks novelty and is incremental, we provide the following clarification. First, our methodology is centered on addressing a critical challenge: enhancing the network’s ability to discriminate foreground regions in scribble-supervised medical image segmentation. Each component is tightly aligned with this central objective, forming a cohesive framework rather than an incremental assemblage of techniques. Second, although our proposed Foreground-Background Separation Loss (FBSL) employs contrastive loss as a foundational tool, it incorporates distinctive and novel elements. Specifically, we leverage the implicit semantic knowledge embedded during the annotation process—namely, the semantic relationship between foreground and background. We observed that regions annotated with foreground scribbles tend to correspond to target anatomical structures, whereas other areas are more likely to represent non-target regions. Based on this insight, we designed FBSL to capitalize on these inherent semantic cues, with contrastive loss serving merely as a tool to support our core motivation. Lastly, in contrast to the mixed augmentation strategy mentioned by Reviewer 3, our Foreground Augmentation with Diverse Context (FADC) introduces a unique perspective. While it shares a similar data augmentation function with traditional methods, its primary goal is to generate foreground samples within diverse background contexts, thereby enhancing the network’s discriminative capability for foreground regions. Thus, the FADC strategy embodies a novel motivation and design (an illustrative sketch of this foreground-swap idea appears after this response list).
  2. Lack of Validation on Multi-Instance Foregrounds (Reviewer 2) The proposed FBSL and FADC methods enhance foreground region discrimination in weakly supervised learning (WSL) tasks, improving the network’s ability to extract fine-grained features. This is helpful in addressing intra-foreground misclassification, such as confusion between the right ventricle (RV) and myocardium (Myo). The effectiveness of our approach is supported by the visual results in Figures 3 and 4. We also believe that enhanced foreground discrimination aids in more accurate identification of multi-instance single-class structures. Reviewer 2’s comments align with our considerations, and we have emphasized misclassification mitigation as a key direction for future research in the Conclusion section of the manuscript.
  3. Lack of Evaluation on 3D Datasets (Reviewer 2) The proposed strategy is transferable to 3D scenarios, not limited to 2D settings. We are confident that our framework can be effectively adapted to 3D contexts, offering substantial performance improvements. The foreground-background semantics we emphasize are also present in 3D scenes, and enhancing foreground discrimination is crucial for accurate segmentation in both 2D and 3D medical imaging tasks.
  4. Reliance on a Particular Backbone (Reviewer 1) We adopt U-Net as the backbone network, as it serves as the foundational architecture in many advanced weakly supervised medical image segmentation methods that enhance performance through tailored weak-supervision strategies. In our study, U-Net is used as the common backbone for both our proposed framework and the comparison methods, ensuring a fair evaluation. Our approach improves WSL by introducing a novel FBSL loss and an FADC-based data augmentation strategy. Moreover, it can be flexibly adapted to various advanced network architectures (only the backbone network needs to be replaced), demonstrating strong scalability and applicability. In response to the concerns from Reviewers 1, 2, and 3 regarding the clarity of the methodology, presentation of the text, tables, and figures, and minor errors, we will address them comprehensively in the final version.
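As referenced in response 1, the following is a minimal, hypothetical sketch of a FADC-style foreground-swap augmentation: the foreground region of one slice is pasted into a different slice so that the same foreground appears in a new background context. The bounding-box-from-scribble heuristic and the function name fadc_swap are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical FADC-style foreground swap; region selection is an assumption.
import torch

def fadc_swap(img_a, scribble_a, img_b):
    """img_a, img_b: (H, W) slices; scribble_a: (H, W) binary foreground scribble mask.
    Returns img_b with img_a's foreground bounding-box region pasted in."""
    ys, xs = torch.nonzero(scribble_a, as_tuple=True)
    if ys.numel() == 0:                        # no foreground scribble: nothing to swap
        return img_b.clone()
    y0, y1 = int(ys.min()), int(ys.max()) + 1  # bounding box of the scribbled foreground
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    out = img_b.clone()
    out[y0:y1, x0:x1] = img_a[y0:y1, x0:x1]    # same foreground, new background context
    return out
```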




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


