Abstract
Retinal vessel segmentation from fundus images is an important task in intelligent ophthalmology. Because vessel annotation is particularly challenging, the scarcity of training labels hinders model robustness in real-world scenarios. Recent research has shown that SAM, a foundation model for natural image segmentation, demonstrates impressive performance on medical images after few-shot fine-tuning. Fine-tuned SAM therefore holds promise as a pseudo-label generator to alleviate the label scarcity problem in vessel segmentation. However, because the limited labeled data fails to represent the real-world distribution, fine-tuned SAM may produce erroneous predictions on unseen image patterns, known as open-set label noises. In this work, we propose SAM-OSLN to reduce open-set label noises and improve the quality of generated pseudo masks. First, we introduce the prototype technique to perform open-set aware SAM fine-tuning and identify open-set label noises accordingly. Subsequently, we design an explicit label denoising method and an implicit training strategy to jointly mitigate the impact of open-set label noises. Extensive experiments demonstrate that SAM-OSLN outperforms previous state-of-the-art methods on multiple fundus datasets under both synthetic and real-world scenarios.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1772_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{ZhaMin_Towards_MICCAI2025,
author = { Zhang, Minqing and He, Mengxian and Yuan, Wu},
title = { { Towards Robust Retinal Vessel Segmentation via Reducing Open-set Label Noises from SAM-generated Masks } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15961},
month = {September},
pages = {631--641}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes SAM-OSLN, a framework for robust retinal vessel segmentation (RVS) that integrates open-set aware fine-tuning, context-based label denoising (CLD), and multi-scale weighting supervision (MWS) to enhance the quality of pseudo-labels under domain shifts. From the perspective of data-centric AI, it takes a novel approach. Experiments were conducted to verify the performance of the method. Detailed comments on strengths and weaknesses follow below.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The research topic is novel. Starting from a data-centric perspective, SAM-OSLN supplements effective annotation data by reducing the open-set noise in the masks generated by the foundation model SAM.
- The results are satisfactory. Good performance has been achieved in both synthetic and real data scenarios, and it outperforms existing state-of-the-art (SOTA) methods in the real dataset scenario.
- The experimental results are relatively comprehensive and well visualized.
- The paper is written relatively smoothly.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The explanation of the teacher-student relationship between Mt and Ms is insufficient, and Figure 2 does not clearly show how the two models form this relationship.
- The entire process revolves around open-set pixels, yet the rationale for the screening threshold Bc in Section 3.1 is not adequately explained, and the current description of Bc is imprecise.
- Since Equation 7 concerns open-set pixels, it should additionally specify that x belongs to o.
- The rationale for the rank-based weight calculation in Equation 8 is insufficiently explained, and there is no sensitivity analysis for the hyperparameter settings.
- The perspective of the article is relatively novel, but the overall degree of innovation in the method is average. It relies on the pseudo-labels generated by the foundation model SAM, and the framework complexity is relatively low.
- The content presented in Section 3.1 slightly overlaps with the previous text. Some of the content could be merged into Section 1.
- In Section 3.3, the authors need to further explain the implementation details of the self-training and Fine-tuned SAM methods used for comparison.
- The table formats are inconsistent. Some are three-line tables while others are not. The formats need to be unified.
- Regarding the comparative methods in Section 3.4: apart from semi-supervised learning and directly using SAM-generated pseudo-labels, are there other pseudo-label correction methods that could be compared? If so, they should be added.
- No experiments verify whether the choice of student model affects the conclusions; the effectiveness of different network architectures has not been studied. For example, it is unclear whether the conclusions still hold with a backbone other than TransUNet.
- The source code is not made available, resulting in low reproducibility.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Please check the weaknesses section.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
My main concerns have been addressed. I raised my score.
Review #2
- Please describe the contribution of the paper
The paper proposes SAM-OSLN, a framework designed to mitigate open-set label noises when using the Segment Anything Model (SAM) for retinal vessel segmentation (RVS). By incorporating open-set aware fine-tuning, context-based label denoising, and multi-scale weighting supervision, the method aims to improve the robustness and domain generalization (DG) capabilities of RVS models under limited labeled data settings.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper tackles a practical and challenging problem in RVS: the scarcity of labeled data and the noise introduced by using SAM-generated pseudo labels under domain shift.
- The idea of explicitly identifying open-set pixels and mitigating noise through both contextual and weighting strategies is well-motivated and thoughtfully designed.
- The proposed SAM-OSLN method shows consistent improvements over baselines and SOTA methods across synthetic and real-world scenarios.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Reproducibility is insufficient due to the absence of a code release. Given the multi-stage pipeline and the critical tuning of parameters (e.g., neighborhood voting, prototype generation), this hinders verifiability and future research based on this work.
- Method novelty is limited. Although the combination of open-set awareness and denoising is nicely engineered, many of the components (prototype-based detection, pseudo-label refinement, teacher-student learning) are adapted from existing paradigms without significant methodological or theoretical innovation.
- The dataset choice for domain generalization evaluation lacks standardization. The paper constructs its own benchmark from a combination of fundus datasets, but more widely used RVS datasets such as DRIVE, CHASEDB1, and STARE should be included or emphasized to allow more straightforward comparison with prior work.
- The baseline methods are outdated. Among the comparative methods, only one (UniMatch) is from 2023; the others are relatively old. Recent advances in semi-supervised learning (SSL), noisy-label learning (NLL), and RVS-specific methods should be incorporated to strengthen the evaluation, especially newer methods in each paradigm (e.g., transformer-based RVS, recent SSL pipelines).
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Consider including standardized RVS benchmarks (e.g., DRIVE, CHASEDB1, STARE) to improve comparability.
- Incorporate more recent baseline methods for a fairer evaluation of DG capabilities.
- Release code and models to improve reproducibility and adoption.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
My evaluation is based on the overall methodological novelty, writing quality, adherence to the required format, and other identified issues.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
The authors appear to have resubmitted last year's submission almost unchanged, simply trying their luck. The suggestions I made last year and this year have not been satisfactorily addressed.
Review #3
- Please describe the contribution of the paper
This paper introduces a novel framework for robust retinal vessel segmentation that addresses the challenges of label scarcity and domain generalization in real-world fundus images. The method leverages the Segment Anything Model (SAM) as a pseudo label generator and introduces a series of mechanisms to reduce open-set label noise, a problem arising when SAM encounters unseen or unfamiliar image patterns.
The key contributions include:
- Open-set Aware SAM Fine-tuning using prototype-based decision boundaries to identify noisy regions.
- Context-based Label Denoising (CLD) to revise pseudo-labels based on spatial voting among neighbors (both sketched after this list).
- Multi-scale Weighting Supervision (MWS) to reduce the impact of uncertain labels at pixel, patch, and image levels.
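For concreteness, here is a minimal sketch of the first two mechanisms, assuming per-pixel encoder embeddings, Euclidean distances to class prototypes, and a windowed majority vote. All function names and the voting rule are illustrative assumptions; the paper's exact formulation (including the boundary Bc and Equation 7) may differ.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def class_prototypes(features, labels):
    """Per-class mean embedding (prototype).

    features: (H, W, D) per-pixel embeddings; labels: (H, W) binary map
    (0 = background, 1 = vessel). Illustrative assumption, not the paper's code.
    """
    return {c: features[labels == c].mean(axis=0) for c in (0, 1)}

def detect_open_set(features, pseudo_labels, protos, boundary):
    """Flag pixels whose distance to the prototype of their predicted class
    exceeds that class's decision boundary B_c (open-set candidates)."""
    open_set = np.zeros(pseudo_labels.shape, dtype=bool)
    for c in (0, 1):
        mask = pseudo_labels == c
        dists = np.linalg.norm(features[mask] - protos[c], axis=1)
        open_set[mask] = dists > boundary[c]
    return open_set

def cld_denoise(pseudo_labels, open_set, win=5):
    """Context-based label denoising: relabel each flagged pixel by a
    majority vote among the trusted (non-flagged) pixels in a win x win
    spatial neighborhood."""
    trusted = ~open_set
    # Windowed fraction of trusted pixels that vote "vessel".
    vessel_votes = uniform_filter(((pseudo_labels == 1) & trusted).astype(float), win)
    trusted_frac = uniform_filter(trusted.astype(float), win)
    vote = np.divide(vessel_votes, trusted_frac,
                     out=np.zeros_like(vessel_votes),
                     where=trusted_frac > 0)
    denoised = pseudo_labels.copy()
    denoised[open_set] = vote[open_set] >= 0.5
    return denoised
```

Under this reading, only flagged pixels are revised while trusted pseudo-labels pass through unchanged, which is consistent with the rebuttal's remark (Q2 below) that neighbor voting can miss tiny vessels whose trusted neighbors are mostly background.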
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper tackles a high-impact challenge in medical imaging: label scarcity and domain shifts in retinal vessel segmentation.
- Comprehensive experiments are conducted across synthetic interference scenarios and real-world domain generalization tasks, demonstrating robust performance.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The prototype-based estimation relies heavily on distances in feature space, yet there is no analysis of sensitivity to feature distribution shifts or threshold calibration. In addition, the neighbor-voting strategy in CLD assumes spatial continuity of true vessel pixels, which may not always hold for small or disconnected vessels. Some failure analysis or visualization of mislabeled open-set regions would be helpful.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper offers a well-motivated and technically sound solution to a pressing problem in medical image segmentation. It makes effective use of foundation models and combines them with original noise mitigation strategies that are validated through extensive experiments. While minor issues exist, they do not detract significantly from the overall contribution. The paper provides a meaningful step toward robust, label-efficient retinal vessel segmentation.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank the reviewers for their comments and for recognizing our strengths: 1. Focusing on the open-set issue of foundation models from a data-centric AI view is novel. 2. Tackling the label scarcity and domain shift issues is practical and challenging. 3. The technical approach is well motivated and thoughtfully designed. 4. The experiments in synthetic and real-world scenarios are comprehensive. 5. The visual presentation in the charts, textual clarity, and organization are commendable.
Q1 Method Novelty@R2+R3. Our main novelty is the insight that leveraging foundation models together with abundant real-world unlabeled data is a crucial factor for enhancing real-world robustness. The method is highly targeted and effective, leading to consistent improvements on unseen data.
Q2 Clarify Method Design and Reproducibility@R2+R3. The teacher-student framework aims to transfer the most relevant knowledge from the foundation model; this has the advantage of being adaptable to mainstream segmentation networks. The neighbor-voting strategy does miss some tiny vessels, but since noise-free labels are unachievable, we prioritize suppressing open-set noise, a far more prevalent issue than tiny vessels. We have already organized the code and benchmark and will release them, along with detailed descriptions of the settings for future research.
Q3 Additional Datasets, Backbones@R2+R3. In fact, we have used well-known datasets and additional backbones: DRIVE and CHASEDB1 as alternative training sets to STARE, and U-Net and DeepLabV3 as backbones. The results show consistent improvement across the different labeled sets and backbones. Due to the 8-page limit, these results were not included; we appreciate the suggestions and plan to add a discussion in the Experiments section.
Q4 Comparison@R2+R3. The three NLL methods in Table 3 are label-correction methods; however, because they do not handle open-set noise, their results are not ideal. The methods included in Table 3 are those with relatively good performance in this challenging scenario. For newer methods, we already have experimental data and can directly include them; for instance, SDCL (MICCAI 2024) achieved a lower mean Dice of 61.4%.
Q5 Module Analysis@R2+R4. Since we designed a multi-scale weighting strategy, our approach is robust and introduces no extra hyperparameters. Under the three labeled sets and backbones (Q3), it consistently achieves improvements. By calculating the prototype (anchor) and decision boundary for each class, we can readily identify open-set noise in unlabeled data; with threshold adjustments of 5% and 10%, the method remained robust and delivered consistent performance. Our method is thus characterized by robustness, efficiency, and the absence of extra hyperparameters.
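A hedged sketch of one plausible reading of this calibration: B_c is set as a distance quantile over labeled pixels of each class, so the "5% and 10%" adjustments would correspond to keep = 0.95 or 0.90. The quantile rule and all names are assumptions, not the paper's stated definition; `protos` is a dict of per-class mean embeddings as in the sketch under Review #3.

```python
import numpy as np

def calibrate_boundaries(features, labels, protos, keep=0.95):
    """Set B_c as the distance below which a fraction `keep` of labeled
    class-c pixels fall (quantile rule is an assumption, not the paper's
    stated definition of B_c)."""
    boundary = {}
    for c, proto in protos.items():
        dists = np.linalg.norm(features[labels == c] - proto, axis=1)
        boundary[c] = np.quantile(dists, keep)
    return boundary
```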
Q6 Failure Analysis@R4. We emphasize that making full use of unlabeled data and suppressing domain-induced label noise are important factors for more robust segmentation, as evidenced by the consistent improvement in external validation. Thank you for pointing out that, due to the difficulty of segmenting tiny vessels, there are indeed failure cases in the current version. As pioneering research in this field, we prioritized the former, which is of greater importance; the latter is valuable future work, and we will mention it in the conclusion.
Q7 Writing Typos@R2. Thank you for pointing out the formula error, table format issues, and unclear descriptions. We will revise according to your suggestions, for example adding "x belongs to o" in Equation 7.
Summary: This paper rethinks RVS from a data-centric perspective and argues for the importance of effectively utilizing large-scale unlabeled data. Additionally, when employing the foundation model SAM as a label-efficient tool, we identified and mitigated the OSLN issue. Although limited space prevented a thorough explanation of all research aspects, this work is valuable to the community and may inspire further research.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
This work receives mixed comments; however, most reviewers raise concerns about its technical novelty due to its reliance on previously established techniques such as pseudo-label refinement, open-set estimation, and teacher-student learning. Notably, the core component, open-set pixels, was challenged for lacking sufficient explanation and analysis. The paper also misses comparisons with more recent methods and standard benchmarks. Therefore, this work is invited to give a rebuttal.
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
Post-rebuttal, R2 moved from “Weak Reject” to “Accept,” joining R4’s positive verdict; only R3 remains negative. I read the rebuttal and find that it answers most open points. The paper offers a well-motivated, data-centric strategy that curbs open-set label noise in SAM pseudo-labels, improving retinal-vessel segmentation under domain shift. Experiments on synthetic and real datasets show consistent gains over SOTA, and the method is technically sound with reasonable novelty. Remaining issues are incremental and can be polished later. Recommend ACCEPT.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper addresses label-efficient retinal vessel segmentation, combining open-set detection and noise-aware training in a coherent, practical framework. Reviewer 1 raised key concerns in the initial review but expressed that these were fully addressed in the rebuttal and revised their stance to acceptance. Reviewer 3 is positive, acknowledging the paper’s utility and contribution despite some limitations.
Reviewer 2 remains strongly critical and suspects the submission is a minimally revised resubmission from a prior cycle. While some of the evaluation and novelty concerns are valid, the technical correctness of the work has not been fundamentally challenged, and the empirical results appear sound.
Given the relevance of the problem, the soundness of the method, and the rebuttal’s effectiveness in resolving key points raised by multiple reviewers, I recommend acceptance.