Abstract
The accurate segmentation of myocardial scars from cardiac MRI is essential for clinical assessment and treatment planning. In this study, we propose a robust deep-learning pipeline for fully automated myocardial scar detection and segmentation by fine-tuning state-of-the-art models. The method explicitly addresses the challenges of label noise from semi-automatic annotations, data heterogeneity, and class imbalance through a Kullback-Leibler loss and extensive data augmentation. We evaluate the model’s performance on both acute and chronic cases and demonstrate its ability to produce accurate and smooth segmentations despite noisy labels. In particular, our approach outperforms state-of-the-art models such as nnU-Net and shows strong generalizability on an out-of-distribution test set, highlighting its robustness across varied imaging conditions and clinical tasks. These results establish a reliable foundation for automated myocardial scar quantification and support the broader clinical adoption of deep learning in cardiac imaging. The code is available at: https://github.com/Danialmoa/YoloSAM.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2947_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/Danialmoa/YoloSAM
Link to the Dataset(s)
N/A
BibTex
@InProceedings{MoaAid_Robust_MICCAI2025,
author = { Moafi, Aida and Moafi, Danial and Mirkes, Evgeny M. and McCann, Gerry P. and Alatrany, Abbas S. and Arnold, Jayanth Ranjit and Mehdipour Ghazi, Mostafa},
title = { { Robust Deep Learning for Myocardial Scar Segmentation in Cardiac MRI with Noisy Labels } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15972},
month = {September},
pages = {530 -- 540}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes an automatic pipeline for robust segmentation of myocardial scars in CMR. The paper also raises the problem of evaluation when noisy ground-truth labels are provided by semi-automatic pipelines. To address this, the authors use a KL divergence loss in addition to a Dice loss for robustness. Evaluation is performed on a relatively large dataset.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The problem statement is clear and relevant to clinical needs. The paper also addresses the empirical problem of low-quality semi-automatic ground-truth labels in CMR scar segmentation. The evaluation is extensive, covering multi-cohort datasets, which can be viewed as a valid evaluation.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Although the authors propose a KL divergence loss for the segmentation task when fine-tuning the SAM and YOLO models, robust learning already addresses similar questions (e.g., Symmetric Cross Entropy for Robust Learning with Noisy Labels, ICCV 2019). The authors need to review the related literature more thoroughly. The description of the methodology is opaque and hard to follow, particularly the motivation and mathematical formulation of the KL divergence in the presence of noisy labels and the loss formulation in Sec. 2.3. This also applies to the formulation of ‘noisy label’. The implementation details needed to reproduce the study effectively are mostly missing. The readability of the results section needs improvement: it mixes in a new experimental design for examining the features extracted from a trained model on acute MI cases with a Shapley analysis. The proposed method is not well validated empirically via ablation, in the absence of theoretical insight into the additional loss function. It is also unclear how errors in YOLO propagate to the SAM segmentation.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Readability would improve by using subsections in the results, as that section already presents different studies. Overall, a proper mathematical formulation would help readers grasp the idea quickly, e.g., clearly defining the KL divergence and the area similarity with respect to their inputs.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper focuses on a relevant problem and provides a solution built on established methods with an additional KL divergence loss to address the noisy-label problem in medical image segmentation, which is worth discussing at the conference. However, the method is not well presented, lacking methodological detail and justification from either empirical or theoretical studies. The readability of the paper has room for improvement.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The authors combine a YOLO object-detection model with the SAM segmentation model for segmenting scar in LGE cardiac MRI data. This approach was compared to standard segmentation models and showed good performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This is an innovative approach to combine two SOTA computer vision models and fine-tune them for a specific medical imaging challenge.
The dataset for fine-tuning and evaluation is really strong, such a large multi-vendor, multi-cohort dataset is not widely available.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The proposed method doesn’t really outperform training a standard U-Net on this data.
I would like more details on the KL term and an ablation study to prove its benefits.
Why are the labels said to be noisy? They seem to have been made by experts in a large clinical study. Some inter-/intra-observer variation is expected, but is this really noise?
The approach claims to be robust (in the title), but it is not clear what the approach is robust to (e.g., robust to domain shifts).
Can SAM be used without bounding-box prompts? It would be interesting if the YOLO model could be skipped.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
My recommendation is based on the innovative approach to this challenge and the strong evaluation
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have made an excellent, and very clear, response to my initial concerns. Based on this and the remaining positive aspects of my initial review, I recommend acceptance.
Review #3
- Please describe the contribution of the paper
The authors propose a supervised pipeline for myocardial scar segmentation from LGE CMR, aiming to reduce reliance on noisy ground-truth labels that are often affected by inter- and intra-observer variability. The approach begins by fine-tuning a YOLO-based architecture on expertly annotated pixel-level masks to predict bounding boxes that encapsulate myocardial scar regions. These bounding boxes, along with the corresponding images, are then used to fine-tune the Segment Anything Model (SAM) to generate the final segmentation masks. To further enhance performance, the authors incorporate an additional loss function based on Kullback–Leibler (KL) divergence. The proposed method is evaluated using multiple segmentation metrics and demonstrates superior performance compared to the state-of-the-art nnU-Net architecture. Moreover, the model is tested on a dataset featuring a different (previously unseen) type of myocardial infarction, showcasing its generalisability.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Good level of novelty: The method creatively combines two state-of-the-art models (YOLO and SAM) to address the challenge of noisy (i.e., highly variable) pixel-level scar labels. By first converting them into more consistent bounding-box annotations and then using these “cleaner” labels to fine-tune the SAM architecture, the pipeline offers a more reliable approach to myocardial scar segmentation in a clinically challenging task.
- Good generalisability: The proposed pipeline is validated across eight cohorts with chronic myocardial infarction (MI), demonstrating strong robustness and generalisability across a variety of clinically relevant scenarios.
- Simplicity and extendability to other clinical problems: Myocardial scar segmentation is widely regarded as a difficult problem. The proposed method achieves competitive performance without introducing additional architectural complexity (aside from the KL loss). This makes it a promising and easily adaptable framework for other clinically relevant segmentation tasks.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Limited comparison with acute MI cohort: While not a major weakness, the evaluation of the proposed method on only one acute myocardial infarction (MI) cohort—with no disclosure of the number of patients—limits the strength of the generalisability claim. Including additional acute MI cohorts would further validate the method’s robustness across different clinical presentations.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Lack of clarity on methodological details:
- Input resolution: As stated in Section 2.1, the included cohorts differ in spatial resolution. It is unclear whether the images were standardised before being rescaled to 256 × 256. Clarification on the preprocessing steps would be helpful.
- SAM output mask selection: The SAM model generates multiple output masks without assigning explicit semantic labels. The manuscript does not explain how the relevant mask corresponding to myocardial infarction (MI) is selected during training or inference.
- Acute MI cohort details: The number of cases in the acute MI cohort is not disclosed, making it difficult to fully assess the robustness and generalisability of the proposed method in previously unseen clinical scenarios.
- Limited reproducibility: Unfortunately, the authors have not provided access to the code or disclosed the model hyperparameters, which limits the reproducibility and further validation of the proposed method.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The authors present a promising and well-motivated approach by combining two state-of-the-art models to tackle the challenging task of myocardial scar segmentation. The pipeline demonstrates strong generalisability across multiple cohorts and offers a simple yet effective framework that could be extended to other clinical applications. However, the manuscript currently lacks important methodological details, including preprocessing steps, SAM output mask selection strategy, and dataset specifics (e.g., acute MI cohort size), which are critical for reproducibility and transparency. I believe this work has merit and could be a strong contribution if these issues are addressed satisfactorily during the rebuttal phase. Hence, I recommend a weak accept, contingent on the authors’ ability to clarify and expand upon these aspects.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors provided sufficient clarifications on the methodology; I do not see a reason to reject the paper.
Author Feedback
We thank the reviewers (R’s) for their very constructive and insightful feedback and for highlighting the novelty and effectiveness of our approach.

Implementation details (R1, R2, R3): We will release the full code and implementation details, including preprocessing and training configurations, upon acceptance to support reproducibility.

Learning with noisy labels (R1, R2): The labels are based on FWHM semi-automatic delineations. While reproducible, these labels are sharp and often misaligned with true anatomical boundaries due to thresholding artifacts, intensity heterogeneity, and partial volume effects. Hence, we consider them noisy in the sense of structured boundary uncertainty rather than incorrect. This noise differs from the unstructured label noise addressed by the symmetric cross-entropy (SCE) loss, which was designed for image classification tasks with discrete one-hot labels, assuming randomly corrupted labels. In contrast, our method explicitly tackles spatially structured uncertainty along scar boundaries in highly imbalanced medical segmentation (scar ≪ background). We will note this distinction and cite the SCE reference accordingly.

KL and methodological clarity (R1, R2): We apply Gaussian smoothing to the target mask to obtain soft labels (continuous distributions), capturing uncertainty in transitional boundary regions. The model predictions are mapped to [0, 1] using a sigmoid activation, followed by a log-softmax to obtain the predicted distribution. The prediction and the soft target scores are then normalized as probabilities (p for foreground and 1 - p for background). KL divergence (using nn.KLDivLoss in PyTorch) is then applied between the predicted and soft-target probability distributions, penalizing overconfident predictions near ambiguous regions. We will clarify this in Section 2.3 and provide a concise formulation of the KL divergence in our context.
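The soft-label KL loss described in the feedback could be sketched in PyTorch roughly as follows. This is an illustrative reconstruction, not the authors’ code: the kernel size, sigma, and the two-channel foreground/background construction are assumptions, and `F.kl_div` is used as the functional counterpart of the `nn.KLDivLoss` the authors mention.

```python
import torch
import torch.nn.functional as F


def gaussian_kernel2d(size: int = 5, sigma: float = 1.0) -> torch.Tensor:
    # Separable 2-D Gaussian kernel, normalized to sum to 1.
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g).view(1, 1, size, size)


def soft_label_kl_loss(logits: torch.Tensor, mask: torch.Tensor,
                       size: int = 5, sigma: float = 1.0) -> torch.Tensor:
    """KL divergence between the predicted scar distribution and a
    Gaussian-smoothed (soft) version of the binary target mask.

    logits: (B, 1, H, W) raw model outputs; mask: (B, 1, H, W) in {0, 1}.
    """
    kernel = gaussian_kernel2d(size, sigma).to(mask.device)
    # Blur the hard mask so boundary pixels carry graded uncertainty.
    p = F.conv2d(mask.float(), kernel, padding=size // 2)
    p = p.clamp(1e-6, 1.0 - 1e-6)
    target = torch.cat([p, 1.0 - p], dim=1)        # (B, 2, H, W) probabilities
    # Predicted foreground/background log-probabilities from the logits.
    log_q = F.logsigmoid(torch.cat([logits, -logits], dim=1))
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_q, target, reduction="batchmean")
```

Because the smoothed target is never exactly one-hot at the boundary, confidently wrong predictions there are penalized more heavily than mildly uncertain ones, which matches the stated goal of discouraging overconfidence in ambiguous regions.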
Ablation study (R1, R2): We performed ablations with and without the KL term and observed consistent improvements when using it. These results will be available in our public repository.

Results section (R1, R3): We will separate the experimental results into subsections and add details of the acute MI cohort (a multicenter dataset with 50 patients and 406 images).

YOLO error propagation (R1): To mitigate error propagation, we first fine-tuned SAM using ground-truth boxes to learn segmentation under clean localization, then fine-tuned SAM’s decoder with YOLO-predicted boxes to simulate prompt noise. We also used box augmentations to improve robustness. We will elaborate on this design in the revised version.

Comparison with U-Net (R2): As shown in Tables 2-3, our method achieves significantly better Dice (0.60 vs. 0.58) and HD (10.7 vs. 11.8) than nnU-Net (p < 0.05, Wilcoxon test). Notably, we also outperform it on clinically meaningful metrics that reflect boundary precision and anatomical fidelity (perimeter and area similarities), supporting our method’s clinical relevance. We will indicate the significance in the revised version.

Robustness (R2): In our work, robustness refers to the model’s ability to learn from uncertain annotations and generalize effectively across diverse clinical cohorts.

SAM without prompts (R2): We explored SAM without prompts and YOLO alternatives (center-based boxes, proportional boxes, decoder modifications) but consistently found lower performance, likely due to complex scar shapes and class imbalance. Box prompts provide localized attention, guiding SAM to focus on relevant regions.

Preprocessing (R3): The images were processed using Circle CVI software, which accounts for DICOM pixel spacing internally before resizing and center-cropping to 256×256. We will clarify this in the revised version.
SAM output mask (R3): We trained SAM to predict a single, high-confidence scar mask per image via a box-prompted prediction indicating the likely scar location, setting multimask_output=False. We will state this in the revised version.
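As a small, hypothetical illustration of the YOLO-to-SAM hand-off discussed in the feedback: a normalized YOLO (cx, cy, w, h) detection can be converted into the absolute (x1, y1, x2, y2) corner format that SAM accepts as a box prompt. The `pad` argument below is an assumed stand-in for the box augmentations the authors mention, not their actual implementation.

```python
import numpy as np


def yolo_to_sam_box(box_xywhn, img_w, img_h, pad=0.0):
    """Convert a normalized YOLO (cx, cy, w, h) box to absolute
    (x1, y1, x2, y2) pixel coordinates for use as a SAM box prompt.

    pad enlarges the box by a fraction of its size, a hypothetical knob
    to make the prompt more forgiving of detector localization error.
    """
    cx, cy, w, h = box_xywhn
    w, h = w * (1.0 + pad), h * (1.0 + pad)
    x1, y1 = (cx - w / 2) * img_w, (cy - h / 2) * img_h
    x2, y2 = (cx + w / 2) * img_w, (cy + h / 2) * img_h
    # Clip to the image bounds so the prompt stays valid.
    return np.array([max(x1, 0.0), max(y1, 0.0),
                     min(x2, float(img_w)), min(y2, float(img_h))])
```

A box in this xyxy form could then be passed to a SAM predictor together with the image, with multimask_output=False to obtain the single scar mask described above.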
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
There are some concerns from the reviewers that need to be addressed in the rebuttal phase.
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper addresses the important challenge of myocardial scar segmentation in cardiac imaging by integrating two well-established models, YOLO and SAM, into a novel and clinically relevant framework. The approach introduces an additional KL divergence loss to mitigate the impact of noisy labels, which is a significant and timely concern in medical image segmentation. This contribution has potential for generalisability and real-world application, making it highly relevant to the MICCAI community. Given the clinical relevance of the proposed solution and the positive comments from all reviewers, the paper is recommended for acceptance at MICCAI.