Abstract

Scalp disorders are highly prevalent worldwide, yet remain underdiagnosed due to limited access to expert evaluation and the high cost of annotation. Although AI-based approaches hold great promise, their practical deployment is hindered by challenges such as severe data imbalance and the absence of pixel-level segmentation labels. To address these issues, we propose "ScalpVision", an AI-driven system for the holistic diagnosis of scalp diseases. In ScalpVision, effective hair segmentation is achieved using pseudo image-label pairs and an innovative prompting method in the absence of traditional hair masking labels. Additionally, ScalpVision introduces DiffuseIT-M, a generative model adapted for dataset augmentation while maintaining hair information, facilitating improved predictions of scalp disease severity. Our experimental results affirm ScalpVision's efficiency in diagnosing a variety of scalp conditions, showcasing its potential as a valuable tool in dermatological care. Our code is available at https://github.com/winston1214/ScalpVision.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/5080_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/winston1214/ScalpVision

Link to the Dataset(s)

Scalp Dataset: https://aihub.or.kr/aihubdata/data/view.do?searchKeyword=%EB%91%90%ED%94%BC&aihubDataSe=data&dataSetSn=216

BibTex

@InProceedings{KimYou_Scalp_MICCAI2025,
        author = { Kim, Youngmin and Kim, Saejin and Moon, Hoyeon and Yu, Youngjae and Noh, Junhyug},
        title = { { Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a new framework ScalpVision for 1) label-free (synthetic pseudo-label based) hair segmentation; 2) scalp disease classification, by addressing class imbalance via data augmentation, using a generative model DiffuseIT-M. The segmentation combines a U2-Net prediction (the model is trained on synthetic images/labels by drawing curves on hair-free patches) and a SAM prediction, where the prompt for SAM is derived from the initial mask. DiffuseIT-M performs image-to-image translation, changing the scalp condition while preserving hair content. The segmentation approach is evaluated on 150 manually annotated images from a specialized dataset from AI-Hub. DiffuseIT-M is evaluated against DiffuseIT and ACG qualitatively and quantitatively. Resulting classification performance is also evaluated (F1 score, macro and per-class).

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • I believe the proposed approach and application have novelty.
    • Results show significant improvement w.r.t. the baselines. Qualitative results look compelling, which should be a good indication that the methodology is sound.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    I am much less convinced of the correctness of the mathematics and equations as reported in the paper. My initial rating of the paper will reflect this, but I will gladly update my rating if the authors point out factual errors in my review or provide updated formulas that can reasonably be incorporated in the final version.

    • Algorithm 1:
      • Line 4, especially H_copy - Dilated, seems unlikely to be correct. A quick look at the code seems to indicate that an opening is done instead of a dilation.
      • I am not convinced that Line 9 and Eq. 1 match what is implemented either (in fact, there is no mention of PCA in the paper, but there is in the code).
    • Eq. 3: the loss l_mask depends on x_src and x_trg, which do not depend on the predicted noise at timestep t. Perhaps x_trg should be replaced by \hat{x}_0, as Fig. 1 seems to indicate?
    • Eq. 4: x_{t-1}, and hence also x_0, is equal to x_t wherever M = 0 (so almost everywhere). I am not sure this makes sense. I am not sure about the second term on the RHS either.
    • I am surprised that M is obtained from \hat{M} and M_AP with the AND (intersection) operator as stated. Indeed, looking at Fig. 3, the segmented hair in \hat{M} looks thinner than in M, which should not be possible if I am not mistaken.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • Eq. 5 also contains at least a few typos.
    • I do not fully understand the authors’ analysis of mask guidance 1-M in relation to Fig. 5. It looks like regardless of the choice of guidance, there is significant transfer of scalp color (and also hair location). Can you please clarify?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My current assessment is that the paper is borderline, but I am willing to improve my rating if the authors can convince me that the maths are correct, or that they can be made correct without significant changes to the paper in the final version.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    My assessment is still that the paper is borderline.

    The rebuttal did not remove my main concerns about the correctness of the maths in the paper w.r.t. what is implemented.

    • Eq. 5: no response from authors
    • Eq. 4: after the rebuttal, I still do not understand the second term in the RHS, which involves xhat_0 and a gradient w.r.t. the loss (???).
    • There seems to be a discrepancy between the results of Fig. 5 and Eq. 4.
    • Re. the absence of any mention of PCA: I think it should definitely be mentioned, even if not fully detailed due to space limits. Currently I do not understand how the mean of the points in the hair mask can serve as a prompt for SAM when hair is typically non-convex; shouldn't the mean generally fall outside the actual hair strand?

    Between accept/reject: I chose accept purely because the code is provided and the results seem good, which could make the paper reproducible even if the content is not entirely correct.



Review #2

  • Please describe the contribution of the paper

    Authors present a system for diagnosing different conditions on scalp images. There are two important obstacles to tackle. First, existing training sets are highly imbalanced due to the rarity of the conditions; toward this end, the authors augment their datasets via image translation. However, this requires hair segmentation, and there are no datasets with manually labeled hair segmentations. They tackle both of these challenges and propose a system that achieves better results than the alternatives.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Use of synthetically generated data to get rough segmentations is a good idea.
    • Combining two segmentation predictions to improve the final result works pretty well.
    • Mask guidance for translation seems very useful.
    • The translated images look very realistic.
    • The final results seem convincing. Classification accuracy for almost all conditions increases when the training set is augmented with the proposed method.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • While the segmentation part is very well explained, the image translation part can be explained better. More specifically, it is unclear how the generated image matches the desired condition.
    • The presented solution is very specific to the application.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. It is a solid application article.
    2. The authors modify existing technology to solve problems related to their application; the modifications are very reasonable.
    3. The developed techniques are rather specific to the application.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a framework, ScalpVision, to automate scalp diagnosis: detecting the presence of scalp diseases and classifying their severity. For this task, the authors first train a hair segmentation model (U2-Net) on a synthetic dataset that they create by simulating hair and dandruff on clean patches of microscopic scalp images. The predicted masks from U2-Net are then used to sample positive and negative point prompts for SAM based on the proposed algorithm. The authors also propose a mask-conditional image-to-image translation model, termed DiffuseIT-M, to create a diverse training dataset for scalp diagnostics. The mask conditioning for the model is a combination of the SAM-generated mask and the predicted mask from U2-Net. Using DiffuseIT-M, the authors generate an augmented training dataset to train a scalp condition classifier. The proposed method achieves superior performance on the AI-Hub dataset on the tasks of hair segmentation and image translation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The framework is well designed to utilize the segmentation foundation model as well as the pretrained diffusion model for segmentation and image translation in a training-free setup. The authors show the effectiveness of the proposed approach in tackling data imbalance and improving diagnostic performance against several other augmentation baselines. To avoid false positives and prompting near mask boundaries, they propose a simple algorithm to uniformly sample points from the predicted mask and guide SAM's predictions to generate a high-quality mask for conditioning the diffusion model. The paper is well organized and easy to follow.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The effect of mask guidance appears similar to some extent in Fig. 7; quantitative evidence would be a good addition. The authors should also report the size of the test set used for evaluating image translation. Although the authors report classification performance with prior augmentation works using two different models, I wonder why they did not compare against other simple augmentation strategies such as MixUp, CutMix, or alpha-blending. While creating the synthetic data, the authors made strong assumptions about the shape of the hair and dandruff; I wonder if they tested other primitives and how that affected segmentation performance. It would be nice if the authors could also provide some qualitative samples of the simulated data based on the severity of scalp disease.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel combination of exploiting SAM and a pretrained diffusion model for image translation in a training-free setup. The method shows potential as a good augmentation strategy.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have successfully addressed my concerns.




Author Feedback

We thank the reviewers for their constructive feedback. We will correct all errors and, within the page limit, add experiments and figures to address every concern.

[R1]

  1. In Algorithm 1, we abbreviated the morphological opening (erosion followed by dilation) for brevity. To avoid confusion, we will update the pseudocode to: eroded = MORPH_ERODE(H_copy, kernel); opened = MORPH_DILATE(eroded, kernel) (see the sketch after this list).
  2. In the code, we call: mean, eigenvectors = cv2.PCACompute(…). The returned mean is merely the centroid of the points, computed internally before PCA, and does not affect Eq. 1 in the paper. The eigenvectors output supports an auxiliary module, which was implemented but omitted from the manuscript for space reasons. Thus, while the code includes PCA for extra analysis, this functionality lies outside the method in our paper.
  3. You are correct that the mask loss should use the intermediate denoised prediction \hat x_0(x_t) rather than the raw target image. We will revise Eq. (3) and its accompanying text to match Figure 1.
  4. There was a typo in Eq. (4) that inverted the mask logic. In our implementation, hair regions are preserved and only the scalp is updated. We will correct Eq. (4) to: x_{t-1} = x_{t} \odot M + \bigl[\hat{x}_{0}(x_t) - \nabla_{x_t}\ell_{total}\bigl(\hat{x}_{0}(x_t)\bigr)\bigr] \odot (1-M), so that the first term leaves hair pixels unchanged and the second term applies updates exclusively to scalp regions.
  5. We previously visualized pre-threshold score maps, which thinned \hat M. We will replace these with true binary masks for both \hat M and M_{\rm AP} so mask widths reflect actual segmentation.
  6. Style loss is global by design and can shift color even in the regions the mask is meant to preserve. In our observations, although the overall hue changes, closer inspection reveals unnatural artifacts (e.g., color bleed, texture mismatch) compared to real images. We will add a brief note explaining this behavior and include the quantitative mask-guided ablation from R3-1 to demonstrate how mask guidance localizes edits.
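
A minimal sketch of the opening and centroid-prompt steps from items 1 and 2 above, assuming OpenCV conventions; the function name and parameters are illustrative, not the repository's API:

```python
import cv2
import numpy as np

def opening_and_centroid_prompt(initial_mask: np.ndarray, kernel_size: int = 3):
    """Illustrative only: morphological opening (item 1), then the centroid
    that cv2.PCACompute returns as a by-product (item 2)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    eroded = cv2.erode(initial_mask, kernel)   # opening = erosion ...
    opened = cv2.dilate(eroded, kernel)        # ... followed by dilation
    ys, xs = np.nonzero(opened)
    pts = np.stack([xs, ys], axis=1).astype(np.float32)
    # Only the centroid (computed internally before PCA proper) feeds the
    # point prompt; the eigenvectors support an auxiliary module outside
    # the paper.
    mean, eigenvectors = cv2.PCACompute(pts, mean=None)
    # Caveat from R1's post-rebuttal comment: on non-convex strands this
    # centroid can fall outside the hair and may need snapping to the mask.
    return opened, mean.ravel()
```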

[R2]

  1. Section 2.2 gave a high-level overview of DiffuseIT-M, but we agree that it is essential to detail how the source image, its hair mask, and the denoised prediction jointly steer the reverse diffusion, preserving hair content inside M while transferring scalp style; we will expand that discussion (a minimal sketch follows this list).
  2. While we focused on scalp health due to the unique challenge of missing hair segmentation labels, the same “preserve critical region + edit context” mask-guided diffusion applies to other medical tasks (e.g., preserving tumor masks while editing surrounding tissue), and we will highlight this in the revised manuscript.
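
To make the steering concrete, here is a hedged sketch of the mask-guided reverse step implied by the corrected Eq. (4); predict_x0 and total_loss_fn are assumed stand-ins for the denoiser's intermediate estimate and the combined guidance loss, not names from the codebase:

```python
import torch

def guided_reverse_step(x_t, predict_x0, total_loss_fn, M):
    """Sketch of Eq. (4): hair pixels (M) are copied from x_t, scalp pixels
    (1 - M) receive the loss-guided denoised estimate, so hair content is
    preserved while scalp style is translated."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = predict_x0(x_t)                # intermediate denoised image
    loss = total_loss_fn(x0_hat)            # style/content/mask guidance terms
    grad, = torch.autograd.grad(loss, x_t)  # gradient of l_total w.r.t. x_t
    return (x_t * M + (x0_hat - grad) * (1 - M)).detach()
```

Because the guided estimate only enters through the (1 - M) term, hair geometry survives the translation while the loss reshapes the scalp.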

[R3]

  1. We will include an ablation study to evaluate the impact of mask guidance by applying our segmentation method to the generated results. To compare with previous studies, we evaluated with the Jaccard index: our method achieved 0.4485, outperforming DiffuseIT (0.3619) and AGG (0.3785). (See the discussion in R1-6.)
  2. We have conducted an evaluation on all 50,639 augmented images in comparison to the original ground truth images. We will include this information to clarify our evaluation process.
  3. We experimented with mixing-based augmentations but found they break multi-label integrity (combining two condition labels yields ambiguous or unbalanced targets), making them unsuitable for our multi-condition classification. We will clarify this rationale.
  4. We tested third-degree polynomial curves and ellipses for pseudo-labels, observing <1% difference in segmentation metrics. This validates our choice of simple line + circle primitives and demonstrates robustness to pseudo-label design (see the sketch after this list).
  5. If space permits in the camera-ready version, we will include a concise grid of synthetic images across severity levels for each condition to illustrate the diversity and realism of our augmentations.
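
As a companion to item 4, a hypothetical sketch of pseudo image-label generation with one possible primitive choice (quadratic Bezier strands plus circular flakes); the paper's exact primitives and parameters may differ:

```python
import cv2
import numpy as np

def synth_hair_patch(patch: np.ndarray, n_strands: int = 12, n_flakes: int = 5):
    """Draw dark curved strands and bright 'dandruff' circles on a hair-free
    BGR patch; the strand drawing doubles as the segmentation pseudo-label."""
    h, w = patch.shape[:2]
    img, mask = patch.copy(), np.zeros((h, w), np.uint8)
    rng = np.random.default_rng()
    for _ in range(n_strands):
        # quadratic Bezier curve through three random control points
        p0, p1, p2 = (rng.integers(0, [w, h], size=2) for _ in range(3))
        t = np.linspace(0.0, 1.0, 64)[:, None]
        curve = (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
        curve = curve.astype(np.int32).reshape(-1, 1, 2)
        t_px = int(rng.integers(1, 4))
        cv2.polylines(img, [curve], False, (30, 25, 20), t_px)  # dark strand
        cv2.polylines(mask, [curve], False, 255, t_px)          # hair label
    for _ in range(n_flakes):
        # flakes are scalp content, so they stay out of the hair mask
        cx, cy = (int(v) for v in rng.integers(0, [w, h], size=2))
        cv2.circle(img, (cx, cy), int(rng.integers(2, 6)), (235, 235, 230), -1)
    return img, mask
```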




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I have read the manuscript, review comments, rebuttal letter. All reviewers recommend acceptance. This meta reviewer believes that the authors did a good job in addressing concerns.


