

Abstract

Fluorescent staining is crucial for studying the morphology and dynamics of subcellular structures in biological and medical research, but it is slow and expensive, and it causes phototoxicity in live cells. Existing methods use deep generative models for image-to-image translation to generate diverse fluorescent images of subcellular structures. However, pixel-level image generation approaches struggle to preserve fine structural details during the reconstruction process. In this paper, we introduce DiffStain, a novel approach that leverages mask-guided diffusion models for semantic virtual staining. The goal is to generate fluorescent images from a brightfield input image. Rather than relying on deliberately selected image filters for subcellular structure segmentation, our approach employs an unsupervised deep neural spectral clustering method to cope with noisy and ambiguous structural boundaries. We also integrate mask guidance into the reverse denoising process, which helps highlight the regions of the subcellular structures that require precise representation in the generated fluorescent images. The masks produced by the spectral clustering model provide valuable feedback, enabling iterative refinement of the fluorescent images. Experiments show that our DiffStain method achieves state-of-the-art virtual staining performance on public microscopy datasets. Code is available at: https://github.com/StrengthInNumber/DiffStain.
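To make the clustering step more concrete, here is a minimal sketch of classical spectral clustering over patch features, assuming the features (for example, from a ViT backbone) are already computed. It is a generic textbook formulation (cosine affinity, normalized graph Laplacian, k smallest eigenvectors, k-means), not the paper's neural spectral clustering module; the function name `spectral_patch_clusters` and its parameters are hypothetical.

```python
import numpy as np


def spectral_patch_clusters(features: np.ndarray, k: int = 2, n_iter: int = 20) -> np.ndarray:
    """Cluster N patch feature vectors (N x D) into k groups with classical spectral clustering.

    Hypothetical sketch; not the NSC module described in the paper.
    """
    # Cosine affinity between patches, clipped at 0 so graph weights are non-negative.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    w = np.clip(f @ f.T, 0.0, None)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1) + 1e-8)
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest eigenvectors.
    _, vecs = np.linalg.eigh(lap)
    emb = vecs[:, :k]
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-8)

    # Small k-means on the spectral embedding with farthest-point initialization.
    centers = [emb[0]]
    for _ in range(1, k):
        dist = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.stack(centers)

    def assign(c: np.ndarray) -> np.ndarray:
        # Index of the nearest center for every embedded patch.
        return np.argmin(((emb[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1), axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return assign(centers)
```

Reshaping the returned labels to the patch grid (for example 28×28, under the 224×224 tile and 8×8 patch sizes reported in the rebuttal) would give a coarse patch-level mask; a full pipeline would still need to upsample it to pixel resolution.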

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2432_paper.pdf

SharedIt Link: https://rdcu.be/eHw8o

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05169-1_14

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/StrengthInNumber/DiffStain

Link to the Dataset(s)

https://open.quiltdata.com/b/cellpainting-gallery/tree/cpg0000-jump-pilot

BibTex

@InProceedings{HanYik_DiffStain_MICCAI2025,
        author = { Han, Yikai AND Jiang, Jimao AND Pei, Yuru},
        title = { { DiffStain: Conditioned Diffusion-Based Semantic Virtual Staining with Mask Guidance } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15972},
        month = {September},
        pages = {139--148}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper introduces DiffStain, a new framework for generating subcellular structure-specific fluorescent images from brightfield images. DiffStain employs a conditioned diffusion model, where subcellular structure masks guide the iterative denoising process. It also incorporates an unsupervised deep neural spectral clustering (NSC) module to extract masks. The generated masks are used to guide the denoising process, ensuring the output highlights the structures of interest. The framework is evaluated on a public microscopy dataset, showing good performance compared to existing methods.
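One common way to let a mask condition a diffusion model is to concatenate it with the input image as an extra conditioning channel and pass that to the noise predictor at every reverse step. The sketch below shows a standard DDPM ancestral sampling loop under that assumption; `build_condition`, `reverse_step`, `sample`, and the linear beta schedule are illustrative choices rather than DiffStain's implementation, and the mask-feedback refinement described in the abstract is not modeled.

```python
import numpy as np


def linear_betas(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Standard linear DDPM noise schedule (an assumed choice)."""
    return np.linspace(beta_start, beta_end, T)


def build_condition(brightfield: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hypothetical conditioning: stack brightfield (C, H, W) and mask (1, H, W) along channels."""
    return np.concatenate([brightfield, mask], axis=0)


def reverse_step(x_t, t, cond, eps_model, betas, rng):
    """One DDPM ancestral step x_t -> x_{t-1}; the noise predictor sees `cond`."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    eps = eps_model(x_t, cond, t)
    # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    # Variance choice sigma_t^2 = beta_t.
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


def sample(cond, eps_model, betas, shape, seed=0):
    """Run the full reverse chain from pure noise x_T down to x_0."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        x = reverse_step(x, t, cond, eps_model, betas, rng)
    return x
```

In practice `eps_model` would be a trained conditional network; in this sketch any callable with the signature `(x_t, cond, t) -> noise estimate` works, so a zero-returning placeholder is enough for a shape check with five output fluorescence channels.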
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The studied topic is of biological importance and potential for scientific translation.
- The paper is well-written and easy to follow.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The motivation of the proposed method needs to be further elaborated. The authors briefly mentioned that there have been several works using diffusion models for fluorescent image generation, but their limitations and the need for a new method remain somewhat unclear.
- Technically, the proposed method integrates spectral-based unsupervised segmentation with a mask-guided diffusion model. While its application to fluorescent image generation is novel, the underlying methodology is well established. Therefore, the extent of the technical novelty may be questionable.
- Only one dataset with a relatively small sample size (i.e., 2k images) was used. The paper would greatly benefit from a more extensive evaluation on more diverse datasets to showcase the generalizability of the method.
- Given that generating virtual fluorescent images is only the first step in downstream biological studies, the authors are encouraged to provide examples demonstrating how improvements in image quality could facilitate further quantitative analysis.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Excellent writing quality, while the technical novelty and evaluation can be further strengthened.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Most of my concerns have been addressed, and I appreciate the authors’ efforts. While a few issues remain unresolved in the current version, the strengths of the paper outweigh the weaknesses, and I therefore recommend it for acceptance.

Review #2

Please describe the contribution of the paper

The authors present “DiffStain,” an innovative method for generating fluorescence images from brightfield images, utilizing a mask-guided diffusion approach. This technique leverages masks generated through neural spectral clustering to preserve subcellular features effectively while providing robustness against noise within brightfield input images. Experimental results indicate that DiffStain outperforms current state-of-the-art methods by producing sharp and well-defined fluorescence images, thereby enhancing the visualization of structures of interest.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper offers several notable strengths. Firstly, DiffStain introduces mask guidance to direct the diffusion model for image generation, transforming brightfield images into fluorescence images. The use of neural spectral clustering for the extraction of subcellular structure masks, which guide the diffusion process under noisy conditions, is particularly novel. Additionally, the evaluation of the proposed method is extensive, demonstrating DiffStain’s superiority over competing methods. Crucially, the authors have made their source code publicly available, significantly enhancing the reproducibility of their approach.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The proposed method appears to rely heavily on the existence of an anchor image. However, details regarding the acquisition of the anchor image and its specific contents are not adequately addressed in the paper. Instead, the paper emphasizes how the anchor image is used, particularly to enhance the affinity matrix for consistent subcellular structure identification and clustering across images. Furthermore, the implications of anchor image selection for the method's generalizability warrant further discussion. Mask guidance introduces additional complexity with the aim of improving image generation quality; please add a discussion of how this affects the generation. In addition, I have concerns regarding the method's generalizability to datasets other than JUMP Cell Painting; please comment on that. Lastly, the method relies on the pre-trained DINOViT model for spectral clustering. Have the authors explored different options to featurize patches for image guidance? Please comment on that.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

P
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Despite the absence of certain technical details and the need for clarification on specific aspects, I believe these issues are minor. Accepting this work would be advantageous to the research community.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

I thank the authors for addressing my comments. However, my understanding is that the performance of the method strongly depends on the selection of the anchor image. Moreover, the robustness of the performance to the choice of anchor image remains questionable: the method may perform well only with this particular selection and not with others. Therefore, I recommend rejecting this paper.

Review #3

Please describe the contribution of the paper

The paper presents DiffStain, a mask-guided conditioned diffusion model designed to generate subcellular structure-specific fluorescence images from brightfield microscopy inputs. The authors introduce an unsupervised neural spectral clustering (NSC)-based masking scheme to efficiently identify subcellular structures, and use mask guidance during the denoising process to enhance online virtual staining by highlighting subcellular structures of interest. Experimental validation demonstrates DiffStain’s superiority over state-of-the-art methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Novel mask-guided diffusion approach: The paper introduces a novel framework that leverages NSC-derived masks to guide the denoising process in diffusion models, enabling structure-aware fluorescence image generation from brightfield inputs.
2. Strong and comprehensive evaluation: The proposed method is validated with both quantitative and qualitative comparisons, as well as ablation studies, clearly demonstrating its superiority over existing approaches and highlighting the impact of mask guidance.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
(Major)
1. Limited scope of quantitative analysis: While Table 1 highlights improvements in RNA and ER channels, omitting equivalent data for DNA, Mito, and AGP channels leaves the comparative assessment incomplete.
2. Redundant visualization strategy: Figures 2 and 3 both focus on qualitative results without illustrating the mask guidance mechanism. Replacing one of them (e.g., Figure 2) with a schematic showing example NSC masks and their conditioning effect during denoising would better contextualize the core technical contribution.
(Minor)
3. Anchor image selection criteria in NSC?
4. Empirical vs. theoretical basis for structure patch size (q) optimization per fluorescence channel?
5. Computational overhead analysis of mask-guided denoising?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper makes significant methodological progress through its mask-conditioned diffusion framework, demonstrating both technical novelty (NSC-guided denoising) and practical utility. While the current validation could be strengthened through expanded visualizations of the masking mechanism, the reproducible framework and open-source implementation offer substantial value to the computational microscopy community. Once the identified comments are addressed, this paper has strong potential to influence both methodology development and biological image analysis applications.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank the reviewers for their efforts in reviewing our paper and for their constructive comments.

@R1: Motivation. Existing methods use generative models for image-to-image translation to generate fluorescent images [6, 8, 13, 14, 20]. Class-guided DDPM relied on carefully prepared class priors to guide the denoising process [8]. However, pixel-wise image generation often modifies the entire image to align with the target domain's distribution, which can lead to the loss of local structural details. The proposed DiffStain is an efficient mask-guided conditioned diffusion model for generating subcellular structure-specific fluorescence images.

@R1: Technical novelty. To the best of our knowledge, we are the first to present an efficient mask-guided conditioned diffusion model for generating subcellular structure-aware fluorescence images. We present an unsupervised NSC-based masking scheme that enables efficient identification of subcellular structures and provides mask feedback during online virtual staining.

@R1, R3: Dataset. In our current work, we evaluated the proposed approach on the JUMP Cell Painting dataset [4]. We used approx. 20,000 brightfield images with the same dataset setting as [8]. We agree that evaluation on more diverse datasets would help showcase the generalizability of the method; we will explore this in future work.

@R1, R2: Quantitative analysis. Table 1 reports the quantitative virtual staining results of the compared methods, measuring the consistency between the predicted fluorescence images and the ground truth. We agree that more quantitative analysis on downstream tasks would help demonstrate the benefit of improved image quality. We conducted a quantitative analysis of five types of subcellular structures and reported the average performance in Table 1. Due to limited space, we show only two structures, RNA and ER. We will provide results for all structures in an extended version.

@R2: Visualization strategy. We agree it would be helpful to show qualitative results of the mask guidance. We assessed the effectiveness of the mask guidance in the ablation study (Table 1 and Fig. 3). The variant without mask guidance is the same as Palette [20]. Fig. 3 shows a side-by-side comparison of Palette and ours, highlighting the importance of mask guidance.

@R2, R3: Anchor image selection. We augment the affinity matrix with information from an anchor image, ensuring consistent subcellular structure identification across images. We arbitrarily select an anchor image containing the structures of interest from the subdivided small-FOV images. By using the shared anchor image, we synchronize cluster assignments across small-FOV images.

@R2: Patch size. To account for the fine granularity of subcellular structures, the image is subdivided into small-FOV images in which the patch size is comparable to the subcellular structures. In our current experiments, the resolution of the small-FOV image is set to 224×224, matching the input size of the pre-trained DINOv2 model [15], i.e., q=224. The patch size is set to 8×8.

@R2, R3: Computational overhead. Average inference for a 512×512 five-channel fluorescence image takes 3 minutes. The computational cost of NSC is relatively low: it requires only 7.5 seconds to generate masks of the different subcellular structures from five-channel fluorescent images. The mask-guided denoising process is comparatively expensive, requiring approx. 65 seconds.

@R3: Patch features. We leveraged DINOViT features [15], which capture long-range relationships between repetitive fine-grained structures through self-attention. We agree it would be helpful to consider other options to featurize patches, such as handcrafted and learned features, as well as features from pre-trained foundation models; we will explore this in future work.
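The anchor-image mechanism in the rebuttal can be illustrated with a small, hypothetical sketch: if the anchor's patches are clustered jointly with each tile's patches (which augments the affinity matrix with anchor rows and columns), the anchor's shared labeling can be used to rename each tile's clusters so that the same ID denotes the same structure in every tile. The majority-vote relabeling in `align_to_anchor` is an assumption for illustration, and the joint clustering itself is left out; the paper may synchronize assignments differently. Only the tile and patch sizes come from the rebuttal.

```python
import numpy as np

FOV_SIZE = 224   # small-FOV tile resolution stated in the rebuttal (q = 224)
PATCH_SIZE = 8   # patch size stated in the rebuttal
GRID = FOV_SIZE // PATCH_SIZE  # 28 patches per side, i.e. 784 patch tokens per tile


def align_to_anchor(joint_labels: np.ndarray, n_anchor: int,
                    anchor_reference: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical: map a tile's cluster IDs onto a shared labeling of the anchor image.

    joint_labels: cluster IDs from clustering [anchor patches; tile patches] jointly,
                  so the first n_anchor entries belong to the anchor image.
    anchor_reference: canonical cluster IDs for the anchor patches, shared by all tiles.
    Returns the tile's labels expressed in the anchor's canonical IDs.
    """
    anchor_part = joint_labels[:n_anchor]
    mapping = np.arange(k)
    for c in range(k):
        # Majority vote: which canonical anchor label do the anchor patches in cluster c carry?
        members = anchor_reference[anchor_part == c]
        if members.size:
            mapping[c] = np.bincount(members, minlength=k).argmax()
    return mapping[joint_labels[n_anchor:]]
```

Under these numbers each tile contributes 784 patch tokens, so in this sketch joint clustering with a same-size anchor would operate on a 1568×1568 affinity matrix.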

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

The paper introduces DiffStain to generate subcellular structure-specific fluorescent images from brightfield images. DiffStain incorporates subcellular structure masks to guide the iterative denoising process, while the masks are extracted by an unsupervised deep neural spectral clustering (NSC) module. All three reviewers appreciate the methodological novelty, the experimental results, and the code release. They have concerns about improvements in other fluorescence channels (R2), the computational overhead of mask guidance (R2), anchor image selection (R2, R3), and visualization showing the advantage of the mask (R2). R1 and R3 also ask about generalization beyond JUMP. Although I do not think this is absolutely necessary, as JUMP is a huge multi-cohort dataset, it is not entirely clear to me how the authors selected those 10 plates from JUMP and performed the train/test split. It would be very beneficial if the model could generalize across different cohorts/cell lines and drug/genetic perturbations.
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A


DiffStain: Conditioned Diffusion-Based Semantic Virtual Staining with Mask Guidance

Author(s):