Abstract

Anomaly detection is an emerging approach in digital pathology owing to its ability to use data efficiently and effectively for disease diagnosis. While supervised learning approaches deliver high accuracy, they rely on extensive annotated datasets, which are scarce in digital pathology. Unsupervised anomaly detection, in contrast, offers a viable alternative by identifying deviations from normal tissue distributions without requiring exhaustive annotations. Recently, denoising diffusion probabilistic models have gained popularity in unsupervised anomaly detection, achieving promising performance on both natural and medical imaging datasets. Building on this, we incorporate a vision-language model into a diffusion model for unsupervised anomaly detection in digital pathology, utilizing histopathology prompts during the reconstruction process. Our approach employs a set of pathology-related keywords associated with normal tissues to guide the reconstruction process, facilitating the differentiation between normal and abnormal tissues. To evaluate the effectiveness of the proposed method, we conduct experiments on a gastric lymph node dataset from a local hospital and assess its generalization ability under domain shift using a public breast lymph node dataset. The experimental results highlight the potential of the proposed method for unsupervised anomaly detection across various organs in digital pathology. Code: https://github.com/QuIIL/AnoPILaD.
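As a rough illustration of the idea, the sketch below scores a patch by how poorly a prompt-conditioned denoiser reconstructs its latent from a partially noised state. Everything in it is a hypothetical stand-in rather than the authors' implementation: TinyDenoiser, the latent dimension, the noise schedule, and the one-step reconstruction all substitute for the paper's latent diffusion model and its text embeddings.

    # Toy sketch: reconstruction error under normal-tissue prompts as anomaly score.
    import torch
    import torch.nn as nn

    class TinyDenoiser(nn.Module):
        """Stand-in for the prompt-conditioned denoiser of the real model."""
        def __init__(self, dim: int = 16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim * 2 + 1, 64), nn.ReLU(), nn.Linear(64, dim)
            )

        def forward(self, z_t, t, text_emb):
            t_feat = t.float().unsqueeze(-1) / 1000.0   # crude timestep feature
            return self.net(torch.cat([z_t, text_emb, t_feat], dim=-1))

    def anomaly_score(z0, text_emb, denoiser, t_star=500, T=1000):
        """Noise latents to step t_star, denoise under normal-tissue prompts,
        and score by reconstruction error (higher = more anomalous)."""
        betas = torch.linspace(1e-4, 2e-2, T)
        alpha_bar = torch.cumprod(1.0 - betas, dim=0)
        a = alpha_bar[t_star]
        eps = torch.randn_like(z0)
        z_t = a.sqrt() * z0 + (1.0 - a).sqrt() * eps    # forward diffusion q(z_t | z_0)
        t = torch.full((z0.shape[0],), t_star)
        eps_hat = denoiser(z_t, t, text_emb)            # prompt-conditioned denoising
        z0_hat = (z_t - (1.0 - a).sqrt() * eps_hat) / a.sqrt()  # one-step z_0 estimate
        return (z0 - z0_hat).pow(2).mean(dim=-1)        # per-patch anomaly score

    denoiser = TinyDenoiser()
    z0 = torch.randn(4, 16)     # latents of 4 tissue patches (illustrative)
    text = torch.randn(4, 16)   # embeddings of normal-tissue keyword prompts
    print(anomaly_score(z0, text, denoiser))

The intuition is that tissue lying off the learned normal distribution reconstructs poorly under normal-tissue prompts, so its reconstruction error serves as an out-of-distribution score.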

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2270_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/QuIIL/AnoPILaD

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WanJia_PathologyInformed_MICCAI2025,
        author = { Wang, Jiamu and Byeon, Keunho and Song, Jinsol and Nguyen, Anh and Ahn, Sangjeong and Lee, Sung Hak and Kwak, Jin Tae},
        title = { { Pathology-Informed Latent Diffusion Model for Anomaly Detection in Lymph Node Metastasis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {452--461}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an unsupervised anomaly detection method for lymph node metastasis that uses the reconstruction of a text-conditioned latent diffusion model as an out-of-distribution score.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A recipe for obtaining prompts for the WSIs through a combination of manual keyword curation and the vision-language model CONCH, yielding paired image-prompt data (see the sketch after this list)
    • Good evaluation at both the WSI and patch levels. In addition to AUC and AUPR, the authors also report Dice and IoU to get a sense of whether the model looks at the right region when predicting a WSI to be OOD
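
    As a rough illustration of the pairing recipe noted above, the sketch below matches a patch embedding against a curated keyword pool by cosine similarity. This is a generic CLIP-style stand-in under stated assumptions: the paper uses CONCH, whose actual API is not reproduced here, and all names, keywords, and dimensions are hypothetical.

        # Hypothetical CLIP-style keyword assignment; CONCH's real interface differs.
        import torch

        def assign_keywords(patch_emb, keyword_embs, keywords, top_k=3):
            """Return the top-k keywords whose text embeddings best match a patch."""
            patch = patch_emb / patch_emb.norm()
            texts = keyword_embs / keyword_embs.norm(dim=-1, keepdim=True)
            sims = texts @ patch                            # cosine similarities
            idx = sims.topk(top_k).indices
            return [keywords[i] for i in idx]

        keywords = ["lymphoid follicle", "germinal center", "sinus histiocytes"]
        patch_emb = torch.randn(512)                        # stand-in image embedding
        keyword_embs = torch.randn(len(keywords), 512)      # stand-in text embeddings
        print(assign_keywords(patch_emb, keyword_embs, keywords, top_k=2))
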
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • It is not clear whether the performance is driven by the prompts on patches that have not been seen during training. For example, in the OOD test set, 6% and 4% of the highlighted keywords were not seen by the model during training. If the model is prompted with these keywords, the reconstruction errors will be higher. In that sense, these prompts are leaking whether a patch is OOD
    • It would be good to include the prompts used for AnoPILaD in Fig. 3, along with the actual H&E counterparts
    • It is a minor improvement over AnoDDPM, with the diffusion model being a conditional generative model
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My major concern stems from the prompt leaking the information that a patch is OOD, as mentioned in the weaknesses. One way to rectify this is to come up with a list of keywords that overlap between the OOD and in-distribution test sets, so that we could disentangle how much of the performance is driven by the prompt.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces AnoPILaD, a latent diffusion model for anomaly detection in lymph node WSIs. The primary objective of the work is to enhance anomaly detection accuracy by applying an unsupervised learning approach that addresses the lack of annotated image data in digital pathology. AnoPILaD uses a reconstruction-based approach to learn healthy tissue patterns, combined with a vision-language model that strengthens the model's inductive bias during reconstruction and improves its sensitivity. The overall goal is to provide a robust model capable of detecting lymph node metastases with high accuracy, regardless of lymph node localization or organ type.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The work introduces an innovative and refined latent diffusion model for anomaly detection that integrates reconstruction-based learning with vision-language models and shows better performance than other current methods. The model was evaluated on two distinct datasets: a local dataset of gastric lymph nodes and the publicly available Camelyon16 Challenge dataset of breast lymph nodes. It is thoroughly benchmarked against four density-based methods and four reconstruction-based models. The evaluation metrics, including the area under the curve (AUC) and the area under the precision-recall curve (AUPR), were calculated at the patch level, demonstrating that the proposed model outperforms density-based methods by a significant margin and performs slightly better than reconstruction-based methods on the local dataset. The model also performs robustly on the public dataset, showing the smallest decrease in performance among the compared methods, which indicates its generalizability across different datasets. Performance evaluation at the WSI level also shows strong results on the local dataset, with a performance drop on the public dataset; however, this drop is smaller than that of the other methods. These findings suggest that the proposed model is feasible and robust for detecting lymph node metastasis, making it a promising approach for digital pathology.
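
    For reference, the following is a minimal sketch of the patch-level AUC and AUPR computation mentioned above, using scikit-learn on synthetic labels and scores; the model's actual anomaly scores are not reproduced here.

        # Synthetic example of patch-level AUC/AUPR scoring with scikit-learn.
        import numpy as np
        from sklearn.metrics import roc_auc_score, average_precision_score

        rng = np.random.default_rng(0)
        labels = rng.integers(0, 2, size=1000)      # 1 = OOD (metastasis)
        scores = rng.normal(size=1000) + labels     # anomaly scores, higher for OOD
        print("AUC :", roc_auc_score(labels, scores))
        print("AUPR:", average_precision_score(labels, scores))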

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There is an inconsistency between the dataset sizes mentioned in the text (Section 3.1) and the sizes reported in Table 1. This could lead to confusion and misinterpretation of the dataset. The paper should clarify this discrepancy to ensure a consistent and transparent description of the dataset.

    While the authors mention the amount of in-distribution and out-of-distribution data, the base prevalence is not explicitly mentioned in the paper. Since the base prevalence is important for accurately interpreting the AUPR, I would recommend that the authors clarify it to ensure a better understanding of the impact of the AUPR results.

    In Section 3.2, it is stated that the model was trained for 400,000 steps. Could you please clarify why this seemingly arbitrary number was chosen instead of an early-stopping approach?

    The tables are not self-explanatory. Please extend the captions so the full text doesn’t have to be read to understand the full content of the tables.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The citation order is seemingly randomly assigned and should be corrected in the final version for readability and clarity. Cross-links to figures, tables, and literature do not work and should be added in the final version. Typo on page 6, line 9: „samll“ should be „small“. In Table 2, NLL and Regret have the same values for both datasets; please check whether those values are correct.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a structured presentation and evaluation of a new model for lymph node anomaly detection. The presented model shows slightly better performance than similar reconstruction methods while showing better robustness on an independent dataset. The evaluation seems comprehensive and indicates that the suggested combination of latent diffusion models and vision-language models to improve anomaly identification in pathological images is a promising approach. However, there are some inconsistencies between the dataset sizes mentioned in the text and in Table 1, which should be clarified before acceptance. Additionally, the base prevalence for the AUPR should be clarified to ensure interpretability of the presented results.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed the concerns of the reviews to a satisfying extent. The paper is an interesting option for a poster presentation.



Review #3

  • Please describe the contribution of the paper

    The study utilizes vision-language models effectively to steer the reconstruction of normal tissue in a semantically meaningful way, facilitating the differentiation between normal and abnormal tissues. The authors propose AnoPILaD, a latent diffusion-based unsupervised anomaly detection framework guided by pathology-specific text prompts. It also demonstrates robust generalization across organs (gastric and breast lymph nodes), with substantial improvements over state-of-the-art density- and reconstruction-based methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Integrating latent diffusion models with clinically curated text prompts is novel and compelling.
    • Extensive comparison against multiple baselines (VAE, AE, MemAE, AnoDDPM).
    • Cross-domain testing (gastric vs. breast lymph nodes) strengthens the claims of generalizability.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There is no dedicated ablation study on the effect of prompt quality or weighting strategy.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduces a significant advancement in the unsupervised detection of lymph node metastatic disease in digital pathology. The use of vision-language models for semantically guided diffusion reconstruction is original, effective, and well-validated. The authors demonstrate a rigorous comparison and clear gains, making this paper a good candidate for acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

● Reviewer #1 raised a concern about whether the performance is driven by prompt leakage (unseen or OOD-specific keywords): We clarify that all 74 pathology keywords in the pool specifically describe normal lymph nodes and contain no OOD-specific keywords. The exact same set of keywords was used for both training and inference (i.e., for both in-distribution and OOD data). Fig. 2 shows the distribution of the 10 most frequent keywords. While two keywords appear more frequently in the OOD test set (with 6% and 4% frequencies), they were also present during training, albeit at lower frequencies. This variation is expected, as the underlying image features differ between normal and abnormal images, which naturally results in different keyword associations. In summary, our method does not introduce unseen or OOD-specific keywords; thus, there is no prompt leakage.

● Reviewer #2 mentioned the inconsistency in the dataset sizes: We have double-checked the dataset sizes in the manuscript. The C16 WSI counts are incorrect; they should be: 32 validation, 80 in-distribution, 49 OOD, and 22 OOD macro. We apologize for this confusion and will fix the errors in the final manuscript.

● Reviewer #2 asked us to include the base prevalence to interpret our AUPR results: In response to the reviewer’s comment, we computed the prevalence values as follows: 0.3976 (LH patch); 0.4956 (LH slide); 0.1882 (C16 patch); 0.3798 (C16 slide); 0.2157 (C16 macro slide). The prevalence values for the C16 patch and C16 macro slide levels are lower than those in the other cases, but they are not extremely low. We will include these details in the final manuscript to facilitate accurate interpretation of the results.
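
As a note on interpretation: the AUPR of an uninformative classifier equals the positive-class prevalence, so the values above set the chance-level baselines for the reported AUPR numbers. A quick numerical check with synthetic data follows; the 0.1882 prevalence is taken from the response above, while the scores are random.

    # Sanity check: random scores yield AUPR ~= prevalence (here, C16 patch level).
    import numpy as np
    from sklearn.metrics import average_precision_score

    rng = np.random.default_rng(0)
    prevalence = 0.1882                           # C16 patch prevalence, from above
    labels = (rng.random(100_000) < prevalence).astype(int)
    random_scores = rng.random(100_000)           # uninformative classifier
    print("empirical prevalence:", labels.mean())
    print("random-score AUPR  :", average_precision_score(labels, random_scores))
    # Both print ~0.188; an AUPR well above prevalence indicates real signal.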

● Reviewer #2 raised a concern about the 400,000 training steps: Due to the page limit, the training details were not fully described in the manuscript. During training, we used the Fréchet Inception Distance (FID) to measure the quality of the images generated by the diffusion model. The model achieved the lowest FID at 400k steps, so that checkpoint was chosen for subsequent testing. We will clarify this in the final manuscript.
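
A minimal sketch of this checkpoint-selection rule is given below, assuming FID has already been computed at periodic checkpoints; the step values and FID numbers are illustrative, not taken from the paper.

    # Pick the checkpoint with the lowest FID among periodically evaluated steps.
    def select_checkpoint(fid_by_step: dict[int, float]) -> int:
        """Return the training step whose generated samples had the lowest FID."""
        return min(fid_by_step, key=fid_by_step.get)

    # Illustrative values; the rebuttal states that 400k steps had the lowest FID.
    fid_by_step = {100_000: 41.2, 200_000: 33.5, 300_000: 29.8, 400_000: 27.1}
    print(select_checkpoint(fid_by_step))  # -> 400000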

● Reviewer #1 mentioned that our method is a minor improvement over AnoDDPM: We partially agree with the reviewer that our method shares a similarity with AnoDDPM in its use of DDPM. However, our method introduces a distinct contribution by incorporating pathology-specific prompts, based on normal pathology keywords, into the diffusion process. Although conditional diffusion models are widely used, we uniquely leverage a tailored text prompt built upon domain-specific normal keywords. Both quantitative and qualitative assessments demonstrate the substantial impact of this approach, with particularly notable performance gains observed on C16, which contains tissues from a different organ and institution. These results emphasize our method’s robustness to domain shifts.

● Reviewer #3 pointed out that there is no ablation study on the effect of prompt quality or the weighting strategy: We believe that the quality of the text prompt influences model performance. To ensure prompt quality, we consulted an experienced pathologist to confirm the relevance of the selected keywords. However, we were unable to examine the effect of prompt quality or the weighting scheme within this study. We acknowledge this as an important avenue for future work.

● The reviewers pointed out specific errors and areas for improvement: In Table 2, we will correct the values of NLL and Regret for C16 as follows: 0.3250 AUC and 0.1600 AUPR for NLL, and 0.6480 AUC and 0.3441 AUPR for Regret. In Figure 3, we will include the H&E counterparts. We acknowledge the issues with citation order, cross-links, and typos, and we will thoroughly review and address these in the manuscript to enhance readability and accuracy.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The reviewers generally agree that the integration of text-guided latent diffusion with reconstruction-based anomaly detection presents a novel and timely contribution for digital pathology, particularly in the challenging task of lymph node metastasis detection.

    At the same time, a few important concerns were raised that should be addressed in the rebuttal. First, reviewers question whether performance may be partially driven by prompt leakage, i.e., by unseen or OOD-specific keywords that could bias the anomaly score. This issue could be clarified by describing how prompts are assigned at inference and whether prompt overlap between training and test data is controlled.

    Second, while all reviewers appreciate the scope of the evaluation, minor inconsistencies in dataset size reporting, table clarity, and missing details (e.g., rationale for training duration, use of early stopping) should be clarified to strengthen the submission.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposes AnoPILaD, a prompt-guided latent-diffusion framework for unsupervised detection of lymph node metastasis. All reviewers acknowledge its clear methodological novelty, combining pathology-curated keywords with diffusion reconstruction, and its strong empirical gains across two datasets versus both density- and reconstruction-based baselines. The authors’ rebuttal clarifies the dataset counts, prevalence, and training schedule, and demonstrates that identical keyword pools are used for in-distribution and OOD samples, partially addressing R1’s prompt-leakage concern. Remaining issues (ablation on prompt weighting, extended captions, code release) are minor and can be resolved in the camera-ready version. Overall, methodological soundness, clinical relevance, and robust evaluation justify acceptance for publication at the conference.


