Abstract
Unsupervised anomaly detection (UAD) in medical imaging is crucial for identifying pathological abnormalities without requiring extensive labeled data. However, existing diffusion-based UAD models rely solely on imaging features, limiting their ability to distinguish between normal anatomical variations and pathological anomalies. To address this, we propose Diff3M, a multi-modal diffusion-based framework that integrates chest X-rays and structured Electronic Health Records (EHRs) for enhanced anomaly detection. Specifically, we introduce a novel Image-EHR Cross-Attention module to incorporate structured clinical context into the image generation process, improving the model’s ability to differentiate normal from abnormal features. Additionally, we develop a static masking strategy to enhance the reconstruction of normal-like images from anomalies. Extensive evaluations on CheXpert and MIMIC-CXR/IV demonstrate that Diff3M achieves state-of-the-art performance, outperforming existing UAD methods in medical imaging. Our implementation is available at https://github.com/nth221/Diff3M.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3259_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/nth221/Diff3M
Link to the Dataset(s)
MIMIC-CXR dataset: https://physionet.org/content/mimic-cxr/2.1.0/
MIMIC-IV dataset: https://physionet.org/content/mimiciv/3.1/
CheXpert dataset: https://stanfordmlgroup.github.io/competitions/chexpert/
BibTex
@InProceedings{KimHar_Harnessing_MICCAI2025,
author = { Kim, Harim and Wang, Yuhan and Ahn, Minkyu and Choi, Heeyoul and Zhou, Yuyin and Hong, Charmgil},
title = { { Harnessing EHRs for Diffusion-based Anomaly Detection on Chest X-rays } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15962},
month = {September},
pages = {240--250}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a diffusion model that incorporates EHR conditions for anomaly detection in chest X-rays. The model consists of three main components: the image-EHR cross-attention module, which provides EHR embeddings; the conditional noise estimator, which takes EHR conditions for normal X-ray generation; and the masked pixel generation network, which enhances normal reconstruction. Experiments on two EHR-chest X-ray datasets demonstrate the effectiveness of the proposed method, and further analysis of the EHR attention weights enhances the model’s interpretability.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is the first to incorporate structured clinical information into conditional diffusion models, via a cross-attention mechanism, for anomaly detection in chest X-ray images, and it proposes a new masking strategy that enhances the reconstruction of normal images.
- The manuscript is well-written. Extensive experiments on the CheXpert and MIMIC-CXR/IV datasets demonstrate the effectiveness of the proposed method, and further analysis of the attention weights of EHR features enhances the model’s transparency and potential application value.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
In Section 3.2, the authors introduce a static masking strategy to force the model to rely on the EHR conditional embedding when reconstructing normal-like images. However, the experiments do not clarify whether the static (as opposed to random) nature of the masking improves normal image reconstruction independently of the incorporation of EHR information. Further explanation would significantly enhance the rigor of the study.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper is well-written. Although using a Feature Tokenizer to encode tabular data and employing cross-attention to combine multimodal information are common practices, this paper effectively integrates these approaches into a conditional diffusion model and successfully applies a novel masking mechanism to chest X-ray anomaly detection. Comprehensive experiments demonstrate the model’s effectiveness.
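The combination the reviewer describes (a Feature Tokenizer for tabular EHR data followed by cross-attention from image features) can be illustrated with a minimal sketch. It assumes the FT-Transformer-style tokenizer (one learned vector and bias per numeric feature, one embedding table plus bias per categorical feature) and single-head scaled dot-product cross-attention in which image tokens are queries and EHR tokens are keys and values. All function names, dimensions, and the random weights are hypothetical; this is not the Diff3M implementation, which may differ (multi-head attention, normalization, learned weights).

```python
import math
import random


def feature_tokenize(numeric, categorical, num_weights, num_biases, cat_tables, cat_biases):
    """Hypothetical FT-style tokenizer: one d-dim token per EHR feature.

    numeric token j     = x_j * W_j + b_j
    categorical token j = E_j[x_j] + b_j
    """
    tokens = []
    for x, w, b in zip(numeric, num_weights, num_biases):
        tokens.append([x * wi + bi for wi, bi in zip(w, b)])
    for c, table, b in zip(categorical, cat_tables, cat_biases):
        tokens.append([ei + bi for ei, bi in zip(table[c], b)])
    return tokens


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a (d_out x d_in) matrix by a d_in vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def cross_attention(image_tokens, ehr_tokens, wq, wk, wv):
    """Single-head cross-attention: image tokens query the EHR tokens.

    Returns the attended outputs and, per image token, its attention
    weights over the EHR tokens (each weight row sums to 1).
    """
    keys = [matvec(wk, t) for t in ehr_tokens]
    values = [matvec(wv, t) for t in ehr_tokens]
    outputs, weights = [], []
    for x in image_tokens:
        q = matvec(wq, x)
        scale = math.sqrt(len(q))
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        a = softmax(scores)
        weights.append(a)
        outputs.append([sum(a[j] * values[j][i] for j in range(len(values)))
                        for i in range(len(values[0]))])
    return outputs, weights


# Toy usage with random (untrained) weights.
rng = random.Random(0)
d = 4


def rand_mat(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical EHR record: two numeric features and one categorical feature with 3 levels.
ehr = feature_tokenize(
    numeric=[0.5, -1.2],
    categorical=[2],
    num_weights=rand_mat(2, d), num_biases=rand_mat(2, d),
    cat_tables=[rand_mat(3, d)], cat_biases=rand_mat(1, d),
)
image_tokens = rand_mat(5, d)  # e.g., 5 flattened feature-map positions
out, attn = cross_attention(image_tokens, ehr, rand_mat(d, d), rand_mat(d, d), rand_mat(d, d))
```

Each row of `attn` is the kind of per-feature attention distribution that an EHR attention-weight analysis would aggregate; in a trained model the weights would come from learning rather than from `rng`.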
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper proposes a diffusion-based anomaly detection framework that leverages both chest X-rays and electronic health records (EHRs) for better anomaly detection. The authors introduce several new modules, including a novel Image-EHR cross-attention mechanism and a new kind of masking in the diffusion model. They evaluate the model on two chest radiography datasets, CheXpert and MIMIC-CXR/IV, where it achieves state-of-the-art anomaly detection performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Experiments are sound and the paper is well explained and structured
- The authors properly compared their methodology with state-of-the-art diffusion models and feature-based anomaly detection methods.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
There are no statistics in the results (no confidence intervals and no standard deviations).
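One common way to attach a confidence interval to a test-set metric such as AUROC without retraining is a percentile bootstrap over test images. The sketch below is a generic illustration, not something reported in the paper; the resample count, the 95% level, and the function names are assumptions. It captures test-set sampling variability only, not variability across training seeds.

```python
import random


def auroc(scores, labels):
    """Probability that a random anomalous case outscores a random normal one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(scores, labels)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample test cases with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUROC needs both classes in the resample
            stats.append(metric([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy usage on made-up scores (not results from the paper).
scores = [0.91, 0.40, 0.77, 0.35, 0.62, 0.15, 0.58, 0.22]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(auroc(scores, labels), bootstrap_ci(scores, labels, auroc))
```

Reporting the interval next to each AUROC/AUPRC value would let readers judge whether small margins between methods exceed test-set noise.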
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This is a very meaningful paper that advances the field of anomaly detection using masked diffusion models in clinical settings. The lack of statistics in the tables weakens its message.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The authors introduce a diffusion model-based technique for anomaly detection in medical images (evaluated on chest x-ray in this paper), which notably introduces (1) the utilization of EHR (electronic health records) for inference and (2) a learned strategy for predicting masks for the model to inpaint healthy tissue in the image. Evaluated on two chest x-ray datasets, they obtained consistently improved AD performance over prior diffusion-based and feature-based AD methods, paired with extensive ablation studies which evaluate questions such as the influence of the novel components, the attention weights of the model to different EHR categories, and others.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The problem scenario itself is well motivated and impactful, since AD methods do not require image-level labels for training and chest X-ray is a very common modality.
- The integration of EHR into medical anomaly detection (AD) is novel to my knowledge. In general, I think the field should be thinking more about multimodal models that utilize the extensive information already present in EHR (at least on datasets where this information is available).
- The method has solid technical (ML) novelty. While conditioning diffusion models on tokenized data via attention is not novel (see e.g. the latent diffusion model paper, Rombach et al., CVPR 2022), the PCM module is novel. Additionally, the PCM implementation and motivation make sense, especially the use of the scaling parameter s.
- The results are good overall. In terms of the metrics, AUROC and AUPRC are good choices; AUROC may be somewhat biased towards small anomalies, but AUPRC circumvents that. The comparison AD methods form strong and varied baselines for both feature-based and diffusion-based approaches. The improvements of your method are clear and consistent over the others, although not by a large amount; I think this is OK given that there may be a performance “ceiling” here.
- The ablation studies are varied and helpful:
  a. It’s clear that utilizing PCM improved performance (Table 2). Using EHR (Table 2) didn’t add that much benefit, but it didn’t hurt, and this makes sense given that EHR information may only be vaguely related to the presence of anomalies.
  b. The finding that whether MSE or max_abs is better changes depending on the dataset is interesting, and your explanation for it is reasonable; this may be of interest to the AD community.
  c. The attention weight analysis for different EHR features is especially interesting, and it makes sense that demographic features (which are most likely only vaguely related to visible anatomical information AT BEST) would have basically zero attention weight. I’m curious to see how these EHR weights would change with different tasks (including non-AD), datasets/modalities, and additional EHR features.
- The paper is very well-written and well-organized, with no visible grammatical errors/typos.
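The metric points raised above (AUROC vs. AUPRC, and MSE vs. max_abs scoring) can be made concrete with a small sketch. It assumes "MSE" means the mean squared reconstruction residual over all pixels and "max_abs" means the largest absolute per-pixel residual; the paper's exact definitions (e.g., any smoothing or cropping before taking the maximum) may differ, and all function names are hypothetical. AUPRC is computed as average precision, with tied scores broken by sort order.

```python
def residuals(image, reconstruction):
    """Flatten per-pixel differences between an image and its reconstruction."""
    return [a - b for row_a, row_b in zip(image, reconstruction) for a, b in zip(row_a, row_b)]


def score_mse(image, reconstruction):
    """Image-level score: mean squared residual (sensitive to diffuse changes)."""
    r = residuals(image, reconstruction)
    return sum(x * x for x in r) / len(r)


def score_max_abs(image, reconstruction):
    """Image-level score: largest absolute residual (sensitive to focal changes)."""
    return max(abs(x) for x in residuals(image, reconstruction))


def auroc(scores, labels):
    """Probability that a random anomalous image outscores a random normal one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(scores, labels):
    """Average precision: mean of the precision at the rank of each anomalous image."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / sum(labels)


# Toy 2x2 example: a focal residual (one bright pixel) vs. a diffuse one.
zeros = [[0.0, 0.0], [0.0, 0.0]]
focal = [[0.9, 0.0], [0.0, 0.0]]
diffuse = [[0.5, 0.5], [0.5, 0.5]]
print(score_mse(focal, zeros), score_mse(diffuse, zeros))          # MSE ranks the diffuse case higher
print(score_max_abs(focal, zeros), score_max_abs(diffuse, zeros))  # max_abs ranks the focal case higher
```

The toy case shows why the better score can depend on the dataset: MSE rewards spatially extended residuals, while max_abs rewards a single strong local deviation.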
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The performance improvements over the baselines are consistent, yet somewhat small. Also, the only diffusion-based model you compare to, DiAD (AAAI 2024), while strong, was developed outside the medical imaging context, which may slightly reduce its relevance. It may have been better to compare to a recent diffusion-based AD method developed specifically for medical AD, e.g. THOR from “Diffusion Models with Implicit Guidance for Medical Anomaly Detection” by Bercea et al., MICCAI 2024. It would be helpful if the authors discussed how they expect their model to compare to such methods.
- The MPG module adds an entire additional UNet to the overall model, which surely raises the computational cost compared with the other diffusion-based baseline, DiAD. It would be helpful if the authors commented on this tradeoff between increased computational cost and improved performance.
- The authors only focus on one modality/anatomy: chest x-ray, although of course it is a very common and clinically-impactful one.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- I’m curious how, specifically, the conditioning vector c_r is fed into the model. Is it simply concatenated, or attended to within the bottleneck of the denoiser, or something else?
- Although there probably isn’t room for it in a MICCAI paper, I’m curious whether you ran an ablation study on the lambda term in the loss function, i.e., the tradeoff between the main denoising objective and the masking objective.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method introduces well-motivated, technically novel components (the utilization of EHR for medical AD and the PCM+MPG module) to diffusion-based anomaly detection, which produce consistent performance improvements over what appear to be strong comparison methods. The authors also provide extensive ablation studies, which add a lot to the work, answering crucial questions such as the performance benefits due to the novel components and the importance of different categories of EHR to inference. Finally, the paper is polished: very well-written and organized. Despite small concerns (using a diffusion-based AD comparison method that was not developed for medical images, and potential computational cost issues), I believe that this is a promising method which is well-validated and potentially impactful. I think that it is of interest to both the medical AD community and the general MICCAI community at large (the latter particularly to showcase the benefit of utilizing EHR for imaging predictions).
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We sincerely thank all reviewers for their detailed and constructive feedback. We are especially grateful for the thoughtful evaluation of our methodology and for recognizing the novelty and impact of integrating EHR data into a diffusion-based anomaly detection framework.
** Clarify whether the static or random nature of masking strategy operates independently of EHR integration. (Reviewer 1) -> Thank you for the thoughtful comment. We consider random masking with specific textures as conceptually aligned with the definition of synthetic anomalies. As the task-specific nature of synthetic anomaly-based experiments can be reasonably expected [1], we decided that additional experiments were not needed for this paper. Nonetheless, we agree that further exploration of masking strategies could provide deeper insights and plan to investigate this in future work. [1] Xu et al., A Survey on Industrial Anomalies Synthesis, arXiv:2502.16412.
** DiAD may lack relevance due to its non-medical origin. (Reviewer 2) -> We appreciate this observation. Although DiAD originated outside the medical imaging domain, its arXiv version [2] reports competitive results on medical datasets. In our experiments, our method achieved higher image-wise performance, which shows the benefit of EHR integration. Although the margin of improvement is modest, we believe this result underscores the benefit of multimodal conditioning in medical anomaly detection. [2] https://arxiv.org/pdf/2312.06607
** MPG increases computational cost compared to DiAD. (Reviewer 2) -> We agree with this observation. As the first diffusion-based anomaly detection framework that jointly utilizes EHR and chest X-ray data, our primary goal was to explore performance gains from multimodal integration. We recognize the importance of efficiency and are currently working on latent diffusion-based variants, inspired by DiAD, to reduce computational overhead in future iterations.
** Only chest X-ray modality is considered. (Reviewer 2) -> We agree that broader validation across multiple modalities would strengthen the generalizability of our approach. However, the lack of publicly available datasets containing paired EHR and medical images of sufficient size (outside of MIMIC) has limited our ability to explore additional modalities. We are hopeful that future dataset initiatives will enable more comprehensive evaluations.
** How is the conditioning vector c_r injected into the model? (Reviewer 2) -> We appreciate the interest in implementation details. The conditioning vector c_r is first summed element-wise with the timestep embedding, normalized, and linearly transformed. It is then injected into both the MPG and NP modules via element-wise summation at every residual block from the bottleneck layer onward.
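The injection path described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code: the sinusoidal timestep embedding, LayerNorm as the normalization, a single shared linear layer (real UNets usually use a per-block projection because channel widths differ), and representing each residual-block output as channels × pixels are all assumptions, and every name is hypothetical.

```python
import math


def timestep_embedding(t, dim):
    """Common sinusoidal timestep embedding (assumes an even dim)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def linear(weight, bias, v):
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weight, bias)]


def condition_vector(c_r, t, weight, bias):
    """Sum c_r with the timestep embedding, normalize, then linearly transform."""
    t_emb = timestep_embedding(t, len(c_r))
    summed = [a + b for a, b in zip(c_r, t_emb)]
    return linear(weight, bias, layer_norm(summed))


def inject(feature_maps, cond, bottleneck_index):
    """feature_maps[k][ch][p] = block k, channel ch, pixel p.

    Adds cond[ch] to every pixel of channel ch in blocks k >= bottleneck_index;
    earlier blocks are returned unchanged.
    """
    out = []
    for k, fmap in enumerate(feature_maps):
        if k >= bottleneck_index:
            out.append([[x + cond[ch] for x in channel] for ch, channel in enumerate(fmap)])
        else:
            out.append([list(channel) for channel in fmap])
    return out


# Toy usage: identity projection, three blocks of 4 channels x 2 pixels, bottleneck at block 1.
dim = 4
c_r = [0.2, -0.1, 0.4, 0.0]
identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
cond = condition_vector(c_r, t=10, weight=identity, bias=[0.0] * dim)
blocks = [[[0.0, 0.0] for _ in range(dim)] for _ in range(3)]
out = inject(blocks, cond, bottleneck_index=1)
```

Element-wise summation acts as a per-channel bias broadcast over spatial positions, the same pattern diffusion UNets commonly use for the timestep embedding alone.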
** Any ablation on the lambda parameter? (Reviewer 2) -> We appreciate the insightful suggestion. Due to space limitations, we did not include an ablation on the lambda parameter. We agree that this parameter could influence the early-stage training dynamics of the IECA modules. While we hypothesize that final performance is relatively robust to its value, we will explore this further and consider including the analysis in the final version or follow-up work.
** No statistical reporting (such as confidence intervals). (Reviewer 3) -> We acknowledge this limitation. Due to resource constraints, we were unable to conduct repeated runs to report statistical variations. We fully agree on the importance of reporting such measures and aim to include them in future work.
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
All reviewers recommend acceptance, citing the novelty of integrating EHR data into a diffusion-based anomaly detection framework and praising the method’s solid design, clarity, and consistent performance across datasets. The paper includes meaningful ablations and attention analyses that strengthen its contribution. However, concerns include missing statistical reporting, limited modality scope, and the lack of comparison to medical-specific diffusion methods like THOR.
Additionally, from the single visual example shown, Diff3M reconstructions appear less sharp than baselines, which might influence anomaly classification performance. It would be helpful to include full qualitative outputs (not just zoom-ins) and evaluate localization performance in addition to image-wise classification to better understand where improvements stem from.