Abstract
Open-source datasets play a crucial role in data-centric AI, particularly in the medical field, where data collection and access are often restricted. While these datasets are typically released for research or educational purposes, their unauthorized use for model training remains a persistent ethical and legal concern. In this paper, we propose PRADA, a novel framework for detecting whether a deep neural network (DNN) has been trained on a specific open-source dataset. The main idea of our method is to exploit the memorization ability of DNNs by designing a hidden signal: a carefully optimized signal that is imperceptible to humans yet covertly memorized by models. Once generated, the hidden signal is embedded into a dataset to produce protected data, which is then released to the public. Any model trained on this protected data inherently memorizes the characteristics of the hidden signal. Then, by analyzing the model's response to the hidden signal, we can identify whether the dataset was used during training. Furthermore, we propose the Exposure Frequency-Accuracy Correlation (EFAC) score to verify whether a model has been trained on protected data. It quantifies the correlation between the predefined exposure frequency of the hidden signal, set by the data provider, and the accuracy of the model. Experiments demonstrate that our approach effectively detects whether a model was trained on a specific dataset. This work provides a new direction for protecting open-source datasets from misuse in medical AI research.
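Neither this page nor the paper links an implementation (the code repository field below is N/A). Purely as an illustration of the "embed a hidden signal to produce protected data" step described in the abstract, the following Python sketch blends a fixed low-amplitude pattern into an image. The additive pattern, the eps budget, and all names here are assumptions for exposition; the paper optimizes the signal rather than using a random one.

```python
import numpy as np

def embed_hidden_signal(image, signal, eps=2.0 / 255.0):
    """Blend a low-amplitude hidden signal into an image.

    PRADA optimizes the signal so that models covertly memorize it;
    here `signal` is just a fixed pattern in [-1, 1], scaled by `eps`
    so the change stays imperceptible (a deliberate simplification).
    """
    image = np.asarray(image, dtype=float)        # H x W x C in [0, 1]
    protected = image + eps * np.asarray(signal)  # small additive shift
    return np.clip(protected, 0.0, 1.0)

# Toy usage: one fixed pattern per (class, hidden-signal) pair.
rng = np.random.default_rng(0)
signal = rng.uniform(-1.0, 1.0, size=(224, 224, 3))
x = rng.uniform(0.0, 1.0, size=(224, 224, 3))     # stand-in image
x_protected = embed_hidden_signal(x, signal)
print(np.abs(x_protected - x).max())              # bounded by eps
```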
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2265_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{JanJin_PRADA_MICCAI2025,
author = { Jang, Jinhyeok and Lee, Hong Joo and Navab, Nassir and Kim, Seong Tae},
title = { { PRADA: Protecting and Detecting Dataset Abuse for Open-source Medical Dataset } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15973},
month = {September},
pages = {475 -- 485}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper presents PRADA, an innovative framework aimed at protecting and detecting the misuse of open-source medical datasets. By leveraging the memorization properties of deep neural networks, the framework embeds reliable hidden signals using a small number of samples with minimal impact on the primary task performance. A key contribution is the introduction of the EFAC score, a novel metric that is independent of dataset class size and effectively detects whether a model has been trained on a specific protected dataset. PRADA demonstrates generalizability across different model architectures and can be customized for various tasks, indicating broad applicability.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper introduces the EFAC score, which measures the correlation between normalized signal frequency and model accuracy to verify dataset usage. This metric addresses the limitations of class-dependent approaches by leveraging asymmetric frequency distributions caused by data imbalance, thereby improving detection reliability. The proposed method is validated on both classification and segmentation tasks across multiple medical imaging datasets. Moreover, a comprehensive comparison with Undercover [1] based on key criteria—harmlessness, verifiability, and the EFAC score—reinforces the empirical soundness of the framework. By adopting watermarking-inspired techniques for dataset protection, this work makes a valuable contribution to the emerging field of dataset-level security in medical AI, which is becoming increasingly critical.
Reference: [1] Jang, J., Han, B., Kim, J., & Youn, C. H. (2024, September). Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-Wise Hidden Bias. In European Conference on Computer Vision (pp. 1–18). Springer Nature Switzerland.
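As a rough sketch of the correlation the reviewer describes here (normalized hidden-signal frequency vs. model accuracy), the snippet below assumes Pearson correlation as the correlation measure and uses hypothetical per-signal accuracies; the paper's exact EFAC formulation may differ.

```python
import numpy as np

def efac_score(exposure_freqs, hidden_signal_accs):
    """Correlate the provider's predefined exposure frequencies with a
    suspect model's accuracy on probes carrying each hidden signal."""
    f = np.asarray(exposure_freqs, dtype=float)
    a = np.asarray(hidden_signal_accs, dtype=float)
    f = f / f.sum()  # normalize frequencies to a distribution
    # High correlation means accuracy tracks exposure frequency,
    # i.e., the model memorized the (imbalanced) hidden signals.
    return np.corrcoef(f, a)[0, 1]

# Toy usage: a model trained on protected data scores near 1, while a
# model never exposed to the signals shows no frequency-accuracy link.
freqs = [0.5, 0.3, 0.15, 0.05]                  # provider-chosen, decreasing
print(efac_score(freqs, [0.9, 0.7, 0.5, 0.2]))  # ~0.97 (trained on data)
print(efac_score(freqs, [0.3, 0.5, 0.2, 0.4]))  # ~0.0  (innocent model)
```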
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Limited Novelty: The core idea appears to build heavily upon the Undercover framework [1], with the main novelty being the use of imbalanced data and asymmetric frequency encoding. While the EFAC score is introduced as a new metric, it essentially adds normalization to improve robustness, raising concerns about whether it is broadly generalizable or specifically designed to favor the proposed method.
Weak Empirical Results: The performance gains are not consistently significant across datasets and architectures. Notably, the best results are observed only under specific settings (e.g., DermaMNIST with PVT-v2), suggesting limited generalization.
Insufficient Ablation and Robustness Studies: The paper lacks comprehensive ablation studies to validate the contributions of individual components.
Unclear Clinical Value: The practical deployment of PRADA remains questionable. The framework shows detection capability when the protected data is used, but it does not convincingly demonstrate low false-positive rates when the data is not used. This omission limits confidence in its real-world applicability, particularly in clinical settings.
[1] Jang, J., Han, B., Kim, J., & Youn, C. H. (2024, September). Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-Wise Hidden Bias. In European Conference on Computer Vision (pp. 1–18). Springer Nature Switzerland.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed approach demonstrates limited innovation, as it primarily builds upon existing work with only incremental modifications that may not warrant a standalone contribution. Additionally, the experimental performance is moderate, with improvements observed only under specific datasets and model architectures, offering limited evidence of broader effectiveness or scalability. Furthermore, the clinical relevance is insufficient, as the absence of evaluations for false positives in scenarios where the protected data is not used raises concerns about the framework’s reliability and practical applicability in real-world settings.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper proposes PRADA, a method to watermark any kind of dataset in order to detect whether the data has been misused in violation of license terms. It further introduces a new Exposure Frequency-Accuracy Correlation metric to better detect the use of protected data. The authors conduct an extensive evaluation across different models and task types, such as classification and segmentation.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
While they build upon Undercover Bias [12], they add significant changes and introduce a new metric to improve the verifiability of whether a watermarked dataset was used.
The evaluation is extensive and well set up, conducted across multiple modalities and model architectures. Further, the authors try to gain a deeper understanding of why their method works better through qualitative samples and visualizations of the feature embeddings of watermarked versus raw data samples.
The method shows significant improvements in comparison with Undercover Bias [12].
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
There is only a single watermarking method compared directly on all datasets.
Statements and motivations are sometimes not well supported by the paper or citations, e.g., the first sentence of the introduction. Statements such as “while ensuring strong verifiability” and “Our method provides a reliable solution” may be seen as exaggerated and vague.
While the paper provides quite a good description of the method, it counterintuitively does not appear to open-source its implementation. The major objection here is that security should not rely on obscurity of the algorithm, which is always lower once code is released. Reproducibility and ease of use would improve significantly if the code were released as open source, since PRADA's use case is to protect open-source datasets.
Further, there are several questions that threaten the validity/usefulness of the method and are not well discussed as limitations.
- What if an illegal user of the data unintentionally deploys strategies that counteract the watermarking, e.g., label balancing, data augmentation through generative models that expand the training set, or adversarial training, against which robustness has not been shown in [12]?
- What if the illegal operator intentionally deploys evasion strategies, e.g., outlier filtering of the dataset or adversarial training/anti-watermarking strategies? How robust would the method be?
- What if the illegal user applies watermarking to its own data as well, which might overlap with the watermark's dataset domain? How robust would the method be? What if the overlap between keys is not an empty set?
- What if multiple watermarked datasets are used? Could there be an accidental misclassification of ownership?
- Models are often not fine-tuned on a single public dataset; additional (privately labeled) datasets are used that push the watermarked share of the data down to less than 1%. Could misuse still be detected, or would this enable false ownership claims (in case the protected data was not used)?
Also, these references might be relevant here:
https://dl.acm.org/doi/pdf/10.1145/3510548.3519376
https://proceedings.neurips.cc/paper_files/paper/2023/file/aa6287ca31ae1474ea802342d0c8ba63-Paper-Conference.pdf
https://ieeexplore.ieee.org/document/10646709
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
It is unclear what the PRADA name stands for. “PRotecting and Detecting dataset abuse” => PRotecting And Detecting dataset Abuse?
Citations are often not in numerical order, e.g., [23, 11, 1, 6].
Figures are not linked properly.
In Figure 1, “PRADA Moduel” => “PRADA Module”?
In Equations 3 and 4 there is a } with no matching {. Can they be removed?
In Table 1, the metrics should have an arrow in the direction of improvement, as this improves readability, especially for new metrics like EFAC. Table 1 also has no highlighting of results that are significantly better by p-value; adding it would greatly improve readability and the ease of judging whether the method actually improves overall, which is not that clear on DermaMNIST (MobileNet) and BloodMNIST.
In Figure 4a, should there be no hidden signals to be learned? The legend on each plot is confusing: either combine them into a single legend or show the differences. This figure would also benefit from a better caption.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper provides a decent innovation with good evaluation and investigation into the workings of their method. While there are some weaknesses and many small formatting and readability issues, the strengths outweigh the weaknesses.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The paper provides a decent innovation with good evaluation and investigation into the workings of their method. While some weaknesses remain after the rebuttal, I stand by my initial recommendation.
Review #3
- Please describe the contribution of the paper
This paper presents PRADA, a two-step dataset verification framework for open-source medical datasets. The method embeds imperceptible hidden signals into the data, leveraging the memorization ability of deep neural networks (DNNs) to ensure that any model trained on the dataset internalizes these signals. The authors propose using data imbalance and incorporating the Exposure Frequency-Accuracy Correlation (EFAC) metric, which quantifies the alignment between model accuracy and the predefined occurrence frequency of hidden signals, enabling robust and scalable verification of dataset abuse. Extensive experiments on classification and segmentation tasks demonstrate that the method preserves model performance while providing strong verifiability, offering a practical and reliable solution for protecting sensitive medical datasets.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper introduces a framework for dataset protection in open-source medical data by embedding imperceptible hidden signals. This approach allows for practical tracing and verification of dataset misuse, which is relevant for sensitive domains such as medical imaging. The paper also proposes the Exposure Frequency-Accuracy Correlation (EFAC) metric, which measures the relationship between the occurrence frequency of hidden signals and model accuracy. The use of intentionally imbalanced hidden signal distributions is designed to improve the sensitivity of misuse detection and makes the method less dependent on the number of classes or task type. Experimental results on classification and segmentation tasks indicate that the framework maintains model performance while enabling verification. The paper provides supporting analyses, including t-SNE visualizations and mean class accuracy on hidden signals, to help explain the method’s behavior in practice.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
One major weakness of the proposed method is the sensitivity of the EFAC metric to severe data imbalance. Because EFAC is based on the correlation between hidden signal frequency and model accuracy, it may be dominated by majority signals when certain hidden signals are underrepresented. In such cases, the metric may fail to accurately reflect misuse, as the performance on minority signals can be masked by the overwhelming influence of majority signals. This is a well-known issue in machine learning with imbalanced datasets, where traditional metrics can be misleading and may not capture performance on rare or minority classes. The lack of additional strategies—such as resampling, cost-sensitive learning, or the use of balanced metrics—to address this challenge further limits the robustness of EFAC in highly imbalanced scenarios.
Another limitation of the method is its dependency on class information overlap between the embedding dataset and the target dataset. As described in the paper, the approach requires the target dataset to contain the same class IDs as those present in the embedding dataset for watermarking to be feasible. This restricts the applicability of the method to only those datasets with matching class structures, or necessitates the preparation of a large variety of open-source datasets to maximize coverage. This requirement reduces the generalizability and practical utility of the approach, especially in diverse or evolving real-world settings.
Finally, the paper does not provide sufficient detail regarding the experimental setup, particularly in how class matching was handled for the classification tasks. It is unclear whether only samples with matching class information were used, or if other classes were included. Further clarification on this point is needed to fully understand the scope and limitations of the proposed method. This summary reflects both methodological and practical concerns, and is consistent with the challenges and evaluation considerations described in the literature on imbalanced data and dataset watermarking.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a framework for dataset protection in open-source medical imaging by embedding hidden signals and introduces the EFAC metric for misuse detection. The approach addresses a relevant problem and provides a novel perspective on dataset verification. Experimental results suggest that the method maintains model performance while enabling verification, and the paper includes supporting analyses to illustrate the method’s behavior. However, the method has notable limitations. The EFAC metric may be sensitive to severe data imbalance, potentially reducing its effectiveness in certain scenarios. The applicability of the approach is also constrained by the need for class overlap between the embedding and target datasets, which may limit its generalizability. Additionally, the explanation of the experimental setting lacks detail, particularly regarding class matching procedures. Overall, the paper makes a meaningful contribution to the area of dataset protection, but the identified weaknesses limit its impact. I recommend a weak accept, as the core ideas are promising and could be further improved with additional clarification and experimentation.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I now find myself generally in agreement with the methodology proposed in this paper. The authors have addressed most of the concerns and questions I previously raised, which has resolved many of my doubts related to this work. Therefore, in line with my previous stance, I am maintaining my recommendation to accept this paper.
Author Feedback
We thank all reviewers for their valuable comments, and we are encouraged that they recognize our work as a decent innovation (R1) and a valuable contribution (R2, R3) to data security in medical AI.
[Q1 Comparison] In medical fields, data integrity is necessary to maintain task performance. We compared with the SOTA method that preserves accuracy while providing reliable verification. Other methods like BadNet can be used, but they reduce test accuracy from 86% to 75% (PVTv2 + DermaMNIST), indicating a risk to data integrity.
[Q2&3 Improve Statements & Code] We will correct typos, further clarify our results, and release the code.
[Q4.1 Unintended Deployment] We tested our method with adversarial training (i.e., perturbation), denoising, and image smoothing, and confirmed that it remains verifiable.
[Q4.2 Illegal Evasion] To evade our watermarking, an adversary must know ALL of the watermarking model, the hidden data, and the sample distribution. While exposure of these could reduce verifiability, obtaining all of them is highly challenging. We also confirmed robustness under outlier filtering (e.g., Spectral Signature).
[Q4.3 Multiple Watermarking & Datasets] If a perfectly identical watermark is used, verification may be less discriminative, but generating such an identical watermark is extremely difficult, as noted in Q4.2. Independent watermark types have little effect on each other's verification. We validated that verification remains reliable when protected data is ≥ 20% of the dataset.
[Q1 Novelty] This is the first data-watermarking method tailored to medical challenges: (C1) sensitivity to performance drops, (C2) severe class imbalance, (C3) few classes, and (C4) security and safety needs. Undercover addressed (C1) but remains vulnerable to class imbalance (C2) and few classes (C3), both of which are prevalent in medical data. We introduce sample imbalance by class-wise pairing of target and hidden data using a non-uniform distribution, inducing performance imbalance as a hidden property. This complements Undercover on (C2&C3) and adds a second verification to enhance (C4).
[Q2 Results] We politely clarify that our method works well on ALL 3 classification and 1 segmentation datasets with diverse backbones. Verifiability and EFAC are pre-thresholded scores that are later thresholded for verification. It is crucial that these scores remain clearly separable from those of Raw Data; this holds not only for PRADA on DermaMNIST + PVTv2 but across all PRADA cases in Tab. 1 and 2.
[Q3 Ablation] Tab. 1 and 2 serve as an ablation study: Undercover corresponds to the setting without sample imbalance, while ours includes it. Specifically, our sample imbalance enables reliable verification not only in classification but also in segmentation, where Undercover fails.
[Q4 Clinical Value] We have considered the false-positive rate (FPR). In Tab. 1, the Verifiability and EFAC of PRADA exceed those of Raw Data by over 7×STD, indicating an FPR below 3e-5% via the empirical rule. In Tab. 2, the EFAC of PRADA exceeds that of Raw Data by several times the STD, while Verifiability does not. This reliable verification protects medical data, such as patient privacy, ensuring strong clinical value.
[Q1 Class Imbalance] We politely clarify that our method is robust to class imbalance thanks to our intentional sample imbalance, as already validated on DermaMNIST (4,693 major- vs. 80 minor-class samples). Since our proposed sample imbalance is applied independently within each class, it becomes a class-independent property, enabling verification even under class imbalance.
[Q2 Dataset Preparation] Although matching the number of classes seems burdensome, the hidden data need not be medical. Natural images can easily be used, allowing simple collection and use of subsets from large datasets (e.g., ImageNet).
[Q3 Class Matching] Classes were matched by ID (e.g., first with first). Target samples were paired with hidden ones sharing the same ID using a decreasing distribution to induce imbalance.
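To make the class-matching answer above concrete, here is a minimal sketch of pairing target samples with hidden signals under a decreasing exposure distribution, as the rebuttal describes. The geometric decay rate, the seed, and the function name are illustrative assumptions; the rebuttal only specifies that the per-class distribution is non-uniform and decreasing.

```python
import numpy as np

def assign_hidden_signals(n_samples, n_signals, decay=0.5, seed=0):
    """Within one class, pair each target sample with a hidden-signal
    index drawn from a decreasing (geometric-style) distribution, so
    some signals are seen far more often than others. The resulting
    accuracy imbalance is what the EFAC score later correlates with."""
    freqs = decay ** np.arange(n_signals)  # e.g., 1, 0.5, 0.25, 0.125
    freqs /= freqs.sum()                   # predefined exposure frequencies
    rng = np.random.default_rng(seed)
    return freqs, rng.choice(n_signals, size=n_samples, p=freqs)

# Toy usage: 1,000 samples of one class spread over 4 hidden signals.
freqs, assignment = assign_hidden_signals(1000, 4)
print(freqs)                                 # [0.533, 0.267, 0.133, 0.067]
print(np.bincount(assignment, minlength=4))  # empirical counts, decreasing
```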
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper receives positive support from R1 and R3. The rebuttal clearly addresses the concerns from R2. Overall, the paper introduces a novel approach for medical data watermarking with solid experimental validation, which has potentially significant clinical value.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A