Abstract

As recent text-conditioned diffusion models have enabled the generation of high-quality images, concerns over their potential misuse have also grown. This issue is critical in the medical domain, where text-conditioned generated medical images could enable insurance fraud or falsified records, highlighting the urgent need for reliable safeguards against unethical use. While watermarking techniques have emerged as a promising solution in general image domains, their direct application to medical imaging presents significant challenges. A key challenge is preserving fine-grained disease manifestations, as even minor distortions from a watermark may lead to clinical misinterpretation, which compromises diagnostic integrity. To address this challenge, we present MedSign, a deep learning-based watermarking framework specifically designed for text-to-medical image synthesis, which preserves pathologically significant regions by adaptively adjusting watermark strength. Specifically, we generate a pathology localization map using cross-attention between medical text tokens and the diffusion denoising network, aggregating token-wise attention across layers, heads, and time steps. Leveraging this map, we optimize the LDM decoder to incorporate watermarking during image synthesis, ensuring cohesive integration while minimizing interference in diagnostically critical regions. Experimental results show that MedSign preserves diagnostic integrity while ensuring watermark robustness, achieving state-of-the-art performance in image quality and detection accuracy on the MIMIC-CXR and OIA-ODIR datasets.
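The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-layer, per-head, per-time-step attention maps for one text token have already been resized to a common resolution, and it uses a plain mean followed by min-max normalization, which is one plausible aggregation among several. The function name `aggregate_attention` is hypothetical.

```python
from typing import List

Map = List[List[float]]


def aggregate_attention(maps: List[Map]) -> Map:
    """Average same-sized 2D attention maps and min-max normalize to [0, 1].

    Assumption: each entry of `maps` is one (layer, head, time step) attention
    map for a single text token, already resized to a shared height and width.
    """
    if not maps:
        raise ValueError("need at least one attention map")
    h, w = len(maps[0]), len(maps[0][0])
    # Element-wise mean over all collected maps.
    mean = [[sum(m[i][j] for m in maps) / len(maps) for j in range(w)]
            for i in range(h)]
    lo = min(min(row) for row in mean)
    hi = max(max(row) for row in mean)
    span = hi - lo
    if span == 0:
        # A constant map carries no localization signal.
        return [[0.0] * w for _ in range(h)]
    return [[(v - lo) / span for v in row] for row in mean]


# Two toy 2x2 maps standing in for different heads or time steps.
toy_maps = [[[0.0, 2.0], [4.0, 6.0]], [[2.0, 4.0], [6.0, 8.0]]]
localization = aggregate_attention(toy_maps)
```

High values in the resulting map mark regions that the text token attends to most, which is the kind of signal a pathology localization map relies on.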

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0428_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/MICV-yonsei/MedSign

Link to the Dataset(s)

MIMIC-CXR-JPG dataset: A chest radiograph dataset with over 370,000 JPG images and structured labels derived from radiology reports. https://physionet.org/content/mimic-cxr-jpg/2.1.0/

OIA-ODIR dataset: A fundus image dataset containing 10,000 images from 5,000 patients, labeled with eight ocular disease categories including normal, glaucoma, and diabetic retinopathy. https://www.kaggle.com/datasets/jeftaadriel/oia-odir-dataset

BibTex

@InProceedings{KimCha_PathologyAware_MICCAI2025,
        author = { Kim, Chanyoung and Ju, Dayun and Kim, Jinyeong and Han, Woojung and Alcover-Couso, Roberto and Hwang, Seong Jae},
        title = { { Pathology-Aware Adaptive Watermarking for Text-Driven Medical Image Synthesis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {399 -- 409}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper is about MedSign, a pathology aware watermarking method that enables adding watermarks to image regions that are clinically not relevant or less relevant to predict the diagnosed disease.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is very well written and easy to follow.

    • The experiments are clear, focused, and effectively address the key questions regarding the proposed method.

    • High-quality figures significantly enhance the understanding of the method.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The paper does not cover all relevant related work and lacks comparison with some important methods. For example, it should include a comparison to: (Zhang, Xuanyu, et al. “EditGuard: Versatile image watermarking for tamper localization and copyright protection.” CVPR 2024)

    • Several parameters appear to be chosen arbitrarily, which weakens the potential impact of the method. For instance, setting τ = 0.7 assumes prior knowledge about the disease, which may not generalize.

    • The evaluation only examines how watermarking affects the confidence of predicting the target disease. It does not assess its impact on the prediction of other diseases. Since the watermarks are focused in different regions, this could confuse classifiers and potentially cause false positives. Also, choosing the absolute difference as the metric seems weird because the confidence could also increase.

    • Table 2 shows that the proposed AAS and CAS steps do not provide significant performance improvements. In fact, there is a performance drop in disease confidence compared to the base model. The influence of these steps requires further investigation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the method shows promising results, I believe that many parts of its contribution are not sufficiently motivated and it misses relevant literature.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a watermarking method for synthetic medical data that preserves the diagnostic integrity of the data.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well-written and easy to follow.
    • The method is, as far as I am concerned, novel and of high interest to the community.
    • The experimental setup uses two different datasets, the evaluation is thorough and an ablation study shows the importance of the different proposed parts of the method. The experiment in Tab.2 (a) shows nicely that the proposed approach is indeed superior at preserving the diagnostic accuracy.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • A more thorough literature review on the current watermarking approaches would be helpful for better understanding of the proposed novelty.
    • Fig. 3 raises the question of whether the watermarking approach gets “stuck” in a local minimum, by applying the signature-related changes to the images only to specific regions, where no pathology can generally occur (e.g. the ribs in the chest X-rays visible in the pixel-wise difference image).
    • In general, the method seems to be a little bit too complex for the desired problem: instead of particularly avoiding the pathological regions for watermarking, one could simply define “safe” regions for watermarking (medically founded). However, the approach can have broader application, and is, thus, still of interest.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper offers a novel and interesting approach, however the application scenario seems a little oversimplified for the proposed methodology.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper
    1. Pathology-Aware Watermarking Framework (MedSign): The authors propose MedSign, a novel watermarking system tailored for medical image synthesis. It integrates watermarking into the image generation process using cross-attention mechanisms to identify and preserve diagnostically critical regions, ensuring watermark robustness without compromising clinical utility.

    2. Adaptive Loss with Pathology Localization: MedSign introduces a pathology-aware adaptive loss function that uses attention-based localization maps to guide watermark placement, embedding only in non-critical regions. This achieves superior image quality, watermark detection accuracy, and minimal impact on diagnostic confidence, outperforming existing watermarking techniques.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Pathology-Aware Watermark Embedding The proposed method, MedSign, introduces a novel approach that adaptively embeds watermarks only in non-diagnostic regions of medical images. This is achieved through a cross-attention based pathology localization map that ensures the preservation of critical anatomical details—an essential consideration in clinical applications. This pathology-aware mechanism sets the work apart from prior methods which often degrade diagnostic integrity.

    2. Strong Experimental Validation The paper provides comprehensive qualitative and quantitative evaluations on real-world datasets like MIMIC-CXR-JPG and OIA-ODIR. MedSign demonstrates state-of-the-art robustness and image fidelity, outperforming existing watermarking techniques in both PSNR/SSIM and watermark retrieval under various image perturbations, while minimizing clinical misinterpretation—a major concern in medical imaging.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The method is tested only on CXR and fundus images, leaving out other key modalities like CT, MRI, or ultrasound, which may have different sensitivity to watermark distortions.

    2. The system relies on cross-attention maps for pathology localization, but attention mechanisms are not always reliable or interpretable, especially for ambiguous or complex pathologies.

    3. The model is trained for only 2 epochs, which may not be sufficient for robust convergence, especially for complex watermarking tasks across high-resolution medical images.

    4. There is no qualitative assessment by radiologists or medical professionals to confirm that diagnostic integrity is visually preserved, making the clinical relevance less validated.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (6) Strong Accept — must be accepted due to excellence

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Recommendation: Strong Accept

    This paper presents MedSign, a novel deep learning-based watermarking framework specifically tailored for text-to-medical image synthesis. The proposed method addresses a critical and underexplored challenge in the domain of medical AI: embedding robust watermarks without compromising diagnostic integrity.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their valuable feedback. To address the main concerns, we clarify the following points:

R1, R2: Literature review. Watermarking has evolved through three stages, all aiming to embed imperceptible yet robust signals. Early methods like DCTDWT operate in the frequency domain but are fragile. Learning-based methods (SSL Watermarking, Stable Signature, WAM) embed in image or feature space for improved robustness, while recent semantic-aware approaches (Tree Ring, RingID) manipulate latent or attention representations. Since the semantic-aware approaches alter image structure, they are unsuitable for medical imaging and were excluded. In our evaluation, we selected representative methods from the remaining two stages. These include DCTDWT as a traditional approach, and SSL Watermarking, WAM, and Stable Signature as SoTA learning-based methods. Although EditGuard is learning-based, it showed limited robustness in prior WAM evaluations, so we compared with WAM instead. As shown in Table 1, our method consistently outperforms WAM across multiple metrics.

R1: Stuck in local minima? Watermarks often appear near rib boundaries in CXR images due to a deliberate design choice. Embedding in edge-rich areas, like the ribs, balances imperceptibility and robustness, as these regions can better tolerate slight perturbations. This strategy, similar to WAM’s use of a JND map, is not a suboptimal outcome but an effective way to preserve both visual quality and clinical interpretability.

R1: Why not use predefined safe zones? Defining universally safe watermarking regions in medical images is difficult due to variable pathology locations. Fixed-region methods lack flexibility, but our adaptive approach ensures robustness and diagnostic safety.

R2: Arbitrary params. There is a trade-off controlled by the parameter τ. Higher values improve watermark robustness by enabling broader embedding, but also increase the risk of altering diagnostic content. We set τ = 0.7 as a balanced choice based on empirical results.
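The trade-off described here can be made concrete with a small sketch. It assumes, as an illustration only, that the threshold is applied to a normalized localization map and that pixels scoring below the threshold are treated as embeddable; the paper's exact use of the threshold may differ. The names `embedding_mask` and `embeddable_fraction` are hypothetical.

```python
from typing import List

Map = List[List[float]]


def embedding_mask(loc_map: Map, tau: float = 0.7) -> List[List[int]]:
    """Return 1 where embedding is allowed (score < tau), 0 where protected."""
    return [[1 if v < tau else 0 for v in row] for row in loc_map]


def embeddable_fraction(loc_map: Map, tau: float) -> float:
    """Share of pixels left available for the watermark at a given tau."""
    mask = embedding_mask(loc_map, tau)
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Toy normalized localization map; higher means more pathology-relevant.
toy_map = [[0.10, 0.50], [0.75, 0.95]]
frac_at_07 = embeddable_fraction(toy_map, 0.7)
frac_at_09 = embeddable_fraction(toy_map, 0.9)
```

Under this assumption, raising the threshold enlarges the embeddable area, which matches the stated direction of the trade-off: more room for the watermark, less protection for high-attention regions.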

R2: Only target disease tested. Abs diff metric questionable. We evaluated diagnostic consistency using a multi-label CXR classifier, considering both the target disease and five other pulmonary conditions. Instead of predicted labels, we used confidence scores to capture subtle changes, as even small shifts can affect clinical decisions. The goal of medical image watermarking is to ensure that the watermarked image is diagnostically indistinguishable from the original. To reflect this goal, we used the absolute confidence difference to check whether the watermark either lowers or elevates the model’s confidence.
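A short sketch shows why an absolute difference is used rather than a signed one. The label names and confidence values below are invented for illustration and are not results from the paper; `mean_abs_conf_diff` and `mean_signed_conf_diff` are hypothetical helper names.

```python
from typing import Dict


def mean_abs_conf_diff(orig: Dict[str, float], wm: Dict[str, float]) -> float:
    """Mean absolute change in classifier confidence across labels."""
    if orig.keys() != wm.keys():
        raise ValueError("both score sets must cover the same labels")
    return sum(abs(orig[k] - wm[k]) for k in orig) / len(orig)


def mean_signed_conf_diff(orig: Dict[str, float], wm: Dict[str, float]) -> float:
    """Mean signed change; increases and decreases can cancel out."""
    if orig.keys() != wm.keys():
        raise ValueError("both score sets must cover the same labels")
    return sum(wm[k] - orig[k] for k in orig) / len(orig)


# Hypothetical confidences before and after watermarking.
original = {"target": 0.80, "other": 0.10}
watermarked = {"target": 0.70, "other": 0.20}

abs_diff = mean_abs_conf_diff(original, watermarked)
signed_diff = mean_signed_conf_diff(original, watermarked)
```

In this toy case the signed mean is approximately zero even though both confidences moved by 0.10, whereas the absolute mean reports the shift regardless of its direction.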

R2: AAS/CAS hurt confidence. The slight drop in confidence differences when AAS or CAS is removed reflects their role in refining attention maps and expanding embedding regions, reducing interference with diagnostic areas. Without them, embedding is limited, slightly lowering the confidence gap. Since this value is scaled from a small probability difference, the actual impact is minor. In contrast, the over one-point drop in average bit accuracy indicates a more meaningful loss in robustness, highlighting the importance of AAS and CAS in maintaining both watermark strength and diagnostic safety.
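For reference, bit accuracy is commonly computed as the fraction of embedded bits that the extractor recovers correctly. The sketch below follows that common definition, which is an assumption about the metric used here; `bit_accuracy` is a hypothetical name and the bit strings are toy values.

```python
from typing import Sequence


def bit_accuracy(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Fraction of positions where the extracted bit matches the embedded bit."""
    if not embedded or len(embedded) != len(extracted):
        raise ValueError("bit strings must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(embedded, extracted) if a == b)
    return matches / len(embedded)


# Toy 4-bit signature with one flipped bit after a perturbation.
acc = bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Averaging this value over images and perturbation types gives an average bit accuracy of the kind referred to above.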

R3: Why not test on other modalities? CXR and fundus images were chosen for grayscale and color evaluation using paired text-image datasets. Results show effectiveness across color spaces, indicating potential for CT, MRI, and ultrasound.

R3: Can attention be trusted? Studies like DAAM and OVAM show cross-attention reliably aligns text and image, making attention maps the best method for mask generation without ground-truth labels.

R3: 2 epochs seem too few for reliable training. As the task does not involve semantic understanding, the model converged within two epochs. We fully fine-tuned the LDM decoder to enhance watermarking while preserving image quality, with additional training offering minimal benefit and risking image quality degradation.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The paper is generally well received as clear, technically sound, and responsive to clinical demands. Yet, the reviewers point out that it could benefit from extra comparative evaluation, parameter justification, and expanded testing. Following my own reading, the watermarking idea itself looks both technically novel and impactful for clinical applications; so, despite one weak reject, I am recommending a provisional accept (it is already above the threshold). However, I strongly recommend that the authors implement the suggested changes and address the reviewers' concerns in the camera-ready version.


