Abstract

Privacy preservation in AI is crucial, especially in healthcare, where models rely on sensitive patient data. In the emerging field of machine unlearning, existing methodologies struggle to remove patient data from trained multimodal architectures, which are widely used in healthcare. We propose Forget-MI, a novel machine unlearning method for multimodal medical data, by establishing loss functions and perturbation techniques. Our approach unlearns unimodal and joint representations of the data requested to be forgotten while preserving knowledge from the remaining data and maintaining comparable performance to the original model. We evaluate our results using performance on the forget dataset, performance on the test dataset, and Membership Inference Attack (MIA), which measures the attacker’s ability to distinguish the forget dataset from the training dataset. Our model outperforms the existing approaches that aim to reduce MIA and the performance on the forget dataset while keeping an equivalent performance on the test set. Specifically, our approach reduces MIA by 0.202 and decreases AUC and F1 scores on the forget set by 0.221 and 0.305, respectively. Additionally, our performance on the test set matches that of the retrained model, while allowing forgetting. Code is available at https://github.com/BioMedIA-MBZUAI/Forget-MI.git
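
The MIA criterion mentioned in the abstract is not spelled out on this page; as a hedged illustration only, a common way such an attack is scored in the unlearning literature is to train a simple classifier to separate per-sample losses of the forget set from losses on unseen test examples. The helper below is a minimal sketch of that computation; the function name, the logistic-regression attacker, and the use of scikit-learn are assumptions, not the paper's exact protocol.

```python
# Hedged sketch (not the authors' exact protocol): score a Membership
# Inference Attack by training a classifier to separate per-sample losses
# of the forget set from those of unseen (test) examples.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def mia_score(forget_losses: np.ndarray, test_losses: np.ndarray) -> float:
    """Balanced attack accuracy from per-sample losses (inputs assumed precomputed)."""
    n = min(len(forget_losses), len(test_losses))
    X = np.concatenate([forget_losses[:n], test_losses[:n]]).reshape(-1, 1)
    y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = forget set, 0 = unseen
    attacker = LogisticRegression(max_iter=1000)
    return cross_val_score(attacker, X, y, cv=5, scoring="accuracy").mean()

# Usage with placeholder loss arrays:
# score = mia_score(per_sample_loss(model, forget_loader),
#                   per_sample_loss(model, test_loader))
```

A score near 0.5 indicates the attacker cannot distinguish forgotten samples from data the model never saw, which is the behaviour an unlearned model should approach.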

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3777_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/BioMedIA-MBZUAI/Forget-MI.git

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HarSha_ForgetMI_MICCAI2025,
        author = { Hardan, Shahad and Taratynova, Darya and Essofi, Abdelmajid and Nandakumar, Karthik and Yaqub, Mohammad},
        title = { { Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        page = {208 -- 218}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces Forget-MI, a novel machine unlearning method specifically designed for multimodal medical data. The main contribution lies in developing a comprehensive unlearning approach that efficiently removes both unimodal and multimodal patient information upon request, leveraging a set of carefully defined loss functions. The proposed method balances data forgetting with maintaining the model’s predictive performance on remaining data.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper tackles the novel and crucially important problem of multimodal unlearning in healthcare, directly addressing patient privacy concerns and regulatory requirements.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Presentation quality can be improved. The experimental section uses present tense instead of past tense. Citation formatting is inconsistent; for example, reference [7] only mentions the year without specifying the venue (it should be cited as an ECCV 2024 conference paper).

    2. Insufficient discussion of prior work. The paper claims that unimodal retention [7] is problematic in healthcare but does not clearly explain why existing methods like MultiDelete [7] are inadequate.

    3. Limited experimental validation. Experiments are conducted on a single dataset (MIMIC-CXR), limiting the generalizability of the results.

    4. Evaluation Issues (major)

      • In evaluating unlearning, the model’s performance on the forget set should be compared against the retrain baseline [1]. The closer it is to the retrain model, the better. However, the paper incorrectly implies that simply achieving worse performance on the forget set is sufficient.
      • A good unlearning method should also preserve performance on the test set. In this paper, Forget-MI exhibits significant performance degradation on the test set, yet all the results are incorrectly highlighted (in bold) as the best based solely on MIA scores.

    [1] Li, N., Zhou, C., Gao, Y., et al.: Machine Unlearning: Taxonomy, Metrics, Applications, Challenges, and Prospects. IEEE Transactions on Neural Networks and Learning Systems (2025).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper shows a limited understanding of unlearning evaluation principles as stated in the weaknesses.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Forget-MI is consistently the poorest on the test set, which is definitely not an acceptable trade-off between performance and privacy.



Review #2

  • Please describe the contribution of the paper

    This paper proposes Forget-MI, a machine unlearning method designed for multimodal medical data, which uses custom loss functions and perturbation techniques to forget specific data while preserving the rest of the model’s knowledge. Forget-MI effectively unlearns both unimodal and joint representations and is evaluated using performance on the forget dataset, the test dataset, and resistance to Membership Inference Attacks (MIA). Experiments show Forget-MI reduces MIA by 0.202, decreases AUC and F1 scores on the forget set by 0.221 and 0.305, and maintains test performance comparable to a full retraining, outperforming existing unlearning methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This work offers Forget-MI, a machine unlearning method designed for multimodal medical data, which uses custom loss functions and perturbation techniques to forget specific data while preserving the rest of the model’s knowledge.
    • A novel concept of multimodal unlearning is proposed.
    • Extensive case studies.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The proposed method does not seem to achieve a better unlearning-utility trade-off.
    • Further evaluation of the time cost of unlearning is needed.
    • The impact of hyperparameters on the proposed method needs to be further explored.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The performance of the proposed method needs to be further enhanced. As shown in Table 1, the AUC on Dt indicates that the proposed method does not achieve the best results, yet the authors still highlight them in bold. This raises readers’ concerns about whether the proposed method can achieve a good unlearning-utility trade-off.
    2. More advanced baselines need to be included. Considering that the baselines selected in this paper are not the most advanced, I suggest that the authors consider the following baselines.
      • [1] Li, J., Wei, Q., Zhang, C., et al.: Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models. Advances in Neural Information Processing Systems 37, 35414–35453 (2024).
      • [2] Cheng, J., Amiri, H.: MultiDelete for Multimodal Machine Unlearning. In: European Conference on Computer Vision, pp. 165–184. Springer Nature Switzerland, Cham (2024).
      • [3] Cheng, J., Amiri, H.: MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning. arXiv preprint arXiv:2406.14796 (2024).
    3. In machine unlearning (MU), the time cost of unlearning is also a very important indicator. Therefore, the authors need to compare the time cost of the different MU methods in depth to highlight the effectiveness of the method.

    4. There are four key hyperparameters in Eq. 5. The authors need to further explore and analyze the impact of these hyperparameters on the performance of the method and how to find the optimal hyperparameter combination.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a simple yet effective method for “right to be forgotten” in multimodal healthcare settings.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well written and clear. Perfect flow and explanations.
    • While unlearning has been explored in unimodal domains, doing so in multimodal healthcare data, accounting for both unimodal and joint embeddings (e.g., clinical notes + images), is a genuinely fresh angle.
    • The use of noise injection, four structured losses, and an embedding distance-based unlearning strategy is well thought out. It’s clear the authors understand the intricacies of representation learning.
    • A comprehensive set of experiments were conducted to evaluate the performance of the proposed method. The results are promising too.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • While the application to healthcare multimodal data is novel, the core technical approach draws quite heavily from previous teacher-student and distillation-based unlearning literature. The new contribution lies more in how existing components are combined, rather than inventing a fundamentally new unlearning method.
    • The method includes four distinct loss terms, but there’s no detailed analysis of how each one contributes to performance.
    • The noise injection strategy seems central, but there’s no detailed explanation or intuition. Is it Gaussian? Uniform? How is it calibrated to avoid hurting generalization? Noise design can deeply affect representation shifts.
    • No insight is given into when Forget-MI fails; e.g., what if embeddings are highly entangled? Are there situations where unimodal forgetting harms the other modality due to a shared representation space? A discussion of this would be great.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The application domain is relevant and underexplored and the results are promising and well-structured.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I found the paper interesting. The other reviewers who rejected it seem to know more about machine unlearning, but I only have basic knowledge of the topic.



Review #4

  • Please describe the contribution of the paper

    Forget-MI is a machine unlearning framework designed specifically for multimodal medical data, where both unimodal and joint representations can be forgotten upon patient request. The method uses a combination of loss functions that target forgetting (via distance from original representations) and retention (to preserve performance on retained data), supported by noise injection to generalize forgetting beyond individual samples. Evaluated on MIMIC-CXR, Forget-MI reduces forget-set accuracy while maintaining test set performance comparable to a fully retrained model.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper flows beautifully and every concept is explained extremely well. Figure 1 is incredibly clear. This is a paper that can be read by people without any knowledge of unlearning.
    • The method is simple yet effective, using a combination of losses and noise to unlearn both unimodal and multimodal representations.
    • The approach is evaluated on different forget set sizes, reflecting realistic scenarios.
    • The method outperforms several baselines across multiple relevant metrics, effectively reducing retention of forget samples without compromising test performance.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • While the authors argue that retraining from scratch may not be feasible without access to the training data, their approach still involves fine-tuning on the retain set, which assumes such access.
    • The experiments lack standard deviation or statistical significance, which limits the robustness of the results.
    • The method is only evaluated on a single dataset, and although it is described as a subset of MIMIC-CXR, the exact selection criteria and scope are not clearly explained.
    • In several cases, the unlearned model’s performance remains noticeably below that of a model retrained without the forget samples.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (6) Strong Accept — must be accepted due to excellence

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses an important problem with a simple flexible method, has a comprehensive evaluation and is very well-written and explained.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I maintain my strong accept rating.




Author Feedback

We thank the reviewers for recognizing the novelty (R1, R2, R3, R4), clarity (R1, R2, R3), and value (R1, R2, R4) of our work, including our method’s extensive evaluation (R1, R2, R3) on multimodal healthcare data. Below, we address the comments.

R2.Q1 (Originality): Unlike existing methods, which deal with one modality or delete the relationship between modalities, Forget-MI completely unlearns both patient modalities and any inter-modality links. Our work is the first to target the removal of individual patient data from multimodal architectures.

R1.Q4, R3.Q1, R4.Q4 (Metrics): As stated in the Evaluation Criterion, our assessment of Forget-MI is based on a combination of metrics (performance on the forget and test sets, distance from the retrained model, MIA score) rather than a single criterion. This multi-metric approach is motivated by the absence of a consensus on a single evaluation standard in unlearning, and it explicitly addresses the trade-off between regulatory compliance, robustness to attacks, and utility preservation. Forget-MI occasionally underperforms on our highly imbalanced test set, which includes low-represented classes (10% of labels are alveolar edema). Nonetheless, the majority of classes maintain stable performance, and forgetting quality remains high. Our evaluation reflects realistic trade-offs rather than favoring a single metric, which is what the best results in Tables 1 and 2 show. To address confusion caused by bolding combined metrics, we adjusted the formatting.

R2.Q2, R3.Q4 (Hyperparameters): Our method integrates four key loss terms, and we examine their contributions in Table 2 by varying the weighting scheme. For noise, our use of Gaussian noise follows prior theoretical work [10] (see Effect of Noise). We empirically observed that low noise levels best balance forgetting with maintaining generalization. Due to space constraints, we report only the optimal settings and will share full details in the code.

R1.Q3, R2.Q4, R4.Q3 (Dataset and Shared Representations): The selection criteria focused on patients diagnosed with pulmonary edema, using a split available in [6]. As the first study of multimodal unlearning in medical domains, we used MIMIC-CXR, the largest and most popular dataset containing images and reports. It poses challenges for unlearning due to datapoint similarity, leading to entangled embeddings. Our experiments show superiority over other baselines in this challenging setting. While a larger image-text dataset is not currently available, future research could expand to additional modalities and settings. Regarding concerns about unimodal unlearning, note that our method targets patient IDs to select and unlearn all associated data across all modalities simultaneously.

R3.Q2 (Baselines): We conducted extensive research and addressed gaps in the existing literature. MultiDelete ([7] in our paper, [2] in your review) robustly decouples modalities but does not address unimodal encodings; our results consistently demonstrate Forget-MI’s superior performance. The SIU paper ([18] in our paper, [1] in your review) unlearns single visual concepts from multimodal LLMs, whereas Forget-MI aims at complete patient-level unlearning (not concept-level forgetting) of both image and report data. Regarding the MU-Bench survey ([3] in your review), we already include baselines it covers (SCRUB, NegGrad+), chosen for their reproducibility and broad adoption.

R4.Q2 (MultiDelete): MultiDelete performs modality decoupling by targeting the joint embedding while preserving information from the individual modalities. In healthcare settings, this means it removes the link between a patient’s image and report but retains the identity of both as belonging to the patient, undermining the goal of the right to be forgotten.

R1.Q1 (Retain Set): We do not require full access to the training data; we will clarify this in the revised paper.

R3.Q3 (Time): Retraining takes up to 14 hours, while Forget-MI takes up to 5 hours, depending on the forget percentage.

We hope this work serves as a foundation for advancing machine unlearning in healthcare, with implications for privacy and regulatory compliance. The paper will be revised and the code will be made public.
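
For readers trying to picture the objective discussed in the rebuttal (four weighted loss terms, a frozen original model as reference, and low-variance Gaussian noise on the forget samples), the sketch below shows one way such a teacher-student objective could be wired up. It is a hedged illustration under stated assumptions, not the paper’s Eq. 5: the MSE distance, the bounded exp(-distance) forgetting term, the model signatures, and the weight names are all placeholders.

```python
# Hedged sketch of a four-term unlearning objective in the spirit of the
# rebuttal's description; all names and choices below are assumptions.
import torch
import torch.nn.functional as F

def forget_mi_loss(student, teacher, forget_batch, retain_batch,
                   weights=(1.0, 1.0, 1.0, 1.0), noise_std=0.01):
    """Illustrative combined loss; student/teacher are assumed to return
    (image embedding, text embedding, joint embedding)."""
    w_fu, w_fj, w_ru, w_rj = weights
    xf_img, xf_txt = forget_batch
    xr_img, xr_txt = retain_batch

    # Low-variance Gaussian noise on the forget images so forgetting
    # generalizes beyond the exact stored samples (per the rebuttal).
    xf_img = xf_img + noise_std * torch.randn_like(xf_img)

    # Student embeddings and frozen-teacher reference embeddings.
    sf_img, sf_txt, sf_joint = student(xf_img, xf_txt)
    sr_img, sr_txt, sr_joint = student(xr_img, xr_txt)
    with torch.no_grad():
        tf_img, tf_txt, tf_joint = teacher(xf_img, xf_txt)
        tr_img, tr_txt, tr_joint = teacher(xr_img, xr_txt)

    # Forget terms: reward unimodal and joint embeddings that drift away
    # from the teacher's; exp(-d) keeps the term bounded as d grows.
    l_forget_uni   = torch.exp(-F.mse_loss(sf_img, tf_img)) \
                   + torch.exp(-F.mse_loss(sf_txt, tf_txt))
    l_forget_joint = torch.exp(-F.mse_loss(sf_joint, tf_joint))

    # Retain terms: keep unimodal and joint embeddings close to the teacher's.
    l_retain_uni   = F.mse_loss(sr_img, tr_img) + F.mse_loss(sr_txt, tr_txt)
    l_retain_joint = F.mse_loss(sr_joint, tr_joint)

    return (w_fu * l_forget_uni + w_fj * l_forget_joint
            + w_ru * l_retain_uni + w_rj * l_retain_joint)
```

In this shape, the retain terms anchor the unlearned model to the original model on retained patients, while the forget terms reward unimodal and joint embeddings that move away from what the original model produced for the forgotten patients; varying the four weights corresponds to the weighting-scheme ablation the rebuttal attributes to Table 2.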




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Well-written paper on a timely topic where the strengths outweigh the concerns (mainly related to the experimental setup), and I also believe that the rebuttal did a good job of addressing them.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The work is interesting and timely. However, one reviewer raised concerns about the interpretation of the results. The AC agrees that the interpretation of Table 2 is misleading and does not fairly reflect the unlearning method’s performance.


