Abstract

Developing models that are capable of answering questions of the form “How would x change if y had been z?” is fundamental to advancing medical image analysis. Training causal generative models that address such counterfactual questions, though, currently requires that all relevant variables have been observed and that the corresponding labels are available in the training data. However, clinical data may not have complete records for all patients, and state-of-the-art causal generative models are unable to take full advantage of this. We thus develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as the more clinically realistic case of having different labels missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels. Code is available at: https://github.com/yi249/ssl-causal

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3457_paper.pdf

SharedIt Link: https://rdcu.be/dY6fX

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_28

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3457_supp.pdf

Link to the Code Repository

https://github.com/yi249/ssl-causal

Link to the Dataset(s)

https://github.com/dccastro/Morpho-MNIST

https://physionet.org/content/mimic-cxr/2.0.0/

BibTex

@InProceedings{Ibr_SemiSupervised_MICCAI2024,
        author = { Ibrahim, Yasin and Warr, Hermione and Kamnitsas, Konstantinos},
        title = { { Semi-Supervised Learning for Deep Causal Generative Models } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {294 -- 303}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors aim to develop causal generative models that can address counterfactual questions in clinical data, especially when the counterfactual counterparts are not available for most data points. To that end, the authors propose a semi-supervised learning method for causal generative models and show some promising results.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Causal learning is an interesting and potentially critical area of research, and it is great the authors are tackling challenges in this realm.
2. The first paragraph in the Introduction section provides a very concise and intriguing summary of the landscape in related fields.
3. Figure 2 presents an intuitive illustration of the causal dependency graph of the observed and unobserved variables in the two datasets the authors experimented on.
4. Figure 3 (b) and Figure 4 (a) show promising results of counterfactual regularization.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. For Table 1, it is not immediately clear what to look for or what to expect from these variables, except for MAE, where lower is better. I assume that higher is better for the last three columns, but I am not certain.
2. The empirical experiments and evaluation are not entirely straightforward, though this may be intrinsic to causal inference and counterfactual generation. It might be interesting to augment the evaluation with separately trained attribute classifiers.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

I wouldn’t be too concerned about reproducibility as long as the authors specify their plans to open source.

Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

Regarding Table 1: it might be helpful to provide more details in the caption or main text to help readers understand what to look for and what to take away from the results. For example, which variables matter most for judging which aspects of performance, and whether higher or lower is better for each metric. I assume we want a high posterior $q(\cdot \mid x)$, but I am not certain from the current materials.

Regarding the results on the MIMIC dataset: while it is helpful to show the counterfactual generation in Figure 4 (b), the evaluation for this experiment seems a little hand-wavy. It would be more convincing if the authors provided additional quantitative evaluation of the altered attributes in the generated counterfactuals. For example, they could train attribute classifiers for sex, age, race, and disease status, independently of the causal generative model, and use their predicted probabilities to measure how much these attributes are altered in the generated counterfactuals.
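The evaluation suggested here could be sketched roughly as follows. This is only an illustration of the reviewer's idea, not something taken from the paper or its repository: the function name `intervention_effectiveness` is hypothetical, and the classifier outputs are assumed to be per-sample lists of class probabilities produced by an attribute classifier trained separately from the generative model.

```python
def intervention_effectiveness(classifier_probs, target_labels):
    """Score counterfactuals with an independently trained attribute classifier.

    classifier_probs: one list of class probabilities per counterfactual image
                      (assumed output of a separate attribute classifier).
    target_labels:    the attribute value each counterfactual was intervened to.

    Returns (accuracy, mean_target_probability).
    """
    if len(classifier_probs) != len(target_labels) or not target_labels:
        raise ValueError("inputs must be non-empty and of equal length")

    hits = 0
    total_target_prob = 0.0
    for probs, target in zip(classifier_probs, target_labels):
        # Predicted class is the index with the highest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        hits += int(predicted == target)
        # Probability mass the classifier assigns to the intervened value.
        total_target_prob += probs[target]

    n = len(target_labels)
    return hits / n, total_target_prob / n


# Hypothetical classifier outputs for three counterfactuals, all intervened to class 1.
accuracy, mean_prob = intervention_effectiveness(
    [[0.2, 0.8], [0.7, 0.3], [0.1, 0.9]], [1, 1, 1]
)
```

A higher accuracy or mean target probability would indicate that the intervened attribute is recognisably changed in the counterfactual, under the assumption that the attribute classifier itself is reliable.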

Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Interesting topic, somewhat convincing results, but not absolutely persuasive.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper
1. The introduction of a semi-supervised deep causal generative model.
2. The generation and evaluation of counterfactuals with missing causal variables.
3. A causal perspective on the consistency regularization technique for semi-supervised learning.
4. An investigation into the performance differences when parent or child variables are missing, inspired by the Independence of Cause and Mechanism (ICM) principle.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The causal generative model was remarkably enhanced through self-supervised learning in environments where labels are scarce. This innovative approach was successfully applied to real chest x-ray images, showcasing its impressive potential and applicability in practical scenarios.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Aside from applying self-supervised learning, the novelty of this paper appears relatively weak compared to the previous work below.
1. Ribeiro, Fabio De Sousa, et al. “High fidelity image counterfactuals with probabilistic causal models.” arXiv preprint arXiv:2306.15764 (2023).
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

Thank you for researching an interesting topic. However, compared to the previous ICML 2023 publication ‘High Fidelity Image Counterfactuals with Probabilistic Causal Models’, it is difficult to give a strong accept, as there appears to be no significant novelty. I look forward to your continued good research.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Researchers explored causal inference, an area of keen interest in the medical domain, and addressed a common issue of label scarcity by applying SSL (Self-Supervised Learning). The topic was well-chosen, and the experiments conducted appear to have been logically sound.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The authors develop a causal generative model that addresses the issues of missing data. This approach can be very applicable to clinical data, which is often incomplete.

The authors evaluate their method on a semi-synthetic dataset, Morpho-MNIST, where the underlying causal relationships are known, followed by a real-world medical imaging dataset, MIMIC-CXR.

They evaluate their method in the setting where each sample is fully labelled or fully unlabeled, as well as the scenario with random labels missing for each sample, and compare to an existing supervised method.
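The two missing-label settings described above can be illustrated with a small masking sketch. It is an assumption-laden illustration rather than the authors' data pipeline: the function names, the `labelled_fraction` and `observed_prob` parameters, and the convention that `True` means "label observed" are all hypothetical.

```python
import random


def fully_labelled_or_unlabelled_mask(n_samples, n_vars, labelled_fraction, rng):
    """Setting 1: each sample keeps all of its labels or none of them."""
    mask = []
    for _ in range(n_samples):
        keep_all = rng.random() < labelled_fraction
        mask.append([keep_all] * n_vars)
    return mask


def randomly_missing_mask(n_samples, n_vars, observed_prob, rng):
    """Setting 2: each label of each sample is observed independently."""
    return [
        [rng.random() < observed_prob for _ in range(n_vars)]
        for _ in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the masks are reproducible
setting_1 = fully_labelled_or_unlabelled_mask(5, 3, labelled_fraction=0.4, rng=rng)
setting_2 = randomly_missing_mask(5, 3, observed_prob=0.6, rng=rng)
```

In the first setting every row is all `True` or all `False`; in the second, rows can mix observed and missing labels, which corresponds to the case the abstract describes as more clinically realistic.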
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors develop a novel semi-supervised deep causal generative model that can improve the generalization of deep learning in real-world applications.

They explicitly address the issue of missing data in their model, which is very common in medical records. Specifically, they predict missing values, and weight them by the prediction entropy, to model the confidence in the predicted label. In order to improve the performance of this approach, they employ a strategy of first training with only labeled data, until the predictors reach a sufficiently high accuracy, before using them to impute missing data.
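One way to picture the entropy weighting and the warm-up strategy described above is the following sketch. The exact weighting used in the paper is not given in this review, so the formula here (one minus the normalised Shannon entropy) is an assumption chosen for illustration, as are the function names and the `threshold` value.

```python
import math


def entropy_weight(probs):
    """Confidence weight in [0, 1] for a predicted label distribution.

    Assumed form: 1 - H(p) / log(K), so a one-hot prediction gets weight 1
    and a uniform prediction gets weight 0.
    """
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two classes")
    # Shannon entropy; zero-probability classes contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(k)


def use_imputed_labels(predictor_accuracy, threshold=0.9):
    """Warm-up gate: impute missing labels only once the predictor, trained
    on labelled data alone, reaches a (hypothetical) accuracy threshold."""
    return predictor_accuracy >= threshold


confident = entropy_weight([0.9, 0.1])
uncertain = entropy_weight([0.6, 0.4])
```

Under this assumed weighting, the confident prediction receives a larger weight than the uncertain one, so uncertain imputations contribute less to training.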

The authors evaluate their method on Morpho-MNIST, where all causal relationships are known, and show a large improvement over state-of-the-art supervised causal generative models.

They also conduct an experiment comparing missing cause variables (effect present), and missing effect (cause variables present). They demonstrate that the setting with more effect labels tends to produce better joint distributions of cause (labels) and effect, thereby supporting the “independence of cause and mechanism” hypothesis.

The authors commit to releasing code with their publication.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The authors motivate their paper by the fact that deep learning models often struggle to generalize to real-world scenarios, and that this may be due to a lack of causal understanding of the data. However, the method is primarily evaluated based on its ability to implement interventions. It would be much more impactful to demonstrate that synthetic images generated with the proposed causal strategy could improve the performance of predictive models by supplementing them with additional labeled data. The authors mention this as future work; however, it seems to be very accessible using the MIMIC-CXR dataset.

As noted by the authors, an additional limitation of the work is that they assume the DAG structure is known a priori. It would be interesting to conduct experiments (e.g., using Morpho-MNIST) where the true DAG structure is perturbed, to measure the sensitivity to this source of noise.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

In Section 2 (Background), the authors switch between referring to endogenous variables using v_i and x_i, which I found confusing.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors present a novel causal generative method, and demonstrate that it outperforms the state-of-the-art supervised causal generative method.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

N/A

Meta-Review

Meta-review not available, early accepted paper.

Semi-Supervised Learning for Deep Causal Generative Models

Author(s):