Abstract

Interpretability is essential in medical imaging to ensure that clinicians can comprehend and trust artificial intelligence models. Several approaches have been recently considered to encode attributes in the latent space to enhance its interpretability. Notably, attribute regularization aims to encode a set of attributes along the dimensions of a latent representation. However, this approach is based on Variational AutoEncoder and suffers from blurry reconstruction. In this paper, we propose an Attributed-regularized Soft Introspective Variational Autoencoder that combines attribute regularization of the latent space within the framework of an adversarially trained variational autoencoder. We demonstrate on short-axis cardiac Magnetic Resonance images of the UK Biobank the ability of the proposed method to address blurry reconstruction issues of variational autoencoder methods while preserving the latent space interpretability.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1877_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1877_supp.pdf

Link to the Code Repository

https://github.com/compai-lab/2024-miccai-di-folco

Link to the Dataset(s)

https://biobank.ndph.ox.ac.uk/showcase/

BibTex

@InProceedings{Di_Interpretable_MICCAI2024,
        author = { Di Folco, Maxime and Bercea, Cosmin I. and Chan, Emily and Schnabel, Julia A.},
        title = { { Interpretable Representation Learning of Cardiac MRI via Attribute Regularization } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an Attributed-regularized Soft Introspective Variational Autoencoder that combines attribute regularization of the latent space within the framework of an adversarially trained variational autoencoder. They demonstrate on short-axis cardiac Magnetic Resonance images of the UK Biobank the ability of the proposed method to address blurry reconstruction issues of variational autoencoder methods while preserving the latent space interpretability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Publicly available.
    • Ablation study.
    • LPIPS seems to detect subtle changes in the image quality that are not detected by SSIM.
    • Qualitatively, the proposed method seems to improve from the previous methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Non-public data.
    • It was mentioned that convergence is difficult to achieve, but no details were shared on how to overcome it. The implementation details seem straightforward from previous publicly available methods.
    • Besides the examples and the LPIPS results presented, no interpretability evaluation was done on downstream tasks. This was mentioned as future work but could have helped support the LPIPS vs. SSIM results.
    • The interpretability score was not presented in the paper.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    In general, reproducible except for limited access to the UK Biobank

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    While the proposed method is good, with nice qualitative results, an evaluation of the interpretability in downstream tasks is missing.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Limited novelty given the lack of interpretability evaluation.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper addresses blurry reconstruction issues in cardiac MRI. It proposes a VAE-based model that combines attribute regularization with the soft introspective VAE (SIVAE) framework to increase the interpretability of the latent space and the quality of the generated images. Experimental results demonstrate its effectiveness in overcoming the blurring issue.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The issue of blurry image generation in cardiac MRI is an important problem, and the idea of combining attribute regularization to increase interpretability and SIVAE to overcome the blurry issue is creative.
    2. The experimental section addresses the improvement of interpretability and reconstruction quality.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Related work on interpretable features and on the issue of blurry reconstruction is not elaborated.
    2. The presentation of the methodology could be improved for better clarity.
    3. The experimental results need better explanation and visual evidence.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors are encouraged to add related work on the benefits and limitations of using attribute regularization on the latent representation, as well as the motivation for using SIVAE to solve blurring issues.
    2. In Equation 3, is \gamma_reg global to all attributes and \delta local to each attribute?
    3. What is the difference of \gamma_reg in Equation 3 vs Equation 2?
    4. How is alpha = 2 obtained in Equation 6?
    5. Do Equation 6 and Equation 7 share the same \beta_rec and \beta_kl?
    6. The illustration of Fig 2 needs to be better. What relationship does the ELBO in the first row have with the encoder in the second row? Which part denotes the first step of training and which part is the second step?
    7. In Table 1 the performance change by attribute regularization is marginal and sometimes worse. This is further shown in Fig 3 that the details of cardiac regions in AR-SIVAE are not as good as those in SIVAE. Could the authors explain whether the attribute regularization has a counter effect on deblurring? Also, Fig 3 needs to provide more details: it is better to highlight the regions of LV and RV to show the improvement in the regions of interest.
    8. In Table 2, the authors need to indicate whether a higher metric number means better performance.
    9. Fig 4 should better illustrate how the latent dimensions change alongside the visual samples.
    10. Could the authors compare the proposed method with existing works on deblurring and attribute representation learning?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper combines the benefits of attribute regularization and SIVAE. The presentation of the methodology needs to be better structured. The experiments need to be more thorough.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This work proposes an Attributed-regularized Soft Introspective Variational Autoencoder that combines attribute regularization of the latent space within the framework of an adversarially trained variational autoencoder.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. It introduces attribute regularization loss in the SIVAE framework to preserve the interpretability of the latent space.
    2. The proposed method overcomes the limitations associated with blurry reconstruction while maintaining latent space interpretability.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In the qualitative evaluation of the reconstruction of two samples, it would be better to highlight the differences between the baselines and the proposed method.
    2. For the reconstruction performance, were the experiments run over multiple trials? What is the standard deviation? Does the SSIM of the proposed method remain worse than that of the baselines while its LPIPS remains better than the others?
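    The per-trial statistics requested above could be reported with a few lines of code. The following is a minimal sketch, not the authors' evaluation code: the function name `summarize_runs` is hypothetical, and the scores in the usage note are placeholders rather than values from the paper.

```python
import math
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation, standard error) of a metric
    measured once per independent training run, e.g. SSIM or LPIPS."""
    if len(scores) < 2:
        raise ValueError("at least two runs are needed for a standard deviation")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)        # sample std, divides by n - 1
    sem = std / math.sqrt(len(scores))    # standard error of the mean
    return mean, std, sem
```

    For example, `summarize_runs([0.5, 0.6, 0.7])` (placeholder scores) gives a mean of 0.6 and a sample standard deviation of 0.1, so a table entry could read "0.60 ± 0.10" with the number of runs stated alongside.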
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    see above

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall experiment demonstrates the effectiveness of proposed method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank R1, R3, and R5 for their valuable feedback and positive assessment. In this work, we address the “important problem of blurry image generation in cardiac MRI while maintaining latent interpretability” (R3, R5). To this end, we propose AR-SIVAE, which “creatively” combines attribute regularization with adversarial autoencoders (R3). Our experiments demonstrated that our method is “effective in overcoming the limitations of blurry reconstruction while preserving latent space interpretability” (R1, R3, R5).

Main points raised:

[R1, R3] Missing evaluation of the interpretability in downstream tasks: In this work, we focus on overcoming the blurriness of VAE-based methods while preserving latent space interpretability. This could be of high interest, for instance, for downstream applications in which detailed reconstruction ensures that subtle differences in ventricular morphology are captured accurately enough to differentiate between subtypes of cardiomyopathy or other cardiac conditions. Currently, there is no consensus on XAI approaches [A], and rigorously evaluating the true interpretability of downstream tasks requires thorough exploration, which was not feasible within the allocated space constraints. Prior work [3] demonstrated the capability of Attri-VAE (the baseline in our paper) to perform well for the interpretable classification of cardiac disease. The rigorous evaluation of downstream applications was not in the scope of this manuscript, but we agree that it is an interesting avenue for future work, and we plan a thorough clinically oriented analysis.

[R3] Complementary information on related work, which will be added to the paper: The advancement of VAE generation capabilities can be categorised into approaches that focus on enhancing the network’s architecture, integrating more robust priors, introducing regularisation techniques, or integrating adversarial objectives [4]. The latter has the benefit of combining the generative capability of GANs with the inference capability of VAEs, which is needed to add attribute regularisation. SIVAE has been demonstrated to be state-of-the-art in this family of approaches [4]. Concerning attribute regularisation, its main benefit is that it can handle continuous variables. It yields a structured latent space in which each attribute corresponds to a specific dimension [11]. One limitation of this approach is that it struggles to regularise highly correlated attributes.
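To make the attribute-to-dimension correspondence concrete, the sketch below implements one pairwise formulation that is often used for attribute regularisation: the mean absolute error between tanh(δ·D_z) and sign(D_a), where D_z and D_a hold pairwise differences of one latent dimension and of its assigned attribute within a batch. This is an assumption for illustration only; the paper's Equation 3 is not reproduced on this page, and the function name, argument names, and the default value of `delta` are hypothetical.

```python
import math


def attribute_regularization_loss(latent_values, attribute_values, delta=1.0):
    """Pairwise attribute-regularisation loss for ONE latent dimension.

    latent_values:    value of the regularised latent dimension per sample.
    attribute_values: value of the attribute assigned to that dimension.
    delta:            spread factor applied before tanh (assumed hyperparameter).
    """
    n = len(latent_values)
    if n == 0 or n != len(attribute_values):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for i in range(n):
        for j in range(n):
            d_z = latent_values[i] - latent_values[j]        # latent difference
            d_a = attribute_values[i] - attribute_values[j]  # attribute difference
            sign_a = (d_a > 0) - (d_a < 0)                   # -1, 0, or +1
            total += abs(math.tanh(delta * d_z) - sign_a)
    return total / (n * n)
```

The loss is small when samples are ordered along the latent dimension in the same way as along the attribute, and large when the ordering is reversed, which is what ties a single dimension to a single continuous attribute. In this sketch the total regularisation term would be the sum of this loss over all attribute and dimension pairs, scaled by a weighting factor.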

[R3] Counter effect of attribute regularisation on deblurring: Adding the attribute regularisation indeed has a minimal counter effect on the quality of the reconstruction. The delimitation of the cardiac structures and of those in the background is less sharp, but we gain interpretability. This comment will be added to the figure analysis.

[R1] Difficulty of convergence: Although we mention that SIVAE is hard to train to convergence, a detailed convergence analysis and theoretical upper bounds are provided in the original reference. We used their observations to obtain stable training and conducted a hyperparameter search around those values. The attribute-regularization loss was then added to the training, and the related hyperparameters were chosen empirically. With this sentence, we wanted to highlight that the proposed method has a higher complexity than the baseline.

We again thank the reviewers for their valuable feedback and suggestions to improve the paper. We will incorporate their comments into our revised manuscript to the best of our abilities.

[A] Adebayo, et al. “Sanity checks for saliency maps.” NeurIPS (2018)




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Reviewers have noted both strengths and weaknesses of this paper, and unfortunately they have not engaged in the rebuttal discussion. After reviewing the authors’ responses, I’m recommending acceptance provided the authors address the reviewers’ comments (when appropriate) in their revised version, particularly those that pertain to details of the numerical evaluation (standard errors, etc.).

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Reviewers have noted both strengths and weaknesses of this paper, and unfortunately they have not engaged in the rebuttal discussion. After reviewing the authors’ responses, I’m recommending acceptance provided the authors address the reviewers’ comments (when appropriate) in their revised version, particularly those that pertain to details of the numerical evaluation (standard errors, etc.).


