Abstract

Adversarial learning helps generative models translate MRI from a source to a target sequence when paired samples are lacking. However, applying MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the reconstruction of distinct sequences from that common latent space. We propose a generative model that compresses discrete representations of each sequence to estimate the Gaussian distribution of a vector-quantized common (VQC) latent space shared by multiple sequences. Moreover, we improve latent space consistency with contrastive learning and increase model stability through domain augmentation. Experiments on the BraTS2021 dataset show that our non-adversarial model outperforms other GAN-based methods, and the VQC latent space helps our model achieve (1) anti-interference ability, eliminating the effects of noise, bias fields, and artifacts, and (2) solid semantic representation ability, with the potential for one-shot segmentation. Our code is publicly available.
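As a rough illustration of the core idea described in the abstract (nearest-neighbor codebook lookup followed by Gaussian statistics over the quantized latents), the sketch below is not the authors' implementation; all names, shapes, and the pooling across sequences are assumptions:

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    z: (N, D) continuous latents; codebook: (K, D) learned dictionary.
    Returns the quantized latents (N, D) and the chosen code indices (N,).
    """
    # Squared Euclidean distance between every latent and every code.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

def common_latent_gaussian(zq_per_sequence):
    """Estimate per-location Gaussian statistics over quantized latents
    pooled across sequences; a stand-in for the paper's VQC statistics."""
    stacked = np.stack(zq_per_sequence)          # (S, N, D)
    return stacked.mean(axis=0), stacked.std(axis=0)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))               # K=8 codes of dimension D=4
z_t1 = rng.normal(size=(16, 4))                  # latents from a "T1" encoder
z_t2 = rng.normal(size=(16, 4))                  # latents from a "T2" encoder
zq1, _ = vector_quantize(z_t1, codebook)
zq2, _ = vector_quantize(z_t2, codebook)
mu, sigma = common_latent_gaussian([zq1, zq2])   # per-location mean and std
```

The quantization step is the standard VQ-VAE lookup; the Gaussian over pooled quantized latents only gestures at the paper's common-space estimation.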

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0936_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/fiy2W/mri_seq2seq

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Han_NonAdversarial_MICCAI2024,
        author = { Han, Luyi and Tan, Tao and Zhang, Tianyu and Wang, Xin and Gao, Yuan and Lu, Chunyao and Liang, Xinglong and Dou, Haoran and Huang, Yunzhi and Mann, Ritse},
        title = { { Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents an interesting idea of representation learning for multi-modal MRI scans by leveraging a vector-quantized VAE with a common latent space. The proposed method is beneficial for three main downstream tasks: modality synthesis, semantic segmentation, and denoising (or inverse problem in general).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The authors propose an interesting way to learn a common representation of multiple modalities by leveraging vector quantization.
    2. The authors further propose a latent space consistency loss to enforce the alignment of latent vectors across modalities.
    3. Extensive experiments on three downstream tasks demonstrate the effectiveness of the proposed method in learning a common representation of multi-modal MRI scans.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The authors claim that this vector-quantized common latent space learning is non-adversarial learning that outperforms many GAN-based methods. But I do not see any conflict in applying adversarial learning on top of this work. For example, (1) VQGAN (well recognized as a tokenizer for many computer vision tasks) applies adversarial training in addition to a VQVAE, which further improves VQVAE's performance. (2) IntroVAE also provides an alternative way to combine a VAE with adversarial learning to improve performance. Therefore, I do not see the point of advertising non-adversarial learning in this paper, as I think those two components are orthogonal and hence can be combined.
    2. The authors use a non-parametric (continuous) Gaussian-like prior (i.e., p(z)) in Eq. (4), imposing a strong assumption on the joint prior distribution of multi-modal scans (e.g., independence, which contradicts the fact that modalities are correlated). VQVAE and VQGAN, by contrast, use a learned parametric prior (e.g., PixelCNN in VQVAE and a Transformer in VQGAN). The learned prior in VQVAE and VQGAN is discrete and drawn from a categorical distribution, which may better model the modes of the data, while the proposed method is more of a semi-discrete method (i.e., a continuous prior with a discrete dictionary). With that said, I suggest that the authors be more precise in their claims and compare the proposed method at least with VQVAE (preferably VQGAN) using a discrete prior.
    3. The proposed latent space consistency is more like knowledge distillation between z1 and z2 with contrastive learning. I suggest that the authors provide a more in-depth analysis from this perspective and of how it can help facilitate the learning of a better common space z_s.
    4. No statistical test was performed.

    [1] Taming Transformers for High-Resolution Image Synthesis [2] IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    no issues here

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See weakness.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think this paper provides an interesting idea of learning the common latent space of multi-modal MRI scans. This is a hot but challenging task in general, and should also be interesting in MICCAI community. Although some claims of this paper should be enhanced and more in-depth investigation should be conducted, the results from extensive experiments on downstream tasks seems encouraging.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents a novel generative model for synthesizing MRI images across multiple sequences without requiring adversarial learning, which is often used in existing methods but can lead to instability and mode collapse.

    The proposed model leverages a vector-quantized common (VQC) latent space, derived from discrete latent representations compressed by a VQVAE. This approach enhances stability and semantic consistency in the generated images and shows potential for one-shot segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The author proposed a novel way to estimate the Gaussian distribution of vector-quantized common (VQC) latent space between multiple sequences.

    The reason of the idea is clear and clarified.

    The ablation study provides a comprehensive validation of the different aspect of the proposed methods.

    State-of-the-art results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The methods section is not rigorous or clear (e.g., where is M in Eq. (7)?).

    Only one dataset has been used, so variability across datasets is not verified. The authors could add a breast MRI dataset.

    The selection of hyper-parameters is arbitrary.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Could the authors add an anonymized link to the source code and model weights in the rebuttal phase?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The methods section should be made clearer.

    More datasets should be added to verify the method's effectiveness.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method seems reasonable and effective, but the details should be clarified further. The results seem good, but the authors should provide a way to verify them.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper deals with the task of MRI synthesis: given a sequence of MRI images of an object captured with different methods, the goal is to computationally generate additional MRI sequences that are not directly available. The authors build on seq2seq (a method that was introduced for this exact problem and is considered the baseline) by replacing its latent space with a discrete vector-quantized latent space, adding an additional loss term for latent space consistency, and further adding training augmentations. The proposed method is shown to be superior in single-step MRI synthesis when compared to both multi-step and single-step MRI synthesis of the baseline and other related methods. Furthermore, the method is shown to ignore/prevent artifacts, noise, and bias fields when generating MRI sequences.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Non-adversarial approach circumvents known issues with GANs such as mode collapse and results in a simpler training scheme.
    2. Superior performance using only single-step MRI synthesis when compared to both single- and multi-step synthesis of the baseline and other methods, indicating a rich latent space and a simpler inference pipeline.
    3. Extensive validation - solid ablation and comparison to 3 other methods + baseline.
    4. Novelty - a vector-quantized latent space hasn't been used for MRI image synthesis and offers intuitive benefits when considering the amount of noise in the domain. Non-adversarial learning comes with many benefits and is a considerable shift from the state of the art. The additional consistency loss term, which is contrastive in part, is also novel in the field of MRI synthesis.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Evaluation on a single dataset - generalisation of method to other cases is not explored.
    2. Before applying the proposed method to a new dataset, hyperparameter selection (codebook size K and dimension D) is required, which might be compute-intensive. The gamma weights for the consistency and VQ losses in the final loss term might also change between datasets.
    3. Claiming strong representation generalization, while showing results on a single dataset and a single downstream task is a bit limited and potentially overreaching. Such claims should ideally be supported by comprehensive testing across multiple datasets and a variety of tasks to ensure the generalizability and robustness of the method under different conditions.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. For all tables, the number of runs used to calculate the mean and standard deviation is not specified, which makes it difficult to assess the statistical significance of the results.
    2. Phrasing and details: 2.1. I suggest using a grammar tool; phrases like “statistic on z_q, we can estimate…” and “we sample to train the nnU-net…” can be improved. 2.2. In Table 3, I think there's enough space to avoid abbreviations, since the reader might not be familiar with them. 2.3. In the introduction, a little information about MRI sequences would be useful: what T1, T2, and so on are. As it stands, I would not say that the paper is self-contained. 2.4. In the “Compression and Representation” section, please provide details on how the latent representations were input into the nnU-Net, since, from my understanding, nnU-Net accepts images as inputs.
    3. Figure 5 does not give any interesting information in my opinion and could free up space for more details.
    4. In the “Compression and Representation” section, consider comparing your method to other representations such as DINOv2.
    5. While the mean and std defined in the “Uncertainty Estimation” section are used for the consistency loss term, there is no discussion of uncertainty or confidence in the paper. I suggest removing this section and defining the mean and std in the VQC Latent Space.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Considering the novelty discussed in the strengths section, I believe that the ideas presented in the paper can be extended and applied to other domains in medical imaging, making them of significant interest to the community. Coupled with strong validation — albeit on only one dataset — this is likely sufficient to represent an incremental change in the state of the art.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We greatly appreciate the reviewers' effort and insightful comments regarding our submission. We will correct errors and clarify all the reviewers' concerns in the final version. We will also evaluate our method on additional datasets in future research.

Reviewer#1

  • Hyperparameters selection Indeed, the selection of K and D was somewhat time-consuming. Other hyperparameters were selected based on relevant references and were not additionally tuned on the training set.
  • Tables For all tables, results are presented as mean ± standard deviation over all test samples. Inference on each test sample did not average the results of multiple models.
  • Phrasing (1) By statistics on z_q, we can estimate a VQC latent space … (2) To demonstrate this, we train the nnU-Net model based on the VQC latent space for brain tumor segmentation.
  • Table 3 ET: enhanced tumor, TC: tumor core, WT: whole tumor
  • Introduction We will add more information for T1 and T2 in the revision.
  • nnU-Net We calculate mu_q(X) in Eq. 4 from all sequences and then upsample it to the original image size as the input of nnU-Net.
  • DINOv2 We thank the reviewers for their comments. The segmentation experiments demonstrate the representation capability of VQC and are therefore only compared with other image-to-image translation models. We will further compare with other SSL methods, such as DINOv2, in future work.
  • Uncertainty Estimation We will reorganize it in the revision.
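The nnU-Net input described in the feedback above (upsampling the latent mean back to image resolution) could be sketched as follows; this is only an illustration with assumed shapes, a nearest-neighbor scheme, and an assumed downsampling factor, not the authors' pipeline:

```python
import numpy as np

def upsample_latent(mu_q, factor=4):
    """Nearest-neighbor upsample a (C, h, w) latent mean to image size,
    so it can be fed to a segmentation network that expects image-sized
    input. The factor is an assumption about the encoder's downsampling."""
    return mu_q.repeat(factor, axis=1).repeat(factor, axis=2)

mu_q = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)  # toy latent mean
x = upsample_latent(mu_q, factor=4)                        # shape (2, 12, 12)
```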

Reviewer#3

  • Use of adversarial learning The parameter settings of adversarial training vary across models, datasets, and tasks for medical images, making fully automated parameter settings difficult to achieve. We refrain from using it to ensure a fair comparison.
  • Prior distribution IntroVAE and VQGAN are mainly used to generate images from random distributions and cannot be directly used for image-to-image translation tasks. It is not clear how to compare with VQGAN on this task.
  • knowledge distillation We are very grateful to the reviewers for their constructive comments, which provide new ideas for our following research.
  • statistical test We will add statistical tests in the revision.

Reviewer#4

  • Eq. 7 M = downsample((X1>0) & (X2>0)) refers to the foreground of z1 and z2.
  • Hyperparameters selection Our hyperparameter selection was based on relevant references, and all compared methods applied the same hyperparameters.
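The mask definition quoted in the feedback above could be sketched as follows; the downsampling factor and the use of max-pooling over blocks are assumptions, not details from the paper:

```python
import numpy as np

def foreground_mask(x1, x2, factor=4):
    """Joint foreground mask M = downsample((X1>0) & (X2>0)).

    x1, x2: 2D images from two sequences; factor: assumed spatial
    downsampling of the encoder. Max-pools the boolean mask so a latent
    location counts as foreground if any covered pixel is foreground.
    """
    fg = (x1 > 0) & (x2 > 0)
    h, w = fg.shape
    fg = fg[:h - h % factor, :w - w % factor]   # crop to a multiple of factor
    return fg.reshape(h // factor, factor, w // factor, factor).max(axis=(1, 3))

x1 = np.zeros((8, 8)); x1[2:6, 2:6] = 1.0       # toy "sequence 1" foreground
x2 = np.zeros((8, 8)); x2[0:4, 0:4] = 1.0       # toy "sequence 2" foreground
m = foreground_mask(x1, x2, factor=4)           # (2, 2) boolean latent mask
```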




Meta-Review

Meta-review not available, early accepted paper.


