Abstract

In neuroimaging, brain CT is generally a more cost-effective and accessible imaging option than MRI. Nevertheless, CT exhibits inferior soft-tissue contrast and higher noise levels, yielding less precise structural clarity. In response, leveraging the more readily available CT to construct its MRI counterpart, namely medical image-to-image translation (I2I), is a promising solution. In particular, while diffusion models (DMs) have recently risen as a powerhouse, they come with a few practical caveats for medical I2I. First, DMs’ inherent stochasticity from random noise sampling cannot guarantee consistent MRI generation that faithfully reflects its CT. Second, for 3D volumetric images, which are prevalent in medical imaging, naively using 2D DMs leads to slice inconsistency, e.g., abnormal structural and brightness changes. While 3D DMs do exist, their significant training costs and data dependency hinder adoption. As a solution, we propose novel style key conditioning (SKC) and inter-slice trajectory alignment (ISTA) sampling for the 2D Brownian bridge diffusion model. Specifically, SKC ensures a consistent imaging style (e.g., contrast) across slices, and ISTA interconnects the independent sampling of each slice, deterministically achieving style- and shape-consistent 3D CT-to-MRI translation. To the best of our knowledge, this study is the first to achieve high-quality 3D medical I2I based only on a 2D DM with no extra architectural models. Our experimental results show 3D medical I2I superior to existing 2D and 3D baselines on an in-house CT-MRI dataset and the BraTS2023 FLAIR-T1 MRI dataset.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0531_paper.pdf

SharedIt Link: https://rdcu.be/dV5Fr

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72104-5_63

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0531_supp.pdf

Link to the Code Repository

https://github.com/MICV-yonsei/CT2MRI

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Cho_SliceConsistent_MICCAI2024,
        author = { Choo, Kyobin and Jun, Youngjun and Yun, Mijin and Hwang, Seong Jae},
        title = { { Slice-Consistent 3D Volumetric Brain CT-to-MRI Translation with 2D Brownian Bridge Diffusion Model } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        pages = {657--667}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes two mechanisms for CT-to-MRI translation using a 2D BBDM, with the goal of guaranteeing 3D consistency in histogram and shape. The first, SKC, ensures histogram consistency; the second, ISTA, ensures shape consistency between slices. The experimental evaluation shows advantages over the compared methods by small margins. An ablation study is also included.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well organized and the presentation is clear

    2. Using BBDM for image-to-image translation with guarantee of shape and histogram consistency is interesting

    3. The author claimed to release the code after the acceptance

    4. An ablation was provided to show the effectiveness of SKC and ISTA

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The main issue I have for this paper is the evaluation part. The authors’ claim is about the 3D consistency in terms of the histogram and shape, however, from the metrics used and the results presented, it is difficult for me to see the better histogram and shape consistency using these metrics, except visually checking them. One suggestion is to use a segmentation task as a proxy task to evaluate the shape consistency or the structure correspondence, as shown in this paper [1].

    2. Also, the performance gain seems limited compared with the other 3D method (RevGAN). Are these results statistically significant? In paper [2], the performance is better (Table 3, FLAIR->T1); can you clarify?

    3. Can you provide more details on how the 2D methods were evaluated? Do you translate each 2D slice, group the slices into a 3D volume, and then run the evaluation? Can you clarify?

    [1] Wang, Weilun, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, and Houqiang Li. “Semantic image synthesis via diffusion models.” arXiv preprint arXiv:2207.00050 (2022).

    [2] Özbey, M., S. U. Dar, H. A. Bedel, O. Dalmaz, Ş. Özturk, A. Güngör, and T. Çukur. “Unsupervised medical image translation with adversarial diffusion models.” arXiv preprint arXiv:2207.08208 (2022).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Why were 100 DDIM sampling steps used? How do the results compare when using 1000 steps? What is the tradeoff between inference time and sample quality for the proposed method and the compared methods?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the weakness section

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a good paper with organized presentation. The performance improvement is relatively limited compared with the 3D method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes two novel methods, Style Key Conditioning (SKC) and Inter-Slice Trajectory Alignment (ISTA), to address the slice inconsistency issue in general 2D models, which is a very important problem for 3D volumetric image processing. Experiments on two datasets, an in-house CT-MRI dataset and the BraTS2023 FLAIR-T1 MRI dataset, together with ablation studies, demonstrate the proposed method’s effectiveness and robustness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
      • This paper is clearly written and easy to follow. The proposed framework is technically sound.
      • The slice inconsistency issue caused by 2D models is a very interesting and common topic, and it is also challenging and notoriously difficult to address.
      • Extensive ablation study demonstrating the effectiveness of the proposed contributions
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It would be more convincing if more pathology cases were demonstrated and compared between different methods.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be great if the CT-MRI dataset can be shared to medical image-to-image translation community.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The final translation result of the proposed method seems slightly over-smoothed compared to the target image in both Fig. 4 and Fig. 1 in the supplementary materials, especially in the cerebellum region.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is well written and easy to read. The contributions are well verified through extensive experiments and the proposed method outperforms other SOTA methods.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes an approach to synthesize 3D MRI volumes from CT source volumes using a 2D Brownian bridge diffusion model. The model was trained on 2D slices but can be effectively applied to volumes, ensuring inter-slice consistency through a sampling scheme that averages over the slice-adjacent predictions and employs a subsequent correction step, similar to PC sampling. Style consistency is enforced using a conditioning mechanism based on the target volume histogram.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The effectiveness of the method for CT-MRI translation is shown qualitatively and quantitatively. Moreover, the contribution of the individual blocks (SKC and ISTA) is shown to be beneficial, which is also revealed by the ablation. The introduced ISTA can in general be very important when memory limitations do not allow to work in the 3D volume domain.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The style key during sampling is set to the averaged histograms in the training set. A comparison with different strategies would be very interesting, as averaging over many training histograms might smooth out relevant features of the histogram. Other approaches such as choosing the N nearest neighbours of the source CT training images and averaging over their MRI histograms could be explored.

    The obtained translated MRI images, e.g. in Figure 1, appear smoothed and high-frequency information gets lost when compared to the ground truth. Could that be due to the ISTA sampling approach that includes an averaging operation? If so, are there strategies to avoid this?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Some clarifications would be very helpful, e.g. discussing more the smoothing effect of the proposed method as well as explaining more on some choices such as why 3 slices were chosen for ISTA.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In general, the paper is well written and the results support the proposed approach. Some minor clarifications would also be very helpful.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

R1: More pathology cases? Although not explicitly shown, our superior results are based on the datasets already containing pathological cases: CT-MRI (45% AD patients) and BraTS (tumor cases).

R1/R3: Resulting MRI shows smoothing. Is this due to ISTA? We agree that the synthetic MRI may appear smoother compared to real MRI. However, on closer look, this level of smoothing is present in any diffusion-based MRI synthesis study we know of. Further, despite our strategic choice to improve structural reliability at the cost of sharpness (aka, perception-distortion tradeoff), we still show the best structural clarity with anatomical fidelity (Fig. 4). Also, we confirmed there is no smoothing effect from ISTA (Supp. D).

R3: Instead of averaged MRI histogram, how about MRI histogram of CT’s nearest neighbors for the style key? We agree that a detailed study on selecting the style key would be beneficial. As in Sec. 2.2, the histogram is largely independent of anatomy, and since CT and MRI are acquired separately, similar CTs do not necessarily have similar MRI histograms. Instead, using a single MRI histogram as the style key produced satisfactory results reflecting that style (Table 2 for Ours_best & Supp. E).
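The two style-key options discussed here can be contrasted with a minimal sketch. It is illustrative only and is not the authors' implementation: the function names, the bin count, and the use of Euclidean distance between CT histograms as the "nearest neighbour" criterion are all assumptions made for this example.

```python
from math import dist

def normalized_histogram(values, bins=4, lo=0.0, hi=1.0):
    """Histogram of intensities assumed to lie in [lo, hi], normalized to sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put v == hi into the last bin
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def mean_histogram(histograms):
    """Element-wise average of equally sized histograms (the 'averaged' style key)."""
    n = len(histograms)
    return [sum(col) / n for col in zip(*histograms)]

def knn_style_key(query_ct_hist, train_ct_hists, train_mri_hists, k=2):
    """Hypothetical alternative: average the MRI histograms of the k training
    cases whose CT histograms are closest to the query CT histogram."""
    order = sorted(range(len(train_ct_hists)),
                   key=lambda i: dist(query_ct_hist, train_ct_hists[i]))
    return mean_histogram([train_mri_hists[i] for i in order[:k]])

# Toy, made-up histograms (two bins each) purely for demonstration.
train_ct = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
train_mri = [[0.2, 0.8], [0.6, 0.4], [0.4, 0.6]]
averaged_key = mean_histogram(train_mri)
neighbour_key = knn_style_key([1.0, 0.0], train_ct, train_mri, k=2)
```

The neighbour-based key only helps if similar CTs tend to have similar MRI histograms, which is exactly the assumption the rebuttal argues does not hold when CT and MRI are acquired separately.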

R3: Why choose 3 slices for ISTA? As in Sec. 2.3, we chose 3 slices to maintain the same computational load as a pure 2D BBDM, which is the minimum number for overlapping inference. Increasing the number of slices would aid consistency between distant slices, but local shape consistency is sufficient, and SKC already ensures global style consistency. Therefore, increasing the input slices did not significantly change performance. Moreover, as the ISTA sampling step is repeated, the guided consistency between adjacent slices spreads to more distant slices, making 3 slices reasonable.
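The idea of overlapping 3-slice inference can be illustrated with a small sketch. This shows only the generic overlap-and-average pattern described in the reviews, not ISTA itself (the correction step is omitted), and `predict_window` is a hypothetical stand-in for a model call; scalars stand in for 2D slices to keep the example short.

```python
def window_indices(num_slices, center, width=3):
    """Indices of the window centred on `center`, clipped at the volume edges."""
    half = width // 2
    return [j for j in range(center - half, center + half + 1)
            if 0 <= j < num_slices]

def average_overlapping_predictions(volume, predict_window, width=3):
    """Run `predict_window` on every window and average, per slice, all
    predictions from the windows that cover that slice.

    `predict_window` (hypothetical) takes the slices of one window and
    returns one prediction per slice in that window.
    """
    n = len(volume)
    sums = [0.0] * n
    counts = [0] * n
    for center in range(n):
        idx = window_indices(n, center, width)
        preds = predict_window([volume[j] for j in idx])
        for j, p in zip(idx, preds):
            sums[j] += p
            counts[j] += 1
    return [s / c for s, c in zip(sums, counts)]

def toy_predictor(window):
    """Made-up predictor whose output depends on its neighbours:
    every slice in the window is predicted as the window mean."""
    m = sum(window) / len(window)
    return [m] * len(window)

one_pass = average_overlapping_predictions([0.0, 3.0, 0.0, 0.0], toy_predictor)
two_pass = average_overlapping_predictions(one_pass, toy_predictor)
```

In this toy setting, each pass lets a slice be influenced only by its immediate neighbours, and repeating the pass extends that influence one slice further each time, which mirrors the argument that repeated sampling steps spread consistency to more distant slices.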

R4: A direct metric for slice consistency is needed (e.g., [1]). The suggestion to apply [1]’s evaluation method indicates there may be a misunderstanding of our main contribution. R4 seems to refer to consistency between synthetic and target volumes. Of course, BBDM+SKC also ensures that consistency, but ISTA and SKC focus on shape and style consistency between slices in synthetic volume from 2D slice-wise inference. Thus, the synthetic volume mimicking the style key histogram (which R4 seems to think is SKC’s main purpose) is just a byproduct of SKC. Further, the suggested metric [1] (direct mask comparison) can measure the “identity” between slice i of volume A and slice i of volume B, but this is not suitable for measuring structural “continuity” between slice i and i+1 of volume A. Instead, our metrics naturally measure the volume-level errors caused by slice inconsistency within the volume. Also, our method clearly shows dramatic improvements in slice consistency (Fig. 1 & Supp. D), demonstrating its effectiveness.

R4: For 2D methods, did you evaluate after stacking each slice into a volume? Yes, since our focus is on volume level, we naturally evaluated in this manner.

R4: The metric shows a slight increase over the 3D method. Our 2D method outperformed all 2D and 3D baselines in both visual quality and metrics, showing statistically significant improvements across all metrics on both datasets (paired t-test, P<0.05).
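For readers unfamiliar with the test named here, a paired t-test compares per-subject metric differences between two methods evaluated on the same test volumes. The sketch below computes only the t statistic and degrees of freedom using the standard library; the scores are made-up toy numbers, not results from the paper, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the per-subject differences
    and sd is the sample standard deviation.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up per-volume PSNR values for two methods on the same five volumes.
method_a = [30.0, 31.0, 29.0, 32.0, 30.0]
method_b = [29.0, 29.0, 28.0, 29.0, 29.0]
t_value, dof = paired_t_statistic(method_a, method_b)  # t is about 4.0 with 4 degrees of freedom
```

With 4 degrees of freedom, the two-sided critical value at the 0.05 level is about 2.776, so this toy difference would count as significant. A statistics package such as SciPy (`scipy.stats.ttest_rel`) would return the p-value directly.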

R4: FLAIR-to-T1 performance of [2] is better. This is not true. Metrics improve as the proportion of image background (value 0) increases. Due to 3D baselines’ memory footprint, we cropped the background more tightly to 176x176, which is disadvantageous for metrics. When we re-measured Ours_avg’s results only with zero-padding to 256x256 (as in [2]), our performance was clearly better. This can be easily reproduced (code will be released).
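The effect of background proportion on metrics can be checked with a short calculation for PSNR. The sketch assumes the padded background is exactly 0 in both the prediction and the target, so padding adds error-free pixels and lowers the mean squared error; other metrics such as SSIM are not covered. Function names and toy pixel values are made up for illustration.

```python
from math import log10

def psnr(pred, target, max_val=1.0):
    """PSNR in dB over flattened images; assumes the images differ somewhere."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return 10 * log10(max_val ** 2 / mse)

def zero_pad(flat, extra):
    """Append `extra` background pixels (value 0) to a flattened image."""
    return flat + [0.0] * extra

def padding_gain_db(cropped_side, padded_side):
    """PSNR gain from zero-padding a square image when the prediction and the
    target share the same zero background."""
    return 10 * log10(padded_side ** 2 / cropped_side ** 2)

# Toy 2x2 image padded to 4x4 (12 extra background pixels).
pred = [0.5, 0.4, 0.3, 0.2]
target = [0.6, 0.4, 0.2, 0.2]
gain_toy = psnr(zero_pad(pred, 12), zero_pad(target, 12)) - psnr(pred, target)

# Same reasoning applied to the sizes mentioned in the rebuttal.
gain_176_to_256 = padding_gain_db(176, 256)  # roughly 3.25 dB
```

Under these assumptions, padding from 176x176 to 256x256 alone raises PSNR by roughly 3.25 dB without any change in the translated anatomy, which is why metrics measured at different crop sizes are not directly comparable.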

R4: Why 100 DDIM steps? Tradeoff between sampling time and quality? All diffusion baselines used the default time steps in their papers. For our method, 100 steps showed a marginal difference compared to 1000. Since 100 steps already outperformed baselines, we prioritized efficiency.




Meta-Review

Meta-review not available, early accepted paper.


