Abstract

Remarkable progress has been made by data-driven machine-learning methods in the analysis of MRI scans. However, most existing MRI analysis approaches are crafted for specific MR pulse sequences (MR contrasts) and usually require nearly isotropic acquisitions. This limits their applicability to diverse, real-world clinical data, where scans commonly vary in appearance because they are acquired with different sequence parameters, resolutions, and orientations – especially in the presence of pathology. In this paper, we propose PEPSI, the first pathology-enhanced, pulse-sequence-invariant feature representation learning model for brain MRI. PEPSI is trained entirely on synthetic images with a novel pathology encoding strategy, and enables co-training across datasets with diverse pathologies and missing modalities. Despite variations in pathology appearance across MR pulse sequences and in the quality of the acquired images (e.g., resolution, orientation, artifacts), PEPSI produces a high-resolution image of a reference contrast (MP-RAGE) that captures anatomy, along with an image specifically highlighting the pathology. Our experiments demonstrate PEPSI’s remarkable capability for image synthesis compared with state-of-the-art contrast-agnostic synthesis models, as it accurately reconstructs anatomical structures while differentiating between pathology and normal tissue. We further illustrate the efficiency and effectiveness of PEPSI features for downstream pathology segmentation on five public datasets covering white matter hyperintensities and stroke lesions.
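As a rough illustration of the interface implied by the abstract (an arbitrary-contrast input volume mapped to two outputs: an MP-RAGE-like anatomy image and a pathology-highlighting image), a minimal PyTorch sketch is given below. The architecture, layer sizes, and names are illustrative assumptions, not the authors' implementation; see the code repository linked below for the actual model.

```python
# Minimal sketch of a dual-output (anatomy + pathology) 3D network interface.
# Illustrative only: the layer sizes, names, and architecture are assumptions,
# not the PEPSI implementation (see the linked repository for that).
import torch
import torch.nn as nn


class DualHeadNet(nn.Module):
    def __init__(self, base_channels: int = 16):
        super().__init__()
        # Shared feature extractor intended to be contrast-agnostic (toy 3D conv stack).
        self.encoder = nn.Sequential(
            nn.Conv3d(1, base_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(base_channels, base_channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Head 1: MP-RAGE-like anatomy image (reference contrast).
        self.anatomy_head = nn.Conv3d(base_channels, 1, 1)
        # Head 2: pathology-highlighting image.
        self.pathology_head = nn.Conv3d(base_channels, 1, 1)

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)  # shared representation
        return self.anatomy_head(feats), self.pathology_head(feats)


if __name__ == "__main__":
    net = DualHeadNet()
    scan = torch.randn(1, 1, 32, 32, 32)  # any-contrast input volume (toy size)
    anatomy, pathology = net(scan)
    print(anatomy.shape, pathology.shape)  # both torch.Size([1, 1, 32, 32, 32])
```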

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0253_paper.pdf

SharedIt Link: https://rdcu.be/dY6k1

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_63

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0253_supp.pdf

Link to the Code Repository

https://github.com/peirong26/PEPSI

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Liu_PEPSI_MICCAI2024,
        author = { Liu, Peirong and Puonti, Oula and Sorby-Adams, Annabel and Kimberly, W. Taylor and Iglesias, Juan E.},
        title = { { PEPSI: Pathology-Enhanced Pulse-Sequence-Invariant Representations for Brain MRI } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {676 -- 686}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents PEPSI, a method for learning pathology-enhanced, contrast-invariant representations on brain MRIs. Unlike existing works, the focus of this work is to utilize pathological information (lesion/stroke hyperintensities) to generate synthetic images encoded with pathology. Some novel aspects of the work include incorporating pathological information during image synthesis through anomaly probability maps and co-training with different pathology datasets with/without direct supervision from label maps. Results on downstream MS lesion segmentation tasks on open-source datasets demonstrate the effectiveness of PEPSI-pretrained weights compared to the standard, random-initialization approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem of learning contrast-agnostic features is an interesting and potentially impactful one. There is a need for a single, universal model that generalizes well across several contrasts, as opposed to one model per contrast.

    The emphasis on encoding pathology in synthetic images is interesting, though it is a challenging problem.

    PEPSI considers both T1w and FLAIR contrasts to balance the synthesis of healthy tissue and white matter hyperintensities, as it is difficult to encode both healthy and pathological information from a single contrast. As a consequence, missing contrasts in datasets are also synthesized.

    The Dice score improvements on lesion segmentation when using PEPSI-based pretrained weights over random initialization are quite compelling.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the approach taken in this paper to learn contrast-agnostic features is quite interesting, more clarity is required in several places, along with justification for specific design choices and for the improvements claimed over the existing baselines.

    It is unclear if the approach used for encoding pathology information in the synthetic images accounts for the partial volume effects at the border of the healthy tissue and the (newly-enhanced) pathology region.

    Figure captions are not clear; it is hard to understand the figures from what is described (e.g., Figures 2 and 3).

    The authors claim that their approach (PEPSI) is robust against performance drops during cross-modality (FLAIR → T1w) synthesis compared to the baselines. However, the improvements over the baselines are not substantial, and without more information on how many models were trained, the seeds used, etc., it is difficult to judge PEPSI’s performance appropriately.

    More details are in point 10.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No more comments. Anonymized link to the code is provided and the authors claim to release it to the public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Section 2.1: What was the rationale behind using this formulation for P(x) for generating the anomaly probabilities? Is it that the pathologies are typically darker/hypointense in T1w and are hyperintense in FLAIR contrasts?

    When incorporating the anomaly probabilities into the S0 image, how are the partial volume effects accounted for? Are the intensities of the healthy tissue regions surrounding the anomaly maps modified in some way to ensure smooth transition between healthy and hyperintense tissue?

    Caption is missing for the Figure in Section 2.2.

    Table 1: It is quite surprising that “PEPSI (No-Seg)” performs better than “PEPSI (Dir-Seg)” despite the latter having direct supervision from label maps. While the ground-truth labels in the datasets might not be exhaustive, it is still surprising that supervision does not result in better-informed synthetic images. Can the authors comment on this aspect?

    In Section 3.1, Paragraph 1, the authors mention “… other variants suffer from larger performance drops for FLAIR-to-T1w synthesis”. I don’t see how that is the case; here is an example comparison with SynthSR on the ADNI3 dataset:

    Method             Metric   T1w → T1w   FLAIR → T1w   Drop
    SynthSR            L1       0.023       0.027         0.004
    SynthSR            PSNR     23.51       23.25         0.26
    PEPSI (proposed)   L1       0.020       0.023         0.003
    PEPSI (proposed)   PSNR     26.67       25.62         1.05

    While PEPSI’s drop is smaller only in the L1 metric (by 0.001), the magnitude of its performance drop in PSNR and SSIM is larger than SynthSR’s. Moreover, the drops/improvements are in the 3rd or 4th decimal place for a few metrics, which might amount to rounding error when multiple models/seeds are trained.

    The caption for Figure 3 is incomplete, and the figure is hard to understand without more information on what the individual rows and the (several) arrows on the images mean.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Concerns over some quantitative results, lack of clarity in the figures. More clarifications needed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Review concerns were appropriately addressed.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a new method, called PEPSI, that aims to learn MR sequence-invariant representations for brain MRI with a special focus on pathologies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The topic of the paper is very important (i.e., learning invariant representations with a focus on pathologies), and the authors support their proposal with extensive experiments using 3 public datasets (ADNI3, ATLAS, ISLES).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    To understand the paper, we need to read reference [21] with care. It is unclear how much the quality of the results depends on the pre-trained segmentation models that produce the masks for the loss term in Equation (4).
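    For context, one plausible form of such an implicit consistency term is sketched below (an assumption-laden illustration, not the exact loss of Equation (4)): a frozen, pre-trained segmenter is applied to both the synthesized and the reference image, and the two soft predictions are encouraged to agree.

```python
# Rough illustration of an implicit segmentation-consistency term, one plausible
# reading of the Eq. (4) idea (not the authors' exact loss). A frozen, pre-trained
# segmenter is applied to both the synthesized and the reference image, and the
# two soft predictions are encouraged to agree via a soft Dice term.
import torch


def soft_dice_loss(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """1 - soft Dice agreement between two probability maps in [0, 1]."""
    inter = (p * q).sum()
    return 1.0 - (2.0 * inter + eps) / (p.sum() + q.sum() + eps)


def consistency_loss(frozen_seg, synthesized, reference):
    with torch.no_grad():
        target = torch.sigmoid(frozen_seg(reference))   # masks from the pre-trained segmenter
    pred = torch.sigmoid(frozen_seg(synthesized))        # gradients flow back to the synthesizer
    return soft_dice_loss(pred, target)


if __name__ == "__main__":
    seg = torch.nn.Conv3d(1, 1, 3, padding=1)            # stand-in for a pre-trained segmenter
    for p in seg.parameters():
        p.requires_grad_(False)                           # keep the segmenter frozen
    synth = torch.randn(1, 1, 16, 16, 16, requires_grad=True)  # stand-in synthesized image
    ref = torch.randn(1, 1, 16, 16, 16)                         # stand-in reference image
    loss = consistency_loss(seg, synth, ref)
    loss.backward()                                       # gradients reach the synthesized image
    print(float(loss))
```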

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    More explanations/documentation in the anonymized repository would be useful.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is relevant and proposes an interesting new methodology. However, the writing could be clearer. I struggled to understand what was being used as input and ground truth for training the model.

    After equation 3, there is a figure with no caption.

    Fig 3 is very hard to see (6x9 mosaic of brain images), and its caption could be improved.

    A brief discussion of training times would be useful as well.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an interesting new method that shows improvement for downstream segmentation tasks. However, the methodology description could be clearer and issues, such as a figure with no caption, should be fixed.

    I would like to see the authors’ rebuttal before increasing my score to accept.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I believe this work is very interesting, and with the authors’ answer stating that they will fix the main issues from my previous review, such as the missing figure caption and the need for clearer explanations, I have increased my score to accept.

    I believe this paper will be of interest to the MICCAI community.



Review #3

  • Please describe the contribution of the paper

    This paper proposes an MRI standardization framework, PEPSI, based on image synthesis to handle MR data variability in both pathology and acquisition (contrast, resolution, and image quality). The proposed method incorporates an existing data augmentation pipeline to generate synthetic images of the same subject with varying appearances. A dual-learning model is then applied to undo the data-augmentation process by restoring the original high-quality image, with additional consistency supervision on pathology. The authors evaluated their method in image synthesis tasks and downstream segmentation tasks on multiple public datasets. Results are convincing and demonstrate promising performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper overall holds strong promise for clinical application. The authors have considered several key aspects to make the method more clinically applicable.
      • The method is implemented in 3D.
      • The authors have considered the heterogeneous and scarce nature of manual delineations and proposed implicit pathology supervision to circumvent this issue.
      • The authors considered various factors such as resolution and image quality to improve the robustness of the method in clinical settings, where data are highly variable.
    2. The idea of generating data from labels is well-motivated, especially after incorporating pathology cases. By doing so, the authors decoupled multiple factors of variability in real-world datasets, so the proposed data synthesis model can be trained on fairer and more balanced datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed method requires generating synthetic images from labels, which were obtained from automatic segmentation algorithms on high-quality datasets. The impact of segmentation inaccuracies and systematic biases in these reference datasets (e.g., healthy vs. different health conditions) on the final results of the proposed PEPSI method remains unclear.

    2. The data augmentation process clearly has an impact on the overall performance of PEPSI. The authors did not mention the limitations or “boundary” of their method. The proposed method augments training batches by introducing pre-selected augmentations, such as degrading resolution, adding noise, etc. These augmentations are unlikely to cover the full spectrum of real-world cases. It is unclear how the proposed method generalizes to unseen cases of data variability.

    3. The assumption of Gaussian image intensity with a fixed mean and standard deviation within the pathology mask oversimplifies the problem. Brain pathology is highly heterogeneous in shape and image intensity. For example, various diseases (cardiovascular disease and multiple sclerosis) can cause white matter hyperintensities, but they have distinctive features. This oversimplified Gaussian assumption is a limitation, and it may hinder extension of the proposed method to fine-grained synthesis of more complicated pathological cases.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper has great reproducibility by publishing code (anonymized), experimenting with public datasets, and providing key implementation details.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors are strongly encouraged to perform statistical tests on the results in Table 1 to demonstrate whether the performance gains of PEPSI over the comparison methods are statistically significant (a minimal sketch of such a test follows this list).

    2. Some notations in the paper are confusing. More effort could be made to improve the clarity of the presentation.

      • It is confusing to use $I^{anat}$ and $I^{pathol}$ to denote T1w and T2w-FLAIR images. While these two modalities mainly capture general structural and pathological information, respectively, they should not be labeled in such a way.
      • Some notations are not defined in the paper, for example, $\tilde{S}$.

    === Below are general feedback for future work, not for rebuttal (if applicable) ===

    1. While implicit pathology supervision provides a way to encourage consistency of segmentation between synthetic and reference images, it seems that training PEPSI also requires a pathology segmentation algorithm. In their future work, the authors may consider exploring:
      • The impact of different choices of pathology segmentation algorithms on the final PEPSI performance.
      • If it makes sense to incorporate PEPSI during segmentation training.
    2. The authors may consider extending their method to more variable pathology cases, such as brain tumors, as well as fine-grained structures within the pathology region.

    3. The paper currently only compares to methods focusing on training with synthetic data. The authors are encouraged to compare their method with a broader spectrum of methods that tackle the data variability issue in MRI:
      • Domain generalization methods [1-3]. While PEPSI shows improved Dice coefficients in Table 2, they are still relatively low compared to other lesion segmentation algorithms trained with domain generalization.
      • MRI standardization/harmonization methods [4-6].

    [1] Zhang et al. Domain generalization for robust MS lesion segmentation. SPIE-MI 2023.
    [2] Zhang et al. Harmonization-enriched domain adaptation with light fine-tuning for multiple sclerosis lesion segmentation. SPIE-MI 2024.
    [3] Hu et al. Mixture of calibrated networks for domain generalization in brain tumor segmentation. Knowledge-Based Systems 2023.
    [4] Zuo et al. HACA3: A unified approach for multi-site MR image harmonization. CMIG 2023.
    [5] Moyer et al. Scanner invariant representations for diffusion MRI harmonization. MRI 2020.
    [6] Jeong et al. BlindHarmony: “Blind” harmonization for MR images via flow model. ICCV 2023.
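    A minimal sketch of the paired test suggested in point 1 above, assuming per-case Dice scores for two methods are available (the score arrays below are hypothetical placeholders, not reported results):

```python
# Minimal sketch of a paired Wilcoxon signed-rank test on per-case Dice scores
# of two methods. The arrays are hypothetical placeholders, not reported results.
import numpy as np
from scipy.stats import wilcoxon

dice_pepsi    = np.array([0.71, 0.68, 0.74, 0.69, 0.72])  # hypothetical per-case Dice
dice_baseline = np.array([0.66, 0.67, 0.70, 0.65, 0.69])  # hypothetical per-case Dice

stat, p_value = wilcoxon(dice_pepsi, dice_baseline)
print(f"Wilcoxon statistic = {stat:.3f}, p = {p_value:.4f}")
```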

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes an interesting contribution to mitigate the variability in MR imaging. This work presents a well-motivated extension of existing work [1], with a special focus on brain pathology. The method is overall novel and intriguing. The authors have made considerable efforts to improve clinical applicability. While the paper faces some inherent limitations that may impact its future extension, it bears enough merit to intrigue valuable discussion at MICCAI.

    [1] Liu et al. Brain-ID: Learning robust feature representations for brain imaging. 2023.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my critiques about notations, clarified their Gaussian assumptions on intensity modeling, and softened their claims about PEPSI’s performance. However, some issues remain, such as the limitation/“boundary” of the method and the statistical tests. Therefore, I stick to my original decision, and agree that the paper has enough merit to be published at MICCAI.




Author Feedback

We appreciate all reviewers’ valuable comments and suggestions. All reviewers (R1/3/4) agree that (1) “the problem of contrast-agnostic learning with pathologies is interesting, important, and impactful”; (2) “the idea of data generation from labels and pathology encoding is novel, interesting and well-motivated”. R1/3 appreciate that “extensive experiments support PEPSI’s superiority” on various public datasets and pathologies. R4 notes that PEPSI has “great reproducibility” and “strong promise for clinical application”, and can “intrigue valuable discussion at MICCAI”.

First, we would like to address two comments from R1:

  • Modeling of partial volume effects: the “corruption pipeline [15]” referenced in the last paragraph of Sec. 2.1 includes a model of partial voluming from [2]. We will clarify this in the camera-ready version. Partial voluming is further simulated by the continuous mixing of pathology and healthy tissue proposed in our work.

  • Results: ⚬ “No-Seg” outperforms “Dir-Seg”?: While the extra segmentation loss in Dir-Seg enables better representations overall, it penalizes the metrics in Table 1, which only evaluate image synthesis (this is consistent with Table 3 in [21]). ⚬ General performance of PEPSI: We agree that although PEPSI consistently outperforms competing methods in Table 1, it does not achieve large improvements in every setup. We will soften our claims in Sec. 3.1 in the camera-ready version.

We would also like to clarify other points made by the reviewers:

  • Rationale for pathology probability generation (R1): We use a heuristic that was shown in [18] to improve performance in modeling white matter lesions, where pathology is typically darker in T1w and brighter in T2w/FLAIR. In the camera-ready version, we will rephrase the description above Eq. (1) (and cite [18]) to clarify this.

  • Segmentation network in Eq. (4) (R3/4): We want the segmentation model to work well only if the inputs (both healthy and pathological tissue) are realistic and of good quality; if the segmentation network provides good labels for any image, it would be uninformative for the synthesis. Thus, we train a segmentation network using data with minimal corruption. We will add a footnote under Eq. (4) to clarify this.

  • Appearance model of pathology class (R4): The intensities of the voxels of this class are modeled not only by a Gaussian (which has been proven successful in prior work [18]) but also by a voxel-wise probability map (Eq. (1)) that modulates the mix of pathological and healthy tissue (a toy illustration follows this list). This allows us to simulate a wider range of lesion appearances, including the transition from healthy tissue to lesions, while retaining full control of the generative model, which is crucial for domain randomization.

  • Model generalizability (R4): While our model takes advantage of domain randomization techniques that yield state-of-the-art generalizability [2,15,18], we acknowledge that it may falter on images with lesion patterns that are very different from those seen in training. We are currently working on a lesion generator that helps models generalize better. We will add a brief discussion of future work in the camera-ready version.
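As a toy illustration of the appearance model described in the clarification above (Gaussian lesion intensities blended into healthy tissue by a voxel-wise probability map, which also yields smooth, partial-volume-like lesion borders), a NumPy sketch is given below. All numbers are illustrative; the paper's Eq. (1) and generator parameters are not reproduced here.

```python
# Toy sketch of the pathology appearance model described in the rebuttal: lesion
# intensities are sampled from a Gaussian and blended into healthy tissue via a
# voxel-wise probability map p(x), which also produces a smooth (partial-volume-like)
# transition at lesion borders. All values are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
shape = (32, 32, 32)

healthy = rng.normal(loc=1.0, scale=0.05, size=shape)   # synthetic healthy-tissue intensities

# Voxel-wise pathology probability map p(x): a smooth spherical blob, ~1 at the core, 0 outside.
zz, yy, xx = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
dist = np.sqrt((zz - 16) ** 2 + (yy - 16) ** 2 + (xx - 16) ** 2)
p = np.clip(1.0 - dist / 8.0, 0.0, 1.0)

# Gaussian intensity model for the pathology class (hyperintense relative to healthy tissue).
lesion = rng.normal(loc=1.6, scale=0.10, size=shape)

# Continuous mixing of pathological and healthy intensities, modulated by p(x).
synthetic = (1.0 - p) * healthy + p * lesion

print(synthetic[16, 16, 16], healthy[16, 16, 16])  # lesion core vs. original healthy value
```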

Finally, there were some comments for clarity:

  • Figure caption on Page 5 (R1/3): We apologize for this. We will add the caption: “Left: an axial slice of a FLAIR scan from ISLES dataset, with WMH marked in red. Right: corresponding gold-standard segmentation of abnormalities, which only includes stroke lesions (no WMH).”

  • Notations (R4): We will change $I^{anat}$ & $I^{pathol}$ to $I^{T1}$ & $I^{T2/FLAIR}$; we will add a definition for $\tilde{S}$: “the predicted pathology”.

  • Fig. 3 caption (R3): We will rephrase to: “Qualitative comparisons on T1w and FLAIR synthesis (↔ highlights pathologies). Rows (columns) refer to datasets (compared methods).”

  • Training time (R3): We will add a short sentence to Implementation Details: “Training took ~5 days on an NVIDIA RTX8000 GPU.”




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    After the rebuttal phase all reviewers have modified their scores to Accept or Weak Accept which demonstrates the authors have successfully addressed the concerns of the reviewers.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    After the rebuttal phase all reviewers have modified their scores to Accept or Weak Accept which demonstrates the authors have successfully addressed the concerns of the reviewers.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have sufficiently addressed the concerns of the reviewers, who agreed on the work’s value to the MICCAI community.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors have sufficiently addressed the concerns of the reviewers, who agreed on the work’s value to the MICCAI community.


