Abstract

Neuroimaging modalities acquired in longitudinal studies often provide complementary information regarding disease progression. For example, amyloid PET visualizes the build-up of amyloid plaques that appear in the earlier stages of Alzheimer’s disease (AD), while structural MRI depicts brain atrophy appearing in the later stages of the disease. To accurately model multi-modal longitudinal data, we propose an interpretable self-supervised model called Self-Organized Multi-Modal Longitudinal Maps (SOM2LM). SOM2LM encodes each modality as a 2D self-organizing map (SOM) so that one dimension of each modality-specific SOM corresponds to disease abnormality. The model also regularizes across modalities to capture the temporal order in which they reflect abnormality. When applied to longitudinal T1w MRIs and amyloid PET of the Alzheimer’s Disease Neuroimaging Initiative (ADNI, N=741), SOM2LM generates interpretable latent spaces that characterize disease abnormality. Compared to state-of-the-art models, it achieves higher accuracy on the downstream tasks of cross-modality prediction of amyloid status from T1w MRI and joint-modality prediction of individuals with mild cognitive impairment converting to AD using both MRI and amyloid PET. The code is available at https://github.com/ouyangjiahong/longitudinal-som-multi-modality.
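For readers unfamiliar with self-organizing maps, the core mechanism the abstract alludes to (assigning a sample to its best-matching unit on a 2D grid and pulling grid nodes toward it under a Gaussian neighborhood) can be sketched in a few lines. This is a generic Kohonen-style illustration with toy dimensions, not the authors' implementation:

```python
import math
import random

random.seed(0)

# A 4x8 grid of SOM node embeddings (the grid size discussed in the
# rebuttal), each a toy 4-dimensional latent vector.
ROWS, COLS, DIM = 4, 8, 4
nodes = [[[random.gauss(0, 1) for _ in range(DIM)]
          for _ in range(COLS)] for _ in range(ROWS)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bmu(z):
    """Grid coordinates (r, c) of the best-matching unit for latent vector z."""
    return min(((r, c) for r in range(ROWS) for c in range(COLS)),
               key=lambda rc: dist(nodes[rc[0]][rc[1]], z))

def som_update(z, lr=0.1, sigma=1.0):
    """Standard Kohonen update: pull every node toward z, weighted by a
    Gaussian neighborhood centered on the BMU."""
    r0, c0 = bmu(z)
    for r in range(ROWS):
        for c in range(COLS):
            h = math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
            nodes[r][c] = [x + lr * h * (zi - x)
                           for x, zi in zip(nodes[r][c], z)]

z = [random.gauss(0, 1) for _ in range(DIM)]
r, c = bmu(z)
before = dist(nodes[r][c], z)
som_update(z)
r, c = bmu(z)
after = dist(nodes[r][c], z)
assert after < before  # the map moved toward the sample
```

SOM2LM additionally constrains one grid dimension to track disease abnormality and aligns the grids across modalities; this generic sketch does not attempt either.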

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0405_paper.pdf

SharedIt Link: https://rdcu.be/dV1Ov

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72069-7_38

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0405_supp.pdf

Link to the Code Repository

https://github.com/ouyangjiahong/longitudinal-som-multi-modality

Link to the Dataset(s)

https://adni.loni.usc.edu/

BibTex

@InProceedings{Ouy_SOM2LM_MICCAI2024,
        author = { Ouyang, Jiahong and Zhao, Qingyu and Adeli, Ehsan and Zaharchuk, Greg and Pohl, Kilian M.},
        title = { { SOM2LM: Self-Organized Multi-Modal Longitudinal Maps } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        pages = {400 -- 410}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The work proposes generating self-organizing maps for longitudinal multimodal data in a self-supervised manner. The maps are generated separately (following the LSOR methodology) for each modality (MRI and PET), with a multimodal regularization driving an increase in disease abnormality for scans taken later in time. The performance of the method is primarily influenced by this regularization (highlighted in supplementary Table S4), which forms the main contribution of the work, differentiating it from LSOR.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The paper presents the methodology and experimental setup in a clear and concise manner.
    • The strength of the work lies in leveraging prior medical knowledge when building the methodology, highlighted by the emphasis placed on PET in the weighting of the overall loss function (Table S2).
    • The evaluation in Table 1 is fair: the feature extraction stage (manually obtained from ROI processing) and the encoder-decoder for the self-supervised setting are kept the same, which highlights the main contribution of the regularization when compared to other SSL methods.
    • A methodology for generating latent self-supervised representations of longitudinal multimodal data is presented, and relevant downstream clinical applications are showcased.
    • The difference in performance between MRI and PET in the estimated disease abnormality is highlighted in Fig. 3, allowing a clearer understanding of the task of modeling the data longitudinally.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The method is heavily dependent on a priori medical knowledge that PET imaging is more clinically relevant than MRI, and it places very high importance on the PET images compared to the MRI images. In Fig. 3, PET imaging exhibits a greater ability to cluster cases that developed into dementia early on. Experiments showing whether the multi-modality approach benefits the downstream tasks in this respect, given that amyloid PET is significantly more successful than MRI, are not present in the paper and would strongly benefit the work. Further, as the SOMs are generated for each modality separately, with the models also trained separately, the effect of generating a single SOM is not presented; doing so would further support the choice of treating the two modalities separately. A comparison with methods where the modalities are simply stacked for the downstream tasks would also strengthen the reasoning on why accurately modeling longitudinal multimodal data is beneficial.
    • ROI features are extracted manually for processing, and their respective z-scores are used for training the network. The pipeline relies on extensive generation of ROI features and is not as robust as the prior work it is based on (LSOR), where features are generated using CNNs.
    • Comparisons with other cited works [1][2][3], where longitudinal multimodal imaging has been used to detect disease progression, are missing, and other available task-specific models [4] are not compared against for the downstream tasks presented in Table 1. Such comparisons with task-specific longitudinal multimodal methods would highlight the capability of the proposed method for the specific downstream tasks. Further justification of the choice of evaluation in Table 1 would help clarify the clinical significance of the method.

    [1] El-Sappagh, S., Abuhmed, T., Islam, S.R., Kwak, K.S.: Multimodal multitask deep learning model for Alzheimer’s disease progression detection based on time series data. Neurocomputing 412, 197–215 (2020)
    [2] Lu, L., Elbeleidy, S., Baker, L.Z., Wang, H.: Learning multi-modal biomarker representations via globally aligned longitudinal enrichments. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 817–824 (2020)
    [3] Tabarestani, S., Aghili, M., Eslami, M., Cabrerizo, M., Barreto, A., Rishe, N., Curiel, R.E., Loewenstein, D., Duara, R., Adjouadi, M.: A distributed multitask multimodal approach for the prediction of Alzheimer’s disease in a longitudinal study. NeuroImage 206, 116317 (2020)
    [4] Wu, C., Guo, S., Hong, Y., Xiao, B., Wu, Y., Zhang, Q., Alzheimer’s Disease Neuroimaging Initiative: Discrimination and conversion prediction of mild cognitive impairment using convolutional neural networks. Quantitative Imaging in Medicine and Surgery 8(10), 992–1003 (2018). https://doi.org/10.21037/qims.2018.10.17

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The steps used to generate the features are extensive, making reproducibility on different datasets less straightforward.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    • The pipeline currently relies on features extracted manually from ROIs. Generating features automatically might improve the robustness of the work and make reproducing it and applying it to other datasets simpler.
    • Comparisons with methods designed specifically for the evaluated downstream tasks would greatly help highlight the clinical significance of the proposed method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    • The preprocessing involved in feature generation is quite extensive, making reproduction of the work a long process.
    • The contribution appears to be limited to the multimodal regularization loss in comparison to the prior LSOR, which the work builds heavily upon.
    • The evaluation in Table 1, while highlighting the clinical application of the method in downstream tasks, does not compare with methods designed specifically for such tasks.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The paper has an interesting contribution, but some of the concerns are still unanswered in the rebuttal.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel self-supervised method for analyzing multimodal longitudinal medical images (PET and MRI in this work). Modality-specific self-organizing maps are designed to capture representations for each modality. Cross-modality regularization is applied to incorporate prior clinical knowledge. Experiments demonstrated that the proposed method learns interpretable representations across modalities and across the longitudinal development of Alzheimer’s disease, which enhances the performance of downstream tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The proposed method represents a novel advancement in using SOM representations to model multimodal longitudinal data. This work aligns with MIC targets and can inspire future unsupervised and self-supervised methods. (2) Experiments qualitatively demonstrated the interpretable representations and quantitatively showed that the learned representations are useful by improving the performance of downstream tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The discussion and evaluation of the effects and choices of the SOM’s size and neighboring function are limited, as are the choices of the hyperparameters in Eq. (1). Readers may find it difficult to apply the proposed method with their own data.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) The authors are encouraged to investigate more general multi-modal regularization both with and without prior clinical knowledge. After all, in many real-world cases, the relationship between modalities is unknown. It would be very interesting to see if the methods can discover some new relationships across modalities. Adding an ablation study could further strengthen the paper in the future.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is an inspiring paper with novel methods and solid experimental performance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I thank the authors for their rebuttal. Some of my concerns are addressed, and I will maintain my score.



Review #3

  • Please describe the contribution of the paper

    This paper proposes SOM2LM, a representation learning strategy that models multi-modal longitudinal neuroimages to study disease development. The authors focus specifically on longitudinal amyloid PET and MRI for Alzheimer’s disease. The paper builds upon prior work [1] by incorporating multi-modality information during representation learning. Experiments conducted on the publicly available ADNI dataset show promising performance in modeling AD progression and determining amyloid status. [1] Ouyang et al. “Longitudinally-consistent self-organized representation learning.” MICCAI 2023.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The study is very well-motivated and novel. MRI and PET are two of the most commonly used imaging modalities in studying Alzheimer’s disease (AD). Incorporating multi-modal information (MRI and PET) into the existing LSOR framework makes a lot of sense. The method distinguishes itself from the existing representation learning literature by simultaneously handling longitudinal data, multiple modalities, and the disentanglement of disease abnormality. The authors cleverly constructed a learnable latent space of AD signatures by injecting important inductive biases about multi-modal imaging signs (e.g., PET shows signs of AD earlier than MRI) and longitudinal imaging signs (e.g., abnormality accumulates over time).

    2. The paper is well-written. The authors clearly presented the hypothesis of the study with rigorously defined notations and problem setup.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed method has a lot of hyperparameters, such as the hinge loss and multiple loss weighting factors. It is unclear how sensitive the predictive power is to the choice of these hyperparameters.

    2. It is unclear how the method handles site effects. Both MR contrasts and SUVR in PET can vary significantly across different imaging platforms, and this variability could cause distribution shifts in the extracted ROIs.
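As background on the margin hyperparameter flagged in weakness 1: a hinge-style cross-modality ordering penalty, which encourages the amyloid-PET abnormality score to be at least as large as the MRI one, could plausibly be sketched as follows. The function name and numeric values are illustrative, not the paper's exact formulation:

```python
def ordering_hinge(abn_pet, abn_mri, margin=0.01):
    """Penalize violations of the assumed temporal order: amyloid PET
    should show at least as much abnormality as MRI (up to a margin).
    Returns zero when abn_pet >= abn_mri - margin."""
    return max(0.0, (abn_mri - abn_pet) - margin)

# Order respected (PET abnormality ahead of MRI): no penalty.
assert ordering_hinge(0.8, 0.5) == 0.0
# Order violated (MRI ahead of PET): linear penalty beyond the margin.
assert abs(ordering_hinge(0.3, 0.5, margin=0.01) - 0.19) < 1e-9
```

The margin controls how strictly the ordering is enforced; a larger margin tolerates larger violations before any gradient is applied.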

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    === Below are general comments for future work, not for rebuttal ===

  1. The authors are encouraged to explore the impact of hyperparameters on the final results of the proposed method, specifically the margin in the hinge loss, the loss balancing parameters, etc.
  2. The authors are encouraged to apply the proposed method to other cohorts and investigate site effects in terms of both acquisition and population. For example, a natural extension would be to extend the study to other cohorts of ADSP and investigate the site effects. It would also be interesting to compare ADNI and BLSA, which focuses on normally aging people.
  3. The proposed method bears a lot of potential to be applied to diseases other than AD. For example, multi-modal longitudinal imaging is also commonly applied in lung cancer diagnosis and risk assessment [1]. Including more diseases would improve the overall impact and generalizability of the proposed research.

[1] Li et al.: “Longitudinal multimodal transformer integrating imaging and latent clinical signatures from routine EHRs for pulmonary nodule classification.” MICCAI 2023.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a strong extension to an existing framework by incorporating longitudinal multi-modal imaging information into representation learning for studying Alzheimer’s disease. The overall methodology is well-motivated, with in-depth thought about the unique aspects of medical imaging in AD. The presentation is clear, and the experiments are thorough and convincing. Overall, this is a strong submission that will spark valuable discussion at MICCAI.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my critiques by providing insights on the impact of hyper parameters. The authors are encouraged to conduct detailed experiments in their journal version to further investigate these design choices. After going through reviews of all the reviewers and authors feedback, my decision does not change. This paper has enough merits and is a strong submission to MICCAI.




Author Feedback

We thank the reviewers for the positive feedback on the clear writing and organization (R1, R3, R5), novel method (R3, R5), and convincing evaluations (R3, R5). We will update the manuscript according to the responses below.

Q1 (R1): The contribution is limited to the multi-modal regularization compared to LSOR [10].
Re: As stated in the introduction, we embed domain knowledge via two novelties: 1) a longitudinal regularization that enforces one direction of the SOM to represent disease abnormality. Beyond giving that direction a neuroscientific meaning, it allows us to align the SOMs of different modalities, which eases interpretation across modalities compared to the SOMs of LSOR (as those are generally not aligned); 2) a regularization that embeds knowledge of disease progression across modalities. These result in more accurate predictors than LSOR.

Q2 (R3 & R5): The effect and choice of hyperparameters: 1) size of the SOMs, 2) neighborhood function, 3) thresholds in the losses, 4) loss weighting factors.
Re: Here is our intuition: 1) Too small a SOM cannot capture enough information, while too large a SOM is noisy. We chose 4x8, which gave the best interpretation. For reference, each cluster then contains on average ~37 samples (i.e., 1172/(4x8)) and each column represents a 5-year age range (i.e., (95 yr - 55 yr)/8 columns). 2) The Gaussian function is the common choice for SOMs. 3) Too small an \alpha_o (i.e., the threshold in the longitudinal regularization) results in disease abnormality not being linked to any direction of the SOM, while too large a value can cause overfitting. A guideline is to set it based on the ratio between the minimum time between visits (i.e., 1 yr) and the age range of each column, leading to 0.2 (i.e., 1 yr / 5 yr). However, as the age range of a column can be less than 5 yr, we set \alpha_o to 0.1. Regarding the multi-modal regularization threshold \alpha_{m,p}, the larger its value, the more strongly the temporal ordering across modalities is enforced. We set \alpha_{m,p} = 0.01 because the difference in abnormality across modalities depends on the disease stage, and too large a value can cause overfitting. 4) The loss weighting factors rescale the loss components to similar magnitudes. Too large factors for the longitudinal and multi-modal regularizations can lead to less informative representations, while too small values risk losing the directionality of the SOM and the cross-modal temporal ordering.

Q3 (R1): Comparisons with other multi-modal longitudinal methods.
Re: Our method aims to learn interpretable representations, while the fully supervised methods cited by the reviewer aim at optimizing predictors. Though we evaluate our method on downstream tasks, our training setting is too different from theirs to allow fair comparisons.

Q4 (R1): 1) The method relies on and demonstrates (Fig. 3) that “amyloid PET is more clinically relevant than MRI”. Thus, experiments on whether multi-modality is beneficial should be included, i.e., 2) using single-modality SOMs, and 3) simply stacking modalities.
Re: 1) We do not rely on or demonstrate that amyloid PET is more clinically relevant than MRI. Our model relies on amyloid PET detecting abnormality at an earlier stage of AD than MRI. In Fig. 3, AD cases cluster at the top of the amyloid PET plot, while the orange lines in the MRI plot reveal a large change, which suggests that both modalities together are better at modeling AD progression than either one alone. 2) Due to the page limit, we did not include these results, as it is not surprising that using multiple modalities yields superior accuracy in predicting MCI converters compared to a single-modality SOM. 3) This is shown in the “No pretrain” row.

Q5 (R1): The method uses ROI features instead of images, which is less robust.
Re: We agree that this is a limitation of this work and plan to adapt the method to images in future work. We will mention this in the final paper.

Q6 (R3): Ablation study on removing the multi-modal regularization.
Re: It was presented in Table S4.
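The back-of-the-envelope guidelines in Q2 can be checked directly. The inputs below (1172 samples, a 4x8 grid, a 55-95 year age range, and a 1-year minimum visit gap) are taken from the rebuttal; the variable names are illustrative:

```python
rows, cols = 4, 8                    # SOM grid size from the rebuttal
n_samples = 1172                     # number of samples stated in Q2

clusters = rows * cols                           # number of SOM clusters
samples_per_cluster = n_samples / clusters       # average samples per cluster
years_per_column = (95 - 55) / cols              # age range covered per column
alpha_o_guideline = 1 / years_per_column         # 1-yr visit gap / 5-yr column

assert clusters == 32
assert round(samples_per_cluster, 1) == 36.6     # ~37 samples per cluster
assert years_per_column == 5.0                   # 5-year age range per column
assert alpha_o_guideline == 0.2                  # rebuttal's guideline value
```

Note that the authors then halve the \alpha_o guideline to 0.1 because individual columns can span fewer than five years.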




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposes a novel self-supervised method for analyzing multimodal longitudinal medical images. Modality-specific self-organizing maps are designed to capture representations for each modality. The reviewers recognize the technical contribution and the effectiveness of the proposed model. While Reviewer #1 still expresses some concerns, the majority of the issues have been addressed in the rebuttal. Given the study’s strong motivation and the effectiveness of the proposed method, the AC votes to accept this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Although the reviews are not all positive, I think this paper is a nice contribution to MICCAI. Given its rank in my stack, I recommend acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



