Abstract

Medical records often consist of different modalities, such as images, text, and tabular information. Integrating all modalities offers a holistic view of a patient’s condition, while analyzing them longitudinally provides a better understanding of disease progression. However, real-world longitudinal medical records present challenges: 1) patients may lack some or all of the data for a specific timepoint, and 2) certain modalities or views might be absent for all patients during a particular period. In this work, we introduce a unified model for longitudinal multi-modal multi-view prediction with missingness. Our method allows as many timepoints as desired for input, and aims to leverage all available data, regardless of their availability. We conduct extensive experiments on the knee osteoarthritis dataset from the Osteoarthritis Initiative (OAI) for pain and Kellgren-Lawrence grade (KLG) prediction at a future timepoint. We demonstrate the effectiveness of our method by comparing results from our unified model to specific models that use the same modality and view combinations during training and evaluation. We also show the benefit of having extended temporal data and provide post-hoc analysis for a deeper understanding of each modality/view’s importance for different tasks.
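The two kinds of missingness described in the abstract can be made concrete with a small sketch. This is an illustration only, not the authors' code: the record layout, the view names (`knee_xray`, `pelvis_xray`, `tabular`), and the helper `availability_mask` are hypothetical.

```python
def availability_mask(record, timepoints, views):
    """Return a T x n table of 0/1 flags: 1 if record[t][v] is present.

    `record` maps a timepoint to a dict of view name -> data (hypothetical layout).
    A missing timepoint, a missing view, or a view stored as None all count as absent.
    """
    return [
        [1 if record.get(t, {}).get(v) is not None else 0 for v in views]
        for t in timepoints
    ]


# Hypothetical patient: no visit at month 12, pelvis image never acquired.
timepoints = [0, 12, 24]
views = ["knee_xray", "pelvis_xray", "tabular"]
record = {
    0: {"knee_xray": "img_m0", "tabular": [61, 27.4]},
    24: {"knee_xray": "img_m24", "pelvis_xray": None},
}
mask = availability_mask(record, timepoints, views)
# mask == [[1, 0, 1], [0, 0, 0], [1, 0, 0]]
```

A row of zeros corresponds to a patient lacking all data at a timepoint; a column of zeros corresponds to a view that is absent over the whole period.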

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2390_paper.pdf

SharedIt Link: https://rdcu.be/dY6f8

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_39

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2390_supp.pdf

Link to the Code Repository

https://github.com/uncbiag/UniLMMV

Link to the Dataset(s)

https://nda.nih.gov/oai

BibTex

@InProceedings{Che_AUnified_MICCAI2024,
        author = {Chen, Boqi and Oliva, Junier and Niethammer, Marc},
        title = {{A Unified Model for Longitudinal Multi-Modal Multi-View Prediction with Missingness}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {410--420}
}

Reviews

Review #1

Please describe the contribution of the paper

This work proposes a multi-modal (image and text) predictive model that utilizes temporal data. The model can handle missing time points and modalities. A missing modality is handled through a learnable token within the attention block. A decoder block processes the summary of each time step to generate predictions.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- It addresses a common problem in medical datasets: missing time points or modalities.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The pipeline consists of multiple encoder backbones and a time decoder that can accommodate very long time differences. There should be a cost/parameter analysis.
- In general, there is a lack of ablation studies in this work. The paper introduces multiple components, but there is no investigation of each component's effect.
- Transformer-based approaches such as MultiMAE: Multi-modal Multi-task Masked Autoencoders (ECCV’22) can handle missing modalities without relying on multiple encoders. The encoder is shared, unlike in the proposed method.
- It is claimed that [PAD] is better than using fixed values. Although [PAD] is learnable, it is independent of the input and fixed. There should be an ablation study showing the gain from using it.
- There should be comparisons between different aggregation options, such as using an MLP, concatenation, average pooling, and so on.
- In mask indicator, what is the dimension n?
- In Figure 2, the results are very close, making a bar plot a very poor choice. It is almost impossible to compare different models.
- MICCAI focuses on medical imaging, so it cannot be assumed that the SAINT model used for tabular data is commonly known, unlike ResNet. There needs to be at least a short description.
- There is no comparison against other existing models.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
- Did the authors consider Graph Neural Networks to handle missing data? They can inherently handle missing modalities (arXiv:1905.03053).
- As a main suggestion, the contribution of each individual component needs to be investigated so that readers can understand the methodological novelty.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Reject — should be rejected, independent of rebuttal (2)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The lack of ablation studies and of comparisons against existing methods is the main concern.
- Using multiple encoders raises the question of computational cost.
- Bar plots are not a good choice when reporting single values to compare models.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Reject — should be rejected, independent of rebuttal (2)
[Post rebuttal] Please justify your decision
I thank the authors for their effort in the rebuttal. I will discuss the authors' answers and explain the score I gave post-rebuttal.
- MultiMAE uses a single ViT-based encoder, even though the authors claimed it is multi-encoder. I suggest they read the paper carefully. The authors claimed that they infer from multiple images, which makes a single encoder inappropriate. I strongly disagree: MultiMAE showed that, with an appropriate reconstruction scheme, multiple data types (i.e., modalities) can be fed to a single encoder.
- By ablation I meant the introduced modules of the model, not only the data modalities. This part is crucial and is completely missing, for example an ablation for [PAD].
- Regarding Figure 2, my objection concerned the wrong choice of visualization; the authors failed to answer it.
- Regarding the SAINT model, I will reiterate my point. It is a specific text model, which needs to be described at least briefly in a submission to a medical imaging conference. It is not appropriate to write “go and check the reference”; a paper needs to be self-contained.
- Additionally, the authors ignored my comments on the lack of comparisons against other models. I know that the authors cannot provide additional experiments in the rebuttal, but the issue still persists; at the least, they could have explained why there are no comparisons. In light of the rebuttal answers, the authors failed to address my major concerns (the lack of architectural ablation, such as for the introduction of [PAD], and the absence of comparisons against existing models). Thus I keep my score, and I cannot recommend the paper for acceptance at MICCAI.

Review #2

Please describe the contribution of the paper

The authors proposed a masking-based learning method for leveraging longitudinal multi-view and multi-modal data, enhancing the flexibility of using different numbers of timepoints and data types. Knee pain prediction and KL grade prediction tasks successfully demonstrated the applicability of the approach.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The model is capable of accepting complex missing data structures. This enables the use of all available data for training. Inference can also be done for subjects with various types of missing data.
- The use of longitudinal data improved the mean average precision score compared to using the last available data timepoints. This highlights the importance of using all available data for prognosis prediction.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The implementation neglects the data when the prediction labels are missing for any timepoint. The prediction labels for datasets like OAI are mostly available; however, prediction labels may not be available for every patient visit in general practice. This could hinder the applicability of the approach to a wide range of datasets.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
The authors successfully demonstrated the proposed approach using a longitudinal dataset. I have the following comments:
- Regarding the KL grade prediction task, the reason for combining KL grades 0 and 1 but not 2-4 is not clear.
- The reason for using the cartilage maps (extracted from DESS using a specific tool) instead of different contrasts of MR images is not clear. Please elucidate.
- Current experiments are designed to predict the 96-month outcomes using data from the 72-month or earlier measurements. For disease prognosis, it would be informative to show how model performance changes as the time difference between the outcome and the last scan increases, especially for the pain prediction task.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I recommend acceptance of the paper. The paper introduces a novel concept and demonstrates its applicability on a public dataset using carefully designed experiments.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Accept — should be accepted, independent of rebuttal (5)
[Post rebuttal] Please justify your decision

The authors successfully answered my comments/suggestions.

Review #3

Please describe the contribution of the paper

The paper proposes an attention-based model for multi-modal multi-view predictions using longitudinal knee data while considering the missingness at certain timepoints.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

- The paper addresses the missing data issue in longitudinal data with a clear experimental design.
- The proposed methodology is sound and sensible. It is an interesting approach developed by effectively using transformers and feature embeddings. The paper includes a considerable amount of implementation detail about preprocessing and network design.
- I think the paper provides a good evaluation with two classification tasks on open-source data. It shows the performance for different input combinations and the contributions of each modality/view.
- The paper is well-written and easy to understand.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

- I think the main weakness of the paper is that it does not provide any evaluation results for a model trained with input data imputed using a common imputation technique. This would give a further comparison in the validation step.
- Can the authors clarify exactly how they filter the tabular data? The paper says “… and keep only those that can be easily captured, …”, but what does that mean?
- Some results (ROC metrics for pain and KLG predictions, AP results when inputting either P or K, etc.) are quite mixed. Although the authors provide explanations for some of them, I would highly recommend adding a separate section discussing the general limitations of their approach and commenting on the mixed results.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

I look forward to seeing their responses to the points in the weakness section during the rebuttal.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

My recommendation is based on the fact that the paper is well-written, has a sensible methodology, and provides extensive evaluation and comparisons on open-source data.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

We would like to thank the reviewers for their time and insightful comments!

Reviewer 3: Label Missingness: Thank you for the suggestion! While our focus was on missing data, we recognize the challenge of label missingness and will explore it in future work.

KLG: We combined KLG 0 and 1 because they are difficult to distinguish from each other, while the other categories have clearer distinctions. This is a common setup in previous works.
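The label merge described here can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `merge_klg` and the resulting class indices (0 for KLG 0-1, then 1, 2, 3 for KLG 2, 3, 4) are hypothetical and are not taken from the paper or its code.

```python
def merge_klg(klg):
    """Map a Kellgren-Lawrence grade (0-4) to a 4-class label with grades 0 and 1 merged.

    Assumed class indices: {0, 1} -> 0, 2 -> 1, 3 -> 2, 4 -> 3.
    """
    if klg not in (0, 1, 2, 3, 4):
        raise ValueError(f"KLG must be an integer in 0..4, got {klg!r}")
    return max(klg - 1, 0)


merged = [merge_klg(g) for g in range(5)]
# merged == [0, 0, 1, 2, 3]
```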

Different contrast of MR images: Utilizing different contrasts of MR images could be an alternative to cartilage maps, requiring a shift from a 2D to a 3D feature extractor. This would increase computation but is feasible.

Predict with longer period: As the prediction period extends, the task becomes more difficult, potentially reducing performance. Additionally, losing training data could make the model prone to overfitting.

Reviewer 4: Thank you for your suggestions and questions. We believe this work represents an innovative ML solution that will be of interest to the MICCAI community, and thus we hope that you can recommend it for publication.

Multiencoder/Design/Comparisons: There may be a misunderstanding between our setting and MultiMAE. MultiMAE “optionally accept additional modalities of information in the input besides the RGB.” In contrast, we infer across multiple images and aggregate various data types, making a single encoder inappropriate. Additionally, MultiMAE uses multiple encoders (linear proj. as illustrated in its Fig. 2) for different modalities. Thanks for the suggestion on alternative model designs. RE aggregation: it has been shown that attention-based aggregation outperforms mean/sum-based aggregation (e.g., Set Transformer, ICML19) and is more general, as it can learn mean/sum through constant keys/queries. RE graph NNs: GNN-based methods are appropriate when edges exist between nodes. In our case, there are no a priori edges between modalities, leaving us with either fully connected nodes or learned connections, which is what our attention-based approach achieves.
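The remark that attention can recover mean aggregation can be illustrated with a short NumPy sketch. It is not the authors' architecture: `masked_attention_pool` is a hypothetical single-query softmax attention in which the tokens serve as both keys and values, and missing views are excluded through a 0/1 mask. With a constant (here all-zero) query, every observed token receives the same score, so the softmax weights are uniform and the output is the mean of the observed tokens.

```python
import numpy as np


def masked_attention_pool(tokens, mask, query):
    """Pool (n, d) tokens into one d-vector with single-query softmax attention.

    mask[i] == 0 excludes token i from the pooling.
    Returns (pooled_vector, attention_weights).
    """
    tokens = np.asarray(tokens, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one view must be observed")
    scores = tokens @ query / np.sqrt(tokens.shape[1])
    scores = np.where(mask, scores, -np.inf)   # masked tokens get zero weight
    scores = scores - scores[mask].max()       # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ tokens, weights


tokens = np.array([[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]])
mask = np.array([1, 1, 0])  # third view is missing
pooled, weights = masked_attention_pool(tokens, mask, query=np.zeros(2))
# pooled equals the mean of the two observed tokens: [2.0, 1.0]
```

A non-constant, learned query would instead weight the observed views unequally, which is the extra flexibility that attention-based aggregation offers over a plain mean.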

Open source: As mentioned in the abstract, we will open-source our code upon acceptance.

Cost concern: We only consider tabular data and 2D images with either 5 or 6 timepoints based on the task. There are ~113M trainable parameters.

Ablation: In our model, we compared results using different numbers of views (Attention Block) in Appendix C and different timepoints (Decoder Block) in Table 1.

Mask dimension: The “n” in the mask indicator represents the number of views.
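A minimal sketch of such a length-n mask indicator, together with substitution of a [PAD] token for missing views, is shown below. It rests on assumptions: `fill_missing_views` is a hypothetical helper, and the fixed array `pad_token` merely stands in for what would be a learned parameter in a trained model.

```python
import numpy as np


def fill_missing_views(view_embeddings, pad_token):
    """Stack n view embeddings, substituting pad_token where a view is None.

    Returns (tokens, mask): tokens has shape (n, d); mask has length n,
    with mask[i] == 1 if view i was observed and 0 if it was padded.
    """
    n = len(view_embeddings)
    d = pad_token.shape[0]
    tokens = np.empty((n, d), dtype=float)
    mask = np.zeros(n, dtype=int)
    for i, emb in enumerate(view_embeddings):
        if emb is None:
            tokens[i] = pad_token   # same vector for every missing view
        else:
            tokens[i] = emb
            mask[i] = 1
    return tokens, mask


pad_token = np.full(4, 0.5)                      # placeholder for a learned [PAD]
views = [np.ones(4), None, 2.0 * np.ones(4)]     # n = 3 views, second one missing
tokens, mask = fill_missing_views(views, pad_token)
# mask.tolist() == [1, 0, 1]
```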

Figure 2: Figure 2 aims to show that our unified model can perform view drop during evaluation, and the results match with those from specific models, demonstrating the efficiency of our unified model. Detailed results can be found in Appendix C.

SAINT model: The SAINT model uses transformers to extract features from tabular data, considering both column-wise and row-wise attention. Details can be found in our reference.

Reviewer 6: Data imputation: For data imputation, we could compare against either zero/mean imputation (likely ineffective) or imputation from other views. However, imputing from other views is difficult for our multi-body-part setup, and hence we chose to ignore the missing views.

Tabular filter: We remove tabular data requiring extra measurements, e.g., 400m walk time, flexion/extension speed. Except for easily obtainable blood pressure, all attributes can be collected through questioning.

Additional result explanation: In Appendix C, ROC is more mixed than AP and Macro ACC, mainly due to the imbalance of our dataset. Additionally, the pelvis contributes little as it offers minimal information for knee conditions. Similarly, tabular data is less effective for KLG prediction since KLG is based only on knee radiography. This highlights a limitation: non-informative views provide limited contribution while increasing the computation cost. In the future, we aim to develop methods for automatically predicting useful views.
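To illustrate why class imbalance makes the choice of metric matter, the sketch below contrasts overall accuracy with macro accuracy. It assumes macro accuracy means the unweighted mean of per-class recall, which may differ from the exact definition used in the paper; the function name and the toy labels are hypothetical.

```python
import numpy as np


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))


# Toy imbalanced example: 9 negatives, 1 positive, classifier always predicts 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
overall = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
macro = macro_accuracy(y_true, y_pred)
# overall == 0.9, macro == 0.5
```

The trivial classifier looks strong on overall accuracy but sits at chance level on macro accuracy, since the minority class is never recovered.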

Meta-Review

Meta-review #1

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

N/A

A Unified Model for Longitudinal Multi-Modal Multi-View Prediction with Missingness

Author(s):