Abstract

Imaging modalities such as Computed Tomography (CT) and Positron Emission Tomography (PET) are key in cancer detection, inspiring Deep Neural Network (DNN) models that merge these scans for tumor segmentation. When both CT and PET scans are available, it is common to combine them as two channels of the input to the segmentation model. However, this method requires both scan types during training and inference, which is challenging given the limited availability of PET scans and sometimes restricts the process to CT scans only. Hence, there is a need for a flexible DNN architecture that can be trained or updated using only CT scans yet can effectively utilize PET scans when they become available. In this work, we propose a parameter-efficient multi-modal adaptation (PEMMA) framework for the lightweight upgrading of a transformer-based segmentation model trained only on CT scans so that it also incorporates PET scans. The benefits of the proposed approach are two-fold. First, we leverage the inherent modularity of the transformer architecture and perform low-rank adaptation (LoRA) of the attention weights to achieve parameter-efficient adaptation. Second, since the PEMMA framework is designed to minimize cross-modal entanglement, the combined model can subsequently be updated using only one modality without causing catastrophic forgetting of the other. Our proposed method achieves results comparable to early fusion techniques with just 8% of the trainable parameters, including a notable +28% improvement in the average Dice score on PET scans when trained on a single modality.
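
To make the adaptation mechanism concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' code) of low-rank adaptation applied to frozen attention projections: the original CT-trained weights stay frozen while small rank-r factors are trained. The rank r, the scaling alpha, and the toy single-head attention block are assumptions made for illustration.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update scaled by alpha / r."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # CT-trained weights stay intact
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at the start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

class ToyAttention(nn.Module):
    """Toy single-head self-attention; only the Q and V projections receive LoRA."""
    def __init__(self, dim: int = 768, r: int = 8):
        super().__init__()
        self.q = LoRALinear(nn.Linear(dim, dim), r)
        self.v = LoRALinear(nn.Linear(dim, dim), r)
        self.k = nn.Linear(dim, dim)
        for p in self.k.parameters():
            p.requires_grad = False                       # untouched projections remain frozen too

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v

block = ToyAttention()
trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
total = sum(p.numel() for p in block.parameters())
print(f"trainable fraction: {trainable / total:.1%}")      # only the low-rank factors are trained

Because B is initialized to zero, the adapted model starts out exactly equal to the single-modality backbone, which is why such an update can be added or removed without disturbing the original weights.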

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3528_paper.pdf

SharedIt Link: https://rdcu.be/dY6fT

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_25

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3528_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Saa_PEMMA_MICCAI2024,
        author = { Saadi, Nada and Saeed, Numan and Yaqub, Mohammad and Nandakumar, Karthik},
        title = { { PEMMA: Parameter-Efficient Multi-Modal Adaptation for Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {262--271}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces new mechanisms to efficiently fine-tune a model on an image modality different from the one on which it was originally trained. The proposed approach avoids forgetting of the original modality after fine-tuning and does not introduce many additional parameters.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors have made a significant effort to clarify the content of their contribution despite the complexity of the proposed architecture.
    • The introduced approach adds only a small number of parameters compared to the alternatives (early and late branching).
    • The PET-CT example indeed shows the advantage of their approach compared to the other two.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper tackles a fairly narrow task: fine-tuning a model on a modality different from the one it was trained on.
    • Besides, it relies on a single example (PET-CT) which is not widely encountered. Therefore, at this stage, the paper is not likely to interest many readers. In addition, the authors do not specify if the developed code will be shared, and therefore the impact of the paper is likely to be (very) limited.
    • The readability of the paper can be improved. Understanding the changes in the network architecture is not easy, even based on Figure 1.
    • The notion of efficiency, which is used as an argument for the proposed approach, is not defined and is not quantified. Does it mean that the network requires less time for training? What are the tangible benefits for the user (or society) of using those efficient parameterizations?
    • The authors do not present the existing state of the art around the topic of cross-modality fine-tuning of models. The authors rely on visual prompting to achieve this task but there are many other alternatives. For instance, ORCA (Cross-Modal Fine-Tuning: Align then Refine, Shen et al., 2023) is one such method and no mention or comparison is provided.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Nothing is specifically mentioned about the reproducibility of the paper. I am not sure that the paper provides enough details to allow for its reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The authors should discuss why having an efficiently parameterized model is important for the user or for society. Does it lead to less training time, increased robustness? Provide quantitative ways to assess these benefits.
    • Since the authors freeze the previous parameters in PEMMA, shouldn’t it be the same for the early and late fusion strategies? I am questioning whether there is a fair comparison of performance and number of parameters in Table 1.
    • Table 1 should be improved by telling the reader where to focus in this table. For me it should be in the columns “p” of the new datasets.
    • In Fig. 1, the notations introduced in the text should be added, like the skip connections \theta_{SK}^C. Also specify which components already exist and which are the additional ones in the proposed architecture.

    • There are a few typos that should be corrected.
    • p5 Discover ST -> Discovery ST
    • p 7 Table 1 1-> Table 1
    • P7 dice score -> Dice score
    • p7 more comprehensively For further-> more comprehensively. For further
    • p 8 Table 1 1-> Table 1
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a potentially interesting idea for cross-modality fine-tuning, but i) it is not sufficiently justified by enough experiments, and ii) it does not discuss or compare itself with other existing approaches.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces a parameter-efficient multi-modal adaptation framework designed for CT and PET modalities to facilitate medical segmentation. Employing a low-rank adaptation (LoRA) technique, this framework enables the updating of the combined model using only one modality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A novel framework adaptable to both CT and PET modalities, PET alone, and CT for tumor segmentation, without augmenting the model’s parameter count. This versatility is achieved without significant retraining or parameter expansion.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The study compares its approach against a UNETR model with a specified parameter count for the task. However, the final parameter count of their model is not provided; only a reduction of trainable parameters by 92% is reported, without specifying the resultant count. Table 2 lacks clarity in how results are presented, particularly regarding the distinction between Late Fusion and Early Fusion and whether it pertains to their model or the UNETR. Additionally, details regarding the new datasets used for validation, such as the number of data points and specific characteristics, are not clearly elucidated.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper effectively elucidates its methodology and references the public database utilized. However, it lacks a link to the code, and the utilization of other datasets, along with their characteristics, poses challenges in assessing their validity.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A well written and understandable methodology.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors have proposed a parameter-efficient multi-modal adaptation, called “PEMMA”, to upgrade models that are pre-trained only on CT so that they incorporate PET scans. This is a very interesting topic as it helps to leverage models pre-trained on one modality to be used/upgraded by incorporating other modalities, leading to better prediction power.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The topic of multimodal adaptation can help to leverage models previously trained on a different image modality. The authors have clearly described the objective, methodology, and their experiments, and the paper is well written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The reviewer does not see any weak points.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Paper is well written and interesting. My suggestion is that paper could benefit from some visualization of results/outcome of experiment.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Topic is very interesting and the study is well described, designed, and conducted.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We would like to thank the reviewers for their valuable feedback. We are committed to ensuring the full reproducibility of our work - the code repository will be made available and the dataset used in our study is publicly accessible at https://hecktor.grand-challenge.org/Data/. All typographical errors identified will be corrected in the final camera-ready version.
[R1.1] Our work addresses a specific challenge in cancer diagnosis and prognosis: integrating two primary imaging modalities, namely CT and PET scans. The need for publicly available, registered multimodal data makes the HECKTOR dataset particularly suitable for our study.
[R1.2] According to [1], PET/CT fusion is a rapidly growing technique in medical imaging, underscoring the relevance of our focus on these modalities. Our work could potentially be extended to the integration of different types of medical imaging modalities, such as PET scans captured with different tracers (https://autopet-iii.grand-challenge.org).
[R1.3] We will revise the PEMMA framework description to clarify the changes in the network architecture. [R1.4] As indicated in the title, our goal is to improve parameter efficiency, i.e., reduce the number of parameters to be modified. The main real-world benefit of PEFT is the reduced cost of storing model parameters. Since only a small fraction of parameters in a large model are modified, it is possible to store all the fine tuned models, thereby providing more flexibility at inference time to pick the most appropriate model.
[R1.5] Most existing cross-modal fine-tuning methods, including ORCA, are designed for a single static fine-tuning task. In our work, we consider a scenario where additional batches of data from diverse medical centers need to be incorporated dynamically. Since this new data could come from one or more modalities, we need to avoid cross-modal entanglement so that both modalities can be updated independently.
[R1.6] Since early fusion requires both modalities at the input level, all the model parameters need to be modified to handle this change. For late fusion, a completely separate model is required to handle the new modality, and the existing model (for the first modality) also needs to be fine-tuned on the new data. Thus, freezing the model parameters would lead to significant performance degradation in both early and late fusion. Therefore, the comparison between the three fusion approaches is fair.
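
As a concrete, hypothetical illustration of this point (not the paper's code), the short PyTorch snippet below shows why early fusion cannot reuse a frozen single-modality backbone: a UNETR-style 3D patch embedding configured for one CT channel rejects a stacked CT+PET input, so the embedding and the weights tuned around it must be retrained. The layer sizes and volume shape are assumptions made for the example.

import torch
import torch.nn as nn

# Patch embedding of a hypothetical single-modality CT backbone: one input channel.
ct_patch_embed = nn.Conv3d(in_channels=1, out_channels=768, kernel_size=16, stride=16)

ct_only = torch.randn(1, 1, 96, 96, 96)    # a CT volume alone is accepted as-is
_ = ct_patch_embed(ct_only)

ct_pet = torch.randn(1, 2, 96, 96, 96)     # early fusion: CT and PET stacked as two channels
try:
    _ = ct_patch_embed(ct_pet)
except RuntimeError as err:
    # Channel mismatch: for early fusion the embedding (and everything tuned around it)
    # must be re-learned, whereas a LoRA-style adaptation keeps the backbone frozen.
    print("early fusion needs a new patch embedding:", err)
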
[R1.7] It is difficult to find many large multi-modal and multi-center datasets, but in future, we plan to test on other modalities and datasets too.
We thank [R3] for the suggestion to point out where exactly to focus in Table 2, in which we consolidate all findings to demonstrate performance across different testing scenarios: initially training and testing on both modalities, then on each independently. We highlight these key outcomes: there is a significant reduction in trainable parameters, with PEMMA showing a 92% reduction compared to the 92.5 million parameters of the standard UNETR architecture (highlighted in red in Table 2). The efficacy of LoRA in preserving information across modalities is evidenced by the comparative results between the early fusion and PEMMA approaches on the new datasets from the HGJ and HMR centers (here, we compare the results of the last rows of early fusion and PEMMA in column P; new datasets 1 and 2 correspond to the HGJ and HMR centers, respectively).
[1] Fahim-Ul-Hassan, Cook GJ. PET/CT in oncology. Clin Med (Lond). 2012 Aug;12(4):368-72. doi: 10.7861/clinmedicine.12-4-368. PMID: 22930885; PMCID: PMC4952129.




Meta-Review

Meta-review not available, early accepted paper.


