Abstract

Deep neural networks (DNNs) have demonstrated superior performance compared to humans across various tasks. However, DNNs often face the challenge of domain shift, where their performance notably deteriorates when applied to medical images with distributions differing from those seen during training. To address this issue and achieve high performance in new target domains under zero-shot settings, we leverage the ability of self-attention mechanisms to capture global dependencies. We introduce a novel MLP-like model designed for superior efficiency and zero-shot robustness. Specifically, we propose an adaptive fully-connected (AdaFC) layer to overcome the fundamental limitation of traditional fully-connected layers in adapting to inputs of various sizes while maintaining GPU efficiency. Building upon AdaFC, we present a new MLP-based network architecture named MedMLP. Through our proposed training pipeline, we achieve a significant 20.1% increase in model testing accuracy on an out-of-distribution dataset, surpassing the widely used ResNet-50 model.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1626_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1626_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zho_MedMLP_MICCAI2024,
        author = { Zhou, Menghan and Xu, Yanyu and Soh, Zhi Da and Fu, Huazhu and Goh, Rick Siow Mong and Cheng, Ching-Yu and Liu, Yong and Zhen, Liangli},
        title = { { MedMLP: An Efficient MLP-like Network for Zero-shot Retinal Image Classification } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a new adaptive fully connected layer, called AdaFC, that can adjust to various input resolutions. This overcomes the limitations of traditional, fully connected layers. Additionally, the paper proposes a MedMLP model-based approach that enhances performance on out-of-distribution datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The AdaFC layer can be applied to dynamic resolutions and the MedMLP-B0 model performs well compared to MobileNetV2 when tested on lower resolution images.
    2. This model has fewer parameters than other MLP-based models, making it suitable for use on mobile devices.
    3. Another strength of the proposed layer is scalability. The layer is designed to easily scale the model, and the authors provided four variants in this study.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. It is unclear why only MedMLP-B2 and MedMLP-B3 are presented in Table 4. Adding the B0 and B1 variants would give a better understanding of how the low-cost models perform.
    2. Testing on retinal images is limited to the SCES and SINDI datasets. The authors also describe the ICC and ISF datasets in Section 3.3, but performance on these datasets is not included in Table 4.
    3. The proposed models are tested on PACS disease using retinal images for the domain generalization study, which weakens the study. To better understand the domain generalization of AdaFC-based models, they need to be evaluated on multiple retinal diseases.
    4. The supplementary material discussed in the paper is not available.
    5. Qualitative analysis of the model is missing.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The information about the availability of data and code is not available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please review the section on weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed layer in the research paper is unique as it allows the model to process dynamic input resolutions, which is the main strength of the study. Additionally, the model was tested on varying images and performed better than MobileNetV2. However, the authors claimed that the models developed by AdaFC are capable of domain generalization, but the study provided very limited evidence related to this claim, which raises some concerns.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I am convinced by the author’s reply, so I am upgrading my assessment.



Review #2

  • Please describe the contribution of the paper

    This paper introduces an adaptive fully-connected (AdaFC) layer designed to address the inherent constraints of traditional fully-connected layers by accommodating inputs of varying sizes while preserving GPU efficiency.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Novel Formulation: The authors introduce a novel adaptive fully-connected (AdaFC) layer, which is a significant contribution. This layer addresses the limitation of traditional fully-connected layers which struggle with adapting to inputs of varying sizes. The AdaFC layer’s ability to dynamically adjust while maintaining GPU efficiency is innovative because it enhances the flexibility and scalability of MLP-like models without reducing computational performance.

    (2) Original Usage of MLP-like Models: The paper is the first to propose the use of MLP-like models in mobile settings.

    (3) Demonstration of Clinical Feasibility: The proposed MedMLP model outperforms existing MLP-like and CNN-based models and demonstrates strong robustness, which is crucial for mobile health applications. This robustness makes MedMLP particularly suitable for real-world clinical applications.

    (4) Novel Application: The application of MLP-like models to mobile platforms is novel and reveals its potential in clinical applications.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Innovation in Adaptive Layer: Although the AdaFC layer is described as novel, the concept of adaptive layers that can handle various input sizes isn’t entirely new. For example, adaptive pooling layers in neural networks have addressed similar issues, though in different contexts. The paper could benefit from doing an extensive literature review of how AdaFC’s approach is different from these previous works and why it might be novel.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) The reviewer recommends that the authors spell out MLP at least once in the abstract: multilayer perceptron (MLP). (2) The reviewer recommends that the authors increase the font size and resolution of Fig. 1, Fig. 3, and Table 1.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors proposed a novel improvement to the MLP; however, based on the evaluation results, the improvement is not significant compared to previous work.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors revised the paper to better explain its novelty by describing the difference between MedMLP and other adaptive operators. The final decision depends on how other reviewers score the rebuttal.



Review #3

  • Please describe the contribution of the paper

    The authors propose an efficient module, AdaFC, which serves as a component in the MLP-like model architecture MedMLP. MedMLP shows better efficiency, performance, or robustness to resolution or domain shift.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed AdaFC works with various resolutions.
    2. The paper extends MLP-like structures to mobile and medical settings, which can help low-resource deployment.
    3. The model achieves performance similar to other structures while offering additional advantages such as robustness and efficiency.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The evaluations are selective. For example, MobileNet is excluded from Table 4 without justification, ResNet50 does not use pre-training, and ViP-small/7 (25M) and other original implementations are excluded.

    2. The name “zero-shot” is misleading. Zero-shot usually refers to generalization to new classes that are unknown during training. The authors may mean “domain generalization”.

    3. SINDI has only 250 eyes, which may be too small; the variation may be large, which makes the result unreliable. Additionally, how the subset is selected is unknown.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In addition to the weaknesses, it would be helpful to explain the rationale behind the difference between the mixing strategies of ViP and AdaFC, as they are similar but not enough comparisons are given. The authors mention analysis in the supplementary material, but the author guidelines do not allow analysis there. It is recommended that the authors move some important information to the main text.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the experiments do not fully distinguish the proposed method from the previous methods in terms of performance, it is obvious the proposed method introduces a novel perspective such as variation in resolution. Thus, this paper can be accepted if the experiments or other design choices are properly explained.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Although not all the concerns are addressed, the authors have shown the advantages of the proposed method with justifications.




Author Feedback

We thank the reviewers for the high-quality reviews and will carefully revise our submission accordingly, correct typos, enhance our qualitative analysis and discussions, upload the supplementary material, and release the codes and models in the revision.

Q1. Difference between MedMLP and Other Adaptive Operators (R1) A1. We thank the reviewer for the insightful suggestion. AdaFC operates on the weight tensor, while previous adaptive operations focus on the feature maps. The primary difference lies in the efficiency of GPU utilization. Previous operators, such as pooling and convolution, employ a sliding window operation, which can reduce GPU utilization efficiency due to a mismatch with the GEMM CUDA core implementation. In contrast, the MLP layer requires only matrix multiplication, aligning better with the GEMM CUDA implementation. However, due to the simplicity of the MLP operation, it struggles with feature maps of variable spatial resolution, as the weight tensor’s dimensions are fixed and sensitive to transformations. The main novelty of AdaFC is its ability to operate on the weight tensor to adapt to variable feature map dimensions. We will include this analysis in the revision, along with the corresponding references.
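
The idea in A1 of adapting the weight tensor rather than the feature map can be illustrated with a minimal sketch. This is not the authors' implementation: the rebuttal does not say how the weights are adapted, so the sketch assumes a learned token-mixing matrix that is linearly resampled to the current number of tokens before a plain matrix multiplication. The names `resize_weight` and `ada_fc` are hypothetical.

```python
def _interp_axis(values, new_len):
    """Linearly resample a 1-D list to new_len points (endpoints aligned)."""
    old_len = len(values)
    if new_len == 1 or old_len == 1:
        return [values[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def resize_weight(weight, n_out, n_in):
    """Resample a 2-D weight matrix to shape (n_out, n_in).

    Assumption: separable linear interpolation stands in for whatever
    adaptation AdaFC actually applies to its weight tensor.
    """
    rows = [_interp_axis(row, n_in) for row in weight]
    cols = [_interp_axis(list(col), n_out) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def ada_fc(weight, tokens):
    """Mix tokens with a weight matrix resized to the current token count.

    tokens: list of N tokens, each a list of C channel values.
    The feature map is left untouched; only the weights are adapted,
    and the mixing itself stays an ordinary matrix product.
    """
    n = len(tokens)
    channels = len(tokens[0])
    w = resize_weight(weight, n, n)
    return [
        [sum(w[i][j] * tokens[j][c] for j in range(n)) for c in range(channels)]
        for i in range(n)
    ]
```

With a weight matrix trained for four tokens, `ada_fc` accepts two or six tokens without retraining, because the resampling happens on the weights and the forward pass remains a single matrix multiplication.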

Q2. Experimental results on Tab 4 (R3 & R4) A2. We thank the reviewers for their comments. MobileNet was excluded from Table 4 due to its significantly smaller model size. As the performance of MedMLP-B2 is already comparable to ResNet, we initially chose not to include smaller models. However, we agree that adding the performance of smaller models would be informative. We will include results for models with mobile settings in the revision.

Q3. Zero-shot and domain generalization (R3 & R4) A3. (1) The term “zero-shot” in our paper refers to MedMLP’s ability to adapt to images with different resolutions without requiring fine-tuning. This characteristic sets it apart from other MLP-based networks. We will clarify this in the revision. (2) Due to time constraints, we are currently testing on PACD. We plan to explore other diseases in the future to better understand the domain generalization of the AdaFC-based model.

Q4. Experimental datasets (R3 & R4) A4. (1) We selected SCES and SINDI due to their larger domain gap, which allows us to better test our model’s robustness. (2) By combining SCES, SINDI, ICC, and ISF, we achieved a significant 20.1% increase in model testing accuracy on an out-of-distribution dataset. For the purpose of our analysis, we chose two datasets for illustration. In the revision, we will clarify the dataset selection process and include results for all datasets in the supplementary material.

Q5. Differences between MedMLP and ViP (R3) A5. The Vision Permutator (ViP) does not support variable input dimensions, limiting its use to images with the same spatial dimensions as those used for model training. In contrast, MedMLP, through its implementation of the AdaFC layer, can handle images with variable dimensions, providing greater flexibility and adaptability.

Q6. Justification of the contributions. A6. The main contributions of this paper are two-fold: (1) We propose a novel operator termed AdaFC, which can process images with variable spatial dimensions. This capability is not possible with previous MLP-based models. (2) We introduce a new family of pre-trained foundation models based on the AdaFC operator, specifically designed for medical image datasets. These models demonstrate better generalization capabilities and faster processing speeds. Both claims are supported by rigorous experimental results. As shown in Figure 1, the MedMLP model family exhibits a significantly better performance curve, balancing model accuracy and running speed more effectively than previous models.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors propose an efficient module, AdaFC, which serves as a component in the MLP-like model architecture MedMLP. The reviewers are generally in favor of the paper, especially with major concerns addressed by the rebuttal. The authors shall carefully polish the paper to address the remaining concerns in their final version.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper introduces an adaptive fully-connected (AdaFC) layer that addresses the limitations of traditional fully-connected layers by accommodating inputs of varying sizes while maintaining GPU efficiency. This innovation is applied to the MedMLP model for mobile health applications. Strengths of this paper include the novel formulation of the AdaFC layer, the use of MLP-like models in mobile and medical settings for the first time, and the robust performance of MedMLP in clinical applications. However, weaknesses of this paper include selective evaluations, misleading terminology regarding “zero-shot” generalization, limited testing on retinal image datasets, and missing qualitative analysis. Overall, given the strengths and weaknesses, I suggest accepting this paper.



