Abstract

The design of activation functions constitutes a cornerstone of deep learning (DL) applications, exerting a profound influence on the performance and capabilities of neural networks. This influence stems from their ability to introduce non-linearity into the network architecture. By doing so, activation functions empower the network to learn and model intricate data patterns and relationships, surpassing the limitations of linear models. In this study, we propose a new activation function, called the Adaptive Smooth Activation Unit (ASAU), tailored for optimized gradient propagation, thereby enhancing the proficiency of deep networks in medical image analysis. We apply this new activation function to two important and commonly used general tasks in medical image analysis: automatic disease diagnosis and organ segmentation in CT and MRI scans. Our rigorous evaluation on the RadImageNet abdominal/pelvis (CT and MRI) dataset demonstrates that our ASAU-integrated classification frameworks achieve a substantial improvement of 4.80% over ReLU-based frameworks in classification accuracy for disease detection. In addition, the proposed framework obtains a 1% to 3% improvement in Dice coefficient on the Liver Tumor Segmentation (LiTS) 2017 benchmark compared to widely used activations for segmentation tasks. The superior performance and adaptability of ASAU highlight its potential for integration into a wide range of image classification and segmentation tasks. The code is available at https://github.com/koushik313/ASAU.
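
The exact ASAU formulation (Eq. 5 of the paper) is not reproduced on this page, so the snippet below is only a minimal PyTorch sketch of a smooth, trainable activation in the same smooth-maximum spirit; the class name ASAULike and the softplus-based smoothing are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ASAULike(nn.Module):
        """Illustrative smooth, trainable activation (NOT the paper's exact Eq. 5).

        Smoothly approximates max(x, a*x): as b grows it approaches Leaky ReLU
        with slope a, and with a = 0 it approaches ReLU."""

        def __init__(self, a: float = 0.01, b: float = 1.0):
            super().__init__()
            # a and b are registered as trainable parameters
            # (initial values follow the authors' rebuttal).
            self.a = nn.Parameter(torch.tensor(float(a)))
            self.b = nn.Parameter(torch.tensor(float(b)))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # max(x, a*x) = a*x + relu((1-a)*x); softplus(b*y)/b is a smooth relu(y).
            return self.a * x + F.softplus((1.0 - self.a) * x * self.b) / self.b

Such a module can be used as a drop-in replacement wherever a network would otherwise instantiate ReLU.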

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3846_paper.pdf

SharedIt Link: https://rdcu.be/dV5Kb

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72114-4_7

Supplementary Material: N/A

Link to the Code Repository

https://github.com/koushik313/ASAU

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Bis_Adaptive_MICCAI2024,
        author = { Biswas, Koushik and Jha, Debesh and Tomar, Nikhil Kumar and Karri, Meghana and Reza, Amit and Durak, Gorkem and Medetalibeyoglu, Alpay and Antalek, Matthew and Velichko, Yury and Ladner, Daniela and Borhani, Amir and Bagci, Ulas},
        title = { { Adaptive Smooth Activation Function for Improved Organ Segmentation and Disease Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {65 -- 74}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors proposed a novel smooth activation function for abdominal organ segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed smooth activation function can approximate the general activation functions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The activation function proposed in this paper is similar to the work in [1][2], adapted for use in abdominal imaging tasks. There are no specific contributions to the issues of organ segmentation and disease diagnosis.

    [1] Biswas, K., Kumar, S., Banerjee, S., & Kumar Pandey, A. (2022). SAU: Smooth activation function using convolution with approximate identities. In European Conference on Computer Vision (pp. 313-329). Cham: Springer Nature Switzerland.
    [2] Biswas, K., Kumar, S., Banerjee, S., & Pandey, A. K. (2022). Smooth maximum unit: Smooth activation function for deep networks using the smoothing maximum technique. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 794-803).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The proposed activation function has a clear expression.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    My comments are organized as follows. Comment 1: The authors use a neural network embedding an adaptive smooth activation function to perform abdominal organ segmentation and disease diagnosis. The smooth activation function comes from a published article, Mish [3]. The authors claim that the proposed activation function embodies a methodological shift towards smoother, more continuous transitions, offering refined gradients that promote the intricate learning necessary for high-fidelity classification and segmentation tasks.

    However, as a research article, the novelties of the activation function are not fully and clearly presented and validated. In the current contents, the authors adapt a published work, i.e., Mish, to formulate the proposed activation function. This proposed smooth activation function can indeed approximate the traditional ReLU function, but its advantages on the challenges of multiclass disease classification and liver segmentation tasks are not clearly presented and explained. Moreover, as a basic module for neural networks, the activation function and its newest variants should be comprehensively reviewed.

    Comment 2: In the introduction, the authors clearly present the background on computer-aided diagnosis, organ segmentation, etc. In the penultimate paragraph, the authors use the heading “Domain specific activation functions are needed”, which I really appreciate. However, for abdominal disease detection and organ segmentation tasks, no domain-specific information is considered in this work. In this section, the authors claim that “These functions are susceptible to information loss in regions with negative inputs and often struggle to capture the subtle nuances crucial for delineating intricate anatomical structures.” This claim is very important; however, no references or further explanations are provided. The study of smooth activation functions is a classic topic, but the references in this part are not sufficient — for example, the work “SAU: Smooth Activation Function Using Convolution with Approximate Identities” and the work “Smooth Maximum Unit: Smooth Activation Function for Deep Networks using Smoothing Maximum Technique”.

    Comment 3: Compared with other smooth functions, the authors describe their activation function in terms of the activation function in Mish. The reasons why the authors employ this activation function here are not provided. It seems like just replacing the Gaussian error function in [2] with another function, which is merely a technical change. For medical image analysis tasks, the authors should pay attention to the specific issues in the challenges and show that the proposed method can alleviate these issues effectively.

    Comment 4: From Equations (5) to Equations (6) and (7), the values of \alpha and \beta seemed to be 1. What was the definition of C(K) in Proposition 1? This proposition was almost the same as the proposition in [1].

    Comment 5: In the experiments, the authors compare the performance of the proposed method with other activation functions, such as the original ReLU, Leaky ReLU, and PReLU. But for a novel smooth activation function, no other smooth activation functions are compared, which weakens the effectiveness of the experiments.

    [1] Biswas, K., Kumar, S., Banerjee, S., & Kumar Pandey, A. (2022). SAU: Smooth activation function using convolution with approximate identities. In European Conference on Computer Vision (pp. 313-329). Cham: Springer Nature Switzerland.
    [2] Biswas, K., Kumar, S., Banerjee, S., & Pandey, A. K. (2022). Smooth maximum unit: Smooth activation function for deep networks using the smoothing maximum technique. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 794-803).
    [3] Misra, D.: Mish: A self regularized non-monotonic activation function (2020)

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1) The novelty is insufficient. 2) Similar prior work is not cited. 3) There are no specific contributions toward solving the issues of organ segmentation and disease diagnosis.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    1) The article compares only against traditional activation functions, while smooth activation functions, which are the paper's core innovation, are neither compared against nor cited by the authors. 2) The authors did not provide explanations for my main concerns.



Review #2

  • Please describe the contribution of the paper

    The authors present a new activation function design, called Adaptive Smooth Activation Unit (ASAU), with the aim to improve the generalization capabilities of NN models specifically in the radiological domain (classification and segmentation cases are discussed).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strong point of the paper is the consistent selection of models taken into consideration for measuring the effects of the ASAU activation function on two different tasks. Residual- and Transformer-based solutions are selected for the segmentation scenario. These show a dominant gain over the other activation function choices.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses are:

    • Missing comparison with other similar SOTA activation functions like GELU (https://arxiv.org/abs/1606.08415), SiLU, LEAF (10.47839/ijc.22.3.3225), etc.

    • In the sub-paragraph “Domain specific activation functions are needed:”:

    “[…] existing activation functions are chosen mostly due to efficiency reasons.” This claim misses a reference. “[…] These functions are susceptible to information loss in regions with negative inputs and often struggle to capture the subtle nuances crucial for delineating intricate anatomical structures.” This claim needs a reference, or a better explanation if it is a hypothesis from the authors.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Datasets are public and available. The activation function substitution policy in the model architectures is missing: is every occurrence of ANY activation function substituted? Also in residual and Transformer blocks?
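
    As an illustration only (one plausible substitution policy, not confirmed by the paper): every nn.ReLU module can be replaced recursively, which also covers activations nested inside residual and Transformer blocks.

        import torch.nn as nn

        def replace_relu(module: nn.Module, make_act) -> None:
            """Recursively swap every nn.ReLU in `module` for a fresh activation
            created by `make_act()`; nested residual/Transformer blocks are covered.
            Note: purely functional calls (F.relu) inside forward() are not caught."""
            for name, child in module.named_children():
                if isinstance(child, nn.ReLU):
                    setattr(module, name, make_act())
                else:
                    replace_relu(child, make_act)

        # Hypothetical usage with a torchvision backbone and the ASAULike sketch above:
        # from torchvision.models import resnet18
        # model = resnet18(weights=None)
        # replace_relu(model, ASAULike)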

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Tables should not be split in order to show all the methods while repeating the same caption; a better visualization could be found. The testing methodology is good but could be improved with a k-fold strategy in order to show statistically significant gains.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The provided results are overall interesting, with an evident gain over more popular activation function choices; however:

    • Missing comparison with key SOTA activation functions with similar behaviour, like GELU, SiLU, or LEAF.
    • Missing discussion/assumptions on ASAU performance on a different, non-radiological use case.
    • Missing conclusion/interpretation of why the ASAU derivative should provide benefits to training.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I will maintain my score, as the rebuttal agrees to the proposed paper changes/fixes.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a domain-specific activation function called the Adaptive Smooth Activation Unit (ASAU) to avoid loss of information in negative inputs, as in the case of ReLU/leaky ReLU/PReLU. The results show a ~5% improvement over ReLU in classification and 1-3% improvement in segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Thorough experimentation on classification and segmentation, with CNNs/Transformers on CT/MRI images from the RadImageNet and LiTS open-source datasets, using ResNet18/50, MobileNetV2, and ShuffleNet for classification and UNet, DoubleUNet, ColonSegNet, TransNetR, TransResUNet, ResUNet++, NanoNet-A, and UNeXt for segmentation.
    • The authors also compare the performance of all said models trained with their proposed ASAU to that of ReLU, Leaky ReLU, and parametric ReLU.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • In the introduction, the authors mention that domain-specific activation functions are needed where, for example, “… information loss in regions with negative inputs and often struggle to capture the subtle nuances crucial for delineating intricate anatomical structures….lead to segmentation inaccuracies…” how does the selection of the liver segmentation task prove this point? Wouldn’t a more nuanced and complex task such as lesion segmentation be more suitable? Would the proposed activation do better at boundary segmentation / fine structures or poor boundary definition, such as lesion segmentation?
    • What type of normalization was applied to the data? Do the authors expect that the type of data normalization impacts the difference in performance between models trained with different activation functions?
    • Was there any statistical testing to verify whether the performance difference is significant?
    • How does this ASAU activation compare to the exponential linear unit (ELU)? The two functions are comparable; as ELU is smooth, it would have been imperative to compare it to the proposed activation function.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • In Figure 1, what is the c parameter for the ASAU plots? According to equation 5 for the ASAU function, there’s no c factor, only a and b. Can the authors please clarify?

    • Regarding the hyperparameters a, b, alpha, and beta, how are they set? Is there a guideline on how they should be set and their impact on model learning? I may have missed this in the text, but what hyperparameters were set in the experiments shown? Or were a/b set as trainable parameters? If the hyperparameters were hand-picked, how were they picked? 

    • Similarly, for the leaky ReLU, what was alpha set to?

    • Page 4: In the first-order derivatives of ASAU w.r.t. x (eqn 6), a (eqn 7), and b (eqn 8), the alpha and beta factors are missing from the equations. Were they omitted for a reason, or was this an erroneous oversight?

    Minor comments:
    • Page 7 - Tables with inconsistent font size.
    • Page 8 - MCC not defined.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A few details regarding the methodology, especially the hyperparameter selection, need to be clarified. Moreover, the comparative activation functions are non-smooth while the proposed activation unit is smooth, so it would have been necessary to compare it to similar smooth functions.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed many of the initial comments. However, I had additional concerns that remain unaddressed regarding data normalization and its impact on performance when using different activation functions. This issue is particularly important as the authors emphasize in both the paper and their response that the smooth activation function is crucial for avoiding information loss in negative inputs within the images (such as background noise or different tissue types). This potential information loss can be exacerbated by non-smooth activation functions like ReLU, depending on how the data was preprocessed and whether it was normalized to a range of 0-1 or -1 to 1.
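
    As a toy illustration of this preprocessing concern (hypothetical values, not from the paper): after min-max scaling to [0, 1] no inputs are negative, whereas scaling to [-1, 1] leaves part of the range negative, which a hard ReLU would zero out.

        import torch

        hu = torch.tensor([-1000.0, -50.0, 0.0, 80.0, 400.0])  # hypothetical CT-like intensities
        to_01 = (hu - hu.min()) / (hu.max() - hu.min())  # min-max to [0, 1]: no negatives remain
        to_pm1 = 2.0 * to_01 - 1.0                       # rescale to [-1, 1]: lower half is negative
        print(torch.relu(to_01))    # nothing is clipped
        print(torch.relu(to_pm1))   # negative values zeroed out -> the information loss described above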

    Another critical point is the discussion of domain-specific activation functions. The authors should elaborate on how the proposed activation function addresses this, especially in the context of nuanced segmentation tasks, such as distinguishing between lesions and healthy tissue.

    Considering the authors address these additional comments in the camera-ready version and future journal extension, this work should be accepted, as it provides an important investigation into the applicability of different activation functions in medical image segmentation.




Author Feedback

We sincerely thank the reviewers for their thoughtful feedback and constructive suggestions.

R1: Why does the ASAU derivative provide benefits to training? Because it keeps the gradient stable and differentiable, which is essential for backpropagation algorithms that use gradient descent. It also helps prevent large oscillations during training, making convergence towards the minimum of the loss function more effective.

R1, R2: Missing comparison with other smooth functions (GELU, SiLU, LEAF). Due to space limitations, we did not include every available activation function as a comparison, but please note that we already present rich results. It is also important to note that ASAU is derived from a family of maximum functions, and it achieves better results than other SOTA functions, including GELU, SiLU, LEAF, SMU, and Mish. This year, MICCAI does not allow the addition of new experiments at the rebuttal stage; therefore, the journal version of this article will include more applications and more comparisons.

R1: Missing discussion of ASAU performance on a non-radiological use case. We will include a few discussion points to clarify ASAU's generalized role in analyzing both radiologic and non-radiologic examples.

R2: The applications (multiclass classification and segmentation) are not clearly explained, and the literature review should be made more comprehensive. We will include a few more recent activation functions in the related works to address the reviewers' concerns. Further, we will paraphrase the application motivation sentences to make it clear that CAD systems require both classification and segmentation to be performed. We chose liver segmentation as a task due to its clinical relevance in cancer and chronic liver disease identification. Multiclass disease classification (with dozens of potential diseases) further emphasizes the improved learning facilitated by smooth activation functions, even in complex scenarios. This paves the way for developing more generic CAD systems in the near future.

R2: The claim that activation functions are susceptible to information loss in regions with negative inputs is very important; however, no references or further explanations were provided. In medical images (such as MRI/CT scans), negative inputs can hold important information (tissue types, background noise). Non-smooth activation functions (the ReLU family) can eliminate this data, hindering tissue differentiation and noise handling: the model might not learn to differentiate those tissues or properly account for background noise, potentially leading to inaccurate analysis. While pre-processing techniques exist (scaling/mirroring), they are suboptimal and limited. Our focus on smooth activation functions offers a more robust solution, providing valuable insights into learning at the neuron level. We will update the text to address this concern.

R2: What is the definition of C(K) in Proposition 1? C(K) is the space of all continuous functions on the real line.

R3: In Figure 1, what is the c parameter for the ASAU plots? There is no c in Equation 5. We apologize for this typo: "c" is "beta" in Equation 5. We will correct Figure 1 accordingly.

R3: Regarding the hyperparameters a, b, alpha, and beta, how are they set? Similarly, for the Leaky ReLU, what was alpha set to? In our experiments, we set a = 0.01 and b = 1.0 (trainable parameters/hyperparameters). For ReLU, alpha was set to 1.0. We will better clarify these parameters in the text.

R3: Page 4: In Eqs. (6) and (7), the alpha and beta factors are missing. Were they omitted for a reason? We presented the derivatives for alpha = 1.0 and beta = 1.0; that is why we dropped them in Eqs. (6) and (7). If suggested, we can include a more generalized form of the derivatives.

R3: Minor: tables with inconsistent font size; MCC is not defined. We will fix the table font sizes; thank you for your attention. MCC stands for Matthews Correlation Coefficient. We will update it.
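
To make the "trainable parameters/hyperparams" answer concrete, a minimal autograd check (assuming the illustrative ASAULike module sketched after the abstract, not the authors' code) confirms that gradients reach a and b as well as the input, which is what the analytic derivatives in Eqs. (6)-(7) express.

    import torch

    act = ASAULike(a=0.01, b=1.0)              # initial values taken from the rebuttal
    x = torch.randn(4, 16, requires_grad=True)
    act(x).sum().backward()
    # Gradients flow to the input and to the activation's own parameters,
    # so a and b can be updated jointly with the network weights.
    print(x.grad.shape, act.a.grad.item(), act.b.grad.item())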




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


