Activation function vs Squashing function














This may seem like a very simple and obvious question, but I haven't actually been able to find a direct answer.



Today, in a video explaining deep neural networks, I came across the term squashing function. I had never heard this term before; our professor always used the term activation function. Given the definitions I've been able to find, the two seem to be interchangeable.



Are they really synonymous or is there a difference?










neural-network activation-function

asked Aug 6 '18 at 12:48 by Mate de Vita












  • Yes... ReLU is an activation function but not a squashing function.
    – DuttaA, Aug 6 '18 at 12:56










  • Which video were you watching, in which these terms were used? @DuttaA - could one not say that a ReLU squashes all negative values to zero?
    – n1k31t4, Aug 6 '18 at 15:18










  • @n1k31t4 math.stackexchange.com/questions/838939/…. Why do you think a function can be called squashing just because it squashes one interval, even when it is defined over a larger interval?
    – DuttaA, Aug 6 '18 at 15:48












  • @DuttaA - Why not? I mean, squashing is not a technical term with a definition requiring it to squash from one asymptote all the way to another, rather just within given bounds, I'd say. I would be happy if there were such a definition, something more akin to normalisation. I don't mean to argue, just point out that the term is a little slang-like, and therefore the definition somewhat subjective.
    – n1k31t4, Aug 6 '18 at 16:08










  • @n1k31t4 All functions I have encountered in mathematics are called a "name" only if they satisfy the "name condition" over the whole interval. Although no formal definition exists, I don't think it is satisfying the condition here.
    – DuttaA, Aug 6 '18 at 16:38
















3 Answers


















An activation function



This is the name given to the function that is applied to a neuron's weighted input to produce its output. It can refer to any of the well-known activation functions, such as the Rectified Linear Unit (ReLU), the hyperbolic tangent function (tanh) or even the identity function! Have a look somewhere like the Keras documentation for a nice little list of examples.
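For instance, here is a minimal sketch of how activations are typically specified per layer with the Keras API (the layer sizes are arbitrary, chosen only for illustration):

```python
from tensorflow import keras

# Each Dense layer applies the named activation to its weighted inputs
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="tanh"),
    keras.layers.Dense(1, activation="linear"),  # "linear" is the identity activation
])
```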



We usually define the activation function as a non-linear function, as it is that property which gives a neural network its ability to approximate almost any function (given a few constraints). However, an activation function can also be linear, e.g. the identity function.
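To make the distinction concrete, here is a small NumPy sketch (my own illustration, not part of the original answer) of the activation functions mentioned above:

```python
import numpy as np

def relu(z):
    # Rectified Linear Unit: clips negative values to 0, passes positives unchanged
    return np.maximum(0.0, z)

def sigmoid(z):
    # Logistic sigmoid: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def identity(z):
    # Linear activation: returns the input as-is
    return z

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # example pre-activations
for f in (relu, np.tanh, sigmoid, identity):
    print(f.__name__, f(z))
```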



A squashing function



This can mean one of two things, as far as I know, in the context of a neural network - the tag you added to the question - and they are close, just differently applied.



The first and most commonplace example is when people refer to the softmax function, which squashes the final layer's activations/logits into the range [0, 1]. This has the effect of allowing the final outputs to be directly interpreted as probabilities (i.e. they must sum to 1).
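As a quick illustration (my own sketch, not from the answer), a numerically stable softmax and a check that its outputs behave like probabilities:

```python
import numpy as np

def softmax(logits):
    # Shift by the maximum for numerical stability; the result is mathematically unchanged
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

probs = softmax(np.array([2.0, 1.0, -3.0]))
print(probs)        # every entry lies in (0, 1)
print(probs.sum())  # entries sum to 1.0
```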



The second and newest usage of this word in the context of neural networks is from the relatively recent papers (one and two) by Sara Sabour, Geoffrey Hinton and Nicholas Frosst, which presented the idea of Capsule Networks. What these are and how they work is beyond the scope of this question; however, the term "squashing function" deserves special mention. Paper number one introduces it as follows:




We want the length of the output vector of a capsule to represent the probability that the entity represented by the capsule is present in the
current input. We therefore use a non-linear "squashing" function to ensure that short vectors get shrunk to almost zero length and long vectors get shrunk to a length slightly below 1.




That description makes it sound very similar indeed to the softmax!



This squashing function is defined as follows:



$$
v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}
$$




where $v_j$ is the vector output of capsule $j$ and $s_j$ is its total input.
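A minimal NumPy sketch of that formula (my own illustration, assuming $s_j$ is given as a one-dimensional vector):

```python
import numpy as np

def squash(s, eps=1e-8):
    # Capsule squashing: preserves the vector's direction, maps its length into [0, 1)
    squared_norm = np.dot(s, s)
    scale = squared_norm / (1.0 + squared_norm)
    return scale * s / np.sqrt(squared_norm + eps)  # eps guards against division by zero

print(np.linalg.norm(squash(np.array([0.01, 0.02]))))  # short vector -> length close to 0
print(np.linalg.norm(squash(np.array([10.0, -4.0]))))  # long vector -> length just below 1
```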




If this is all new to you and you'd like to learn more, I'd recommend having a read of those two papers, as well as perhaps a nice overview blog, like this one.






answered Aug 6 '18 at 15:16 by n1k31t4





















Activation functions like the sigmoid and the hyperbolic tangent (tanh) are also called squashing functions because they squash their input into a small range: the sigmoid outputs values in (0, 1) and tanh outputs values in (-1, 1). But you cannot call ReLU a squashing function, because for any positive input it returns the input unchanged.
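A quick numerical sketch (my own, not part of the answer) of this difference: the sigmoid and tanh stay bounded as the input grows, while ReLU does not:

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0])

print(1.0 / (1.0 + np.exp(-x)))  # sigmoid: approaches 1 but never exceeds it
print(np.tanh(x))                # tanh: approaches 1 but never exceeds it
print(np.maximum(0.0, x))        # ReLU: grows without bound (1, 10, 100)
```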






answered Aug 6 '18 at 15:04 by Rajat Gupta





















There is a formal definition of a squashing function in the paper by Hornik (1989); see Definition 2.3. The paper demonstrates that any neural net with a single hidden layer of a sufficient number of nodes, where the activation function is a 'squashing' function, is a universal approximator. Given the context, I think this is what is meant by squashing function.

The definition given there is any function that is non-decreasing, with $\lim_{x\rightarrow \infty} f(x) = 1$ and $\lim_{x\rightarrow -\infty} f(x) = 0$. So ReLU is not a squashing function, because $\lim_{x\rightarrow \infty} \mathrm{ReLU}(x) = \infty \neq 1$.

NB: a net with ReLU activation functions is a universal approximator, but the proof in that paper doesn't apply to it.
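A small numerical sketch (my own illustration) of those limit conditions, comparing the sigmoid against ReLU:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

for x in (10.0, 100.0):
    print(sigmoid(x), sigmoid(-x))  # tends to 1 and 0: satisfies Hornik's limit conditions
    print(relu(x), relu(-x))        # tends to infinity and 0: the upper limit is not 1
```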






answered 14 hours ago by Clumsy cat (new contributor)












