Activation function vs Squashing function
This may seem like a very simple and obvious question, but I haven't actually been able to find a direct answer.
Today, in a video explaining deep neural networks, I came across the term squashing function. This is a term I had never heard used; instead, our professor always used the term activation function. Given the definitions I've been able to find, the two seem to be interchangeable terms.
Are they really synonymous or is there a difference?
neural-network activation-function
asked Aug 6 '18 at 12:48 – Mate de Vita
Yes... ReLU is an activation function but not a squashing function.
– DuttaA
Aug 6 '18 at 12:56
Which video were you watching, in which these terms were used? @DuttaA - could one not say that a ReLU squashes all negative values to zero?
– n1k31t4
Aug 6 '18 at 15:18
@n1k31t4 math.stackexchange.com/questions/838939/…. Why do you think a function can be called squashing just for squashing part of an interval, even when it is defined over a larger one?
– DuttaA
Aug 6 '18 at 15:48
@DuttaA - Why not? I mean, squashing is not a technical term with a definition requiring it to squash from one asymptote all the way to another, rather just within given bounds, I'd say. I would be happy if there were such a definition, something more akin to normalisation. I don't mean to argue, just point out that the term is a little slang-like, and therefore the definition somewhat subjective.
– n1k31t4
Aug 6 '18 at 16:08
@n1k31t4 all functions I have encountered in mathematics are called a "name" only if they satisfy the "name condition" over the whole interval. Although no formal definition exists, I don't think ReLU satisfies the condition here.
– DuttaA
Aug 6 '18 at 16:38
3 Answers
An activation function
This is the name given to the function applied to a neuron's combined input (the weighted sum of its inputs plus a bias) to produce the neuron's output. It can refer to any of the well-known activation functions, such as the Rectified Linear Unit (ReLU), the hyperbolic tangent function (tanh) or even the identity function! Have a look at somewhere like the Keras documentation for a nice little list of examples.
We usually define the activation function as a non-linear function, since it is that non-linearity which gives a neural network its ability to approximate (almost) any function, given a few constraints. However, an activation function can also be linear, e.g. the identity function.
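As a minimal sketch (assuming NumPy), here are a few common activation functions side by side; the example inputs are made up:

```python
import numpy as np

def relu(x):
    # non-linear but unbounded above: large positive inputs pass through unchanged
    return np.maximum(0.0, x)

def tanh(x):
    # squashes every real input into (-1, 1)
    return np.tanh(x)

def sigmoid(x):
    # squashes every real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def identity(x):
    # a linear activation: no squashing at all
    return x

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
for f in (relu, tanh, sigmoid, identity):
    print(f.__name__, f(x))
```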
A squashing function
This can mean one of two things, as far as I know, in the context of a neural network (the tag you added to the question), and the two meanings are close, just applied differently.
The first and most common example is when people refer to the softmax function, which squashes the final layer's activations/logits into the range (0, 1). This has the effect of allowing the final outputs to be interpreted directly as probabilities (i.e. they sum to 1).
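A minimal sketch of that squashing effect (assuming NumPy; the logits here are invented for the example):

```python
import numpy as np

def softmax(logits):
    # subtract the max logit for numerical stability, then exponentiate and normalise
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

logits = np.array([2.0, 1.0, -3.0])
probs = softmax(logits)
print(probs)        # each value squashed into (0, 1)
print(probs.sum())  # 1.0, so the outputs can be read as probabilities
```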
The second and newer usage of the word in the context of neural networks comes from the relatively recent papers (one and two) by Sara Sabour, Geoffrey Hinton and Nicholas Frosst, which presented the idea of Capsule Networks. What these are and how they work is beyond the scope of this question; however, the term "squashing function" deserves special mention. Paper number one introduces it as follows:
We want the length of the output vector of a capsule to represent the probability that the entity represented by the capsule is present in the
current input. We therefore use a non-linear "squashing" function to ensure that short vectors get shrunk to almost zero length and long vectors get shrunk to a length slightly below 1.
That description makes it sound very similar indeed to the softmax!
This squashing function is defined as follows:
$$
v_j = \frac{||s_j||^2}{1 + ||s_j||^2} \cdot \frac{s_j}{||s_j||}
$$
where $v_j$ is the vector output of capsule $j$ and $s_j$ is its total input.
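A rough sketch of that vector squashing in NumPy (the epsilon guard is an addition for this example, to avoid dividing by zero for a zero-length input):

```python
import numpy as np

def squash(s, eps=1e-9):
    # s is the capsule's total input vector s_j
    norm_sq = np.sum(s ** 2)
    norm = np.sqrt(norm_sq)
    # scale the unit vector s/||s|| by ||s||^2 / (1 + ||s||^2),
    # so the output's length always lies in [0, 1)
    return (norm_sq / (1.0 + norm_sq)) * (s / (norm + eps))

print(np.linalg.norm(squash(np.array([0.01, 0.02]))))  # short vector -> length near 0
print(np.linalg.norm(squash(np.array([10.0, 20.0]))))  # long vector  -> length just below 1
```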
If this is all new to you and you'd like to learn more, I'd recommend having a read of those two papers, as well as perhaps a nice overview blog, like this one.
answered Aug 6 '18 at 15:16 – n1k31t4
Activation functions like the sigmoid function, the hyperbolic tangent function, etc. are also called squashing functions because they squash the input into a small range: the sigmoid output lies in (0, 1) and the tanh output in (-1, 1). But you cannot call ReLU a squashing function, because for a positive input value it returns the output unchanged.
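A small numeric illustration of the difference (a sketch assuming NumPy):

```python
import numpy as np

x = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])

sigmoid = 1.0 / (1.0 + np.exp(-x))  # bounded: every output stays inside (0, 1)
relu = np.maximum(0.0, x)           # unbounded: relu(50.0) is still 50.0

print(sigmoid)
print(relu)
```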
answered Aug 6 '18 at 15:04 – Rajat Gupta
So there is a formal definition of a squashing function, used in the paper by Hornik (1989); see Definition 2.3. The paper demonstrates that any neural net with a single hidden layer of a sufficient number of nodes, whose activation function is a 'squashing' function, is a universal approximator. Given the context, I think this is what is meant by squashing function.
The definition given there is any function that is non-decreasing, with $\lim_{x \rightarrow \infty} f(x) = 1$ and $\lim_{x \rightarrow -\infty} f(x) = 0$.
So ReLU is not a squashing function, because $\lim_{x \rightarrow \infty} \mathrm{ReLU}(x) = \infty \neq 1$.
NB: a net with ReLU activation functions is also a universal approximator, but the proof in that paper doesn't apply to it.
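As an informal sanity check of those limit conditions at large |x| (a sketch, assuming NumPy):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

# Hornik's squashing conditions: non-decreasing, f(x) -> 1 as x -> +inf, f(x) -> 0 as x -> -inf
big = 50.0
for f in (sigmoid, relu):
    print(f.__name__, f(-big), f(big))
# sigmoid tends towards 0 and 1, so it qualifies;
# relu(50.0) = 50.0, so its limit at +infinity is infinity, not 1
```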
answered 14 hours ago – Clumsy cat (New contributor)