Normalizing data and avoiding dividing by zero
I have data that I'm compressing with an autoencoder (a 3-layer neural network), and I would like to normalize the data first. I then want to feed the coded latent vector into an anomaly detection algorithm and see what happens.
I would like to normalize the data for the autoencoder so my values are either in [0, 1] or [-1, 1], because my output activation function will be either a sigmoid or a tanh. That way the network can train with inputs in the same range as its output values.
However, when I normalized with
(x_i - x_mean) / (x_max - x_min)
I ended up dividing by 0 for several features of the data, which gave NaN. Is it possible to normalize my data so it lies in [-1, 1] or [0, 1] while avoiding dividing by 0?
neural-network normalization
$begingroup$
I have data that I'm compressing with AutoEncoders (3-layer neural network) and I would like to normalize my data first. I would like to try to use the coded latent vector and feed it into an anomaly detection algorithm and see what happens.
I would like to normalize the data for the autoencoder so my values are either between 0,1 or -1,-1 because my output activation function will either be a sigmoid or tanh. This way my algorithm can train and the input will be in the same range as the output values of the NN.
However, when I normalized with
x(i)-xmean/(xmax-xmin)
I ended up dividing by 0 in several features of the data which gave NaN. Is is possible to normalize my data so it is between -1,1 or 0,1 while avoiding dividing by 0 for my data?
neural-network normalization
$endgroup$
I have data that I'm compressing with AutoEncoders (3-layer neural network) and I would like to normalize my data first. I would like to try to use the coded latent vector and feed it into an anomaly detection algorithm and see what happens.
I would like to normalize the data for the autoencoder so my values are either between 0,1 or -1,-1 because my output activation function will either be a sigmoid or tanh. This way my algorithm can train and the input will be in the same range as the output values of the NN.
However, when I normalized with
x(i)-xmean/(xmax-xmin)
I ended up dividing by 0 in several features of the data which gave NaN. Is is possible to normalize my data so it is between -1,1 or 0,1 while avoiding dividing by 0 for my data?
neural-network normalization
neural-network normalization
asked Sep 28 '18 at 14:32
zipline86zipline86
202
202
I just realized that if my max and min are the same value, which is why I would get zero in the denominator, then I should just remove those columns.
– zipline86, Sep 28 '18 at 16:37
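A minimal sketch of that idea, assuming the data sits in a pandas DataFrame df of numeric features (the names df and df_scaled are illustrative, not from the original post):

import numpy as np
import pandas as pd

# Drop constant columns, whose max equals their min and would divide by zero
ranges = df.max() - df.min()
df = df.loc[:, ranges != 0]

# Min-max scale the remaining columns into [0, 1]
df_scaled = (df - df.min()) / (df.max() - df.min())

An alternative is to keep the constant columns and add a tiny epsilon to the denominator, but dropping them is usually cleaner since a constant feature carries no information anyway.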
add a comment |
$begingroup$
I just realized that if my max and min are the same value, which is why I would get zero in thd denominator then I should just remove those columns.
$endgroup$
– zipline86
Sep 28 '18 at 16:37
$begingroup$
I just realized that if my max and min are the same value, which is why I would get zero in thd denominator then I should just remove those columns.
$endgroup$
– zipline86
Sep 28 '18 at 16:37
$begingroup$
I just realized that if my max and min are the same value, which is why I would get zero in thd denominator then I should just remove those columns.
$endgroup$
– zipline86
Sep 28 '18 at 16:37
add a comment |
3 Answers
While you could do this manually, scikit-learn has a handy class called MinMaxScaler, which applies min-max normalization to scale data between 0 and 1.
Assume we have arrays of 200 values for the variables s and t:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

mu, sigma = 20, 10  # mean and standard deviation
s = np.random.normal(mu, sigma, 200)
t = np.random.normal(mu, sigma, 200)

Reshape the variables into column vectors, since scikit-learn expects 2D input:

s = np.reshape(s, (-1, 1))
t = np.reshape(t, (-1, 1))

Now we form two new variables, snew and tnew, by fitting a MinMaxScaler to each variable and transforming it (fitting a separate scaler per variable ensures each is scaled by its own min and max):

snew = MinMaxScaler().fit_transform(s)
tnew = MinMaxScaler().fit_transform(t)
Here is a sample of our new variables:
>>> snew
array([[0.24896606],
[0.63121206],
[0.60448469],
.......
[0.49044733],
[0.28131596],
[0.32909155]
>>> tnew
array([[0.91224005],
[0.74540598],
[0.3938718 ],
.......
[0.75749275],
[0.80709325],
[0.19440844]
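If you want the [-1, 1] range for a tanh output instead, MinMaxScaler accepts a feature_range argument; a minimal sketch along the same lines (snew_tanh is an illustrative name):

scaler_tanh = MinMaxScaler(feature_range=(-1, 1))
snew_tanh = scaler_tanh.fit_transform(s)   # values now lie in [-1, 1]

MinMaxScaler also guards against a zero data range internally, so a constant column maps to the lower bound of feature_range rather than producing NaN.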
answered Sep 29 '18 at 14:11 by Michael Grogan
As others have pointed out, you can normalize or standardize your data using the following steps. I'm sure other libraries have similar functions, but I think this is efficient.
Since you asked about normalization, I'll cover that topic in this post. As others have alluded to, data normalization is the process of proportionally rescaling all the values in a dataset, typically so they fall between 0 and 1.
To implement normalization with scikit-learn, follow the steps below:
from sklearn.datasets import load_iris
from sklearn import preprocessing

iris = load_iris()
print(iris.data.shape)  # (150, 4)

X_data = iris.data      # feature matrix
y_labels = iris.target  # class labels (not needed for scaling)

# normalize() rescales each sample (row) to unit norm (L2 by default)
normalized_X_data = preprocessing.normalize(X_data)
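As a quick sanity check (a minimal sketch, not part of the original answer): each row of normalized_X_data now has unit L2 norm, which is different from per-feature min-max scaling; if you specifically want every feature squeezed into [0, 1], MinMaxScaler from the first answer does that instead.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Every sample (row) has length 1 after preprocessing.normalize
print(np.allclose(np.linalg.norm(normalized_X_data, axis=1), 1.0))  # True

# Per-feature scaling into [0, 1]
minmax_X_data = MinMaxScaler().fit_transform(X_data)
print(minmax_X_data.min(), minmax_X_data.max())  # 0.0 1.0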
answered by Full Array
You should subtract x_min from x, not x_mean.
Here is a normalization function generalized to take the new minimum and maximum as parameters (e.g., 0 and 1, or -1 and 1):
def rescale(nums, new_min=0, new_max=1):
    "Rescale values to be between new min and max"
    # Maps min(nums) -> new_min and max(nums) -> new_max, linearly in between
    old_range = max(nums) - min(nums)
    return [(new_max - new_min) / old_range * (value - max(nums)) + new_max
            for value in nums]
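For example (worked by hand, not from the original answer):

>>> rescale([2, 5, 8])          # default range [0, 1]
[0.0, 0.5, 1.0]
>>> rescale([2, 5, 8], -1, 1)   # tanh-friendly range [-1, 1]
[-1.0, 0.0, 1.0]

Note that a constant feature, where max(nums) == min(nums), still divides by zero here, so drop or special-case such columns first, as noted in the comment above.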
answered Sep 28 '18 at 14:59, edited Sep 29 '18 at 23:00, by Brian Spiering