Normalizing data and avoiding division by zero

I have data that I'm compressing with an autoencoder (a 3-layer neural network), and I would like to normalize the data first. I then want to feed the encoded latent vector into an anomaly detection algorithm and see what happens.



I would like to normalize the data for the autoencoder so the values lie in either [0, 1] or [-1, 1], because my output activation function will be either a sigmoid or a tanh. That way the inputs are in the same range as the network's outputs and the algorithm can train.



However, when I normalized with



(x(i) - xmean) / (xmax - xmin)


I ended up dividing by 0 for several features of the data, which gave NaN. Is it possible to normalize my data so it lies in [-1, 1] or [0, 1] while avoiding division by zero?
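
A minimal reproduction of the failure, assuming a small NumPy array X in which one column is constant (the array and names are hypothetical):

import numpy as np

X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])                # second column is constant
rng = X.max(axis=0) - X.min(axis=0)       # [2., 0.] -> zero range for column 2
X_norm = (X - X.mean(axis=0)) / rng       # 0/0 in column 2 -> NaN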

Tags: neural-network, normalization

asked Sep 28 '18 at 14:32 by zipline86

  • I just realized that if my max and min are the same value (which is why I get zero in the denominator), then I should just remove those columns. – zipline86, Sep 28 '18 at 16:37
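
A minimal sketch of that idea, assuming the features are the columns of a NumPy array X (names hypothetical):

import numpy as np

X = np.random.rand(100, 5)
X[:, 2] = 7.0                              # make one column constant for the demo

keep = X.max(axis=0) != X.min(axis=0)      # columns whose range is non-zero
X_reduced = X[:, keep]

lo, hi = X_reduced.min(axis=0), X_reduced.max(axis=0)
X_norm = (X_reduced - lo) / (hi - lo)      # safe: every remaining range is > 0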

3 Answers

While you could do this manually, scikit-learn has a handy class called MinMaxScaler, which applies min-max normalization to scale data between 0 and 1.



Assume we have arrays of 200 values for variables s and t:



import numpy as np
from sklearn.preprocessing import MinMaxScaler

mu, sigma = 20, 10  # mean and standard deviation
s = np.random.normal(mu, sigma, 200)
t = np.random.normal(mu, sigma, 200)


Reshape your variables if necessary:



s = np.reshape(s, (-1, 1))
t = np.reshape(t, (-1, 1))


Now we form two new variables, snew and tnew, scaled with MinMaxScaler:



scaler = MinMaxScaler()
snew = scaler.fit_transform(s)
tnew = scaler.fit_transform(t)  # refit so t is scaled by its own min and max


Here is a sample of our new variables:



>>> snew
array([[0.24896606],
       [0.63121206],
       [0.60448469],
       ...,
       [0.49044733],
       [0.28131596],
       [0.32909155]])

>>> tnew
array([[0.91224005],
       [0.74540598],
       [0.3938718 ],
       ...,
       [0.75749275],
       [0.80709325],
       [0.19440844]])
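
Since the question also mentions tanh, note that MinMaxScaler accepts a feature_range argument; a minimal sketch for scaling to [-1, 1] instead of the default [0, 1]:

scaler_tanh = MinMaxScaler(feature_range=(-1, 1))  # for a tanh output layer
s_tanh = scaler_tanh.fit_transform(s)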

answered Sep 29 '18 at 14:11 by Michael Grogan

As others pointed out, you can normalize or standardize your data using the steps below. Other libraries have similar functions, but I think this approach is efficient.



Since you asked about normalization, I'll cover that here. Broadly, normalization means putting all the values in a dataset on a comparable scale; note that scikit-learn's preprocessing.normalize rescales each sample (row) to unit norm, which is not the same as min-max scaling each feature to [0, 1].



To implement normalization, follow the steps below:



from sklearn.datasets import load_iris
from sklearn import preprocessing

iris = load_iris()
print(iris.data.shape)  # (150, 4)

X_data = iris.data
y_labels = iris.target

# rescales each row of X_data to unit (l2) norm
normalized_X_data = preprocessing.normalize(X_data)
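
If instead you want each feature squeezed into [0, 1] (the min-max normalization the question asks about), a minimal sketch on the same data:

from sklearn.preprocessing import MinMaxScaler

# per-feature min-max scaling; sklearn guards against zero ranges internally,
# so constant features do not produce NaN
minmax_X_data = MinMaxScaler().fit_transform(X_data)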

answered 16 mins ago by Full Array

To get values in [0, 1], you should subtract xmin from x, not xmean.



Here is a normalization function generalized to take the new minimum and maximum as parameters (e.g., 0 and 1, or -1 and 1):



def rescale(nums, new_min=0, new_max=1):
    """Rescale values to lie between new_min and new_max."""
    lo, hi = min(nums), max(nums)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [new_min for _ in nums]
    return [(new_max - new_min) / (hi - lo) * (value - hi) + new_max for value in nums]
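
A quick check of the endpoints:

>>> rescale([2, 4, 6], new_min=-1, new_max=1)
[-1.0, 0.0, 1.0]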

edited Sep 29 '18 at 23:00; answered Sep 28 '18 at 14:59 by Brian Spiering