Normalizing data and avoiding division by zero

I have data that I'm compressing with an autoencoder (a 3-layer neural network), and I would like to normalize the data first. I then want to feed the encoded latent vector into an anomaly detection algorithm and see what happens.



I would like to normalize the data for the autoencoder so the values lie in either [0, 1] or [-1, 1], because my output activation function will be either a sigmoid or a tanh. That way the inputs are in the same range as the network's outputs and the algorithm can train.



However, when I normalized with



(x(i) - xmean) / (xmax - xmin)


I ended up dividing by 0 for several features of the data, which gave NaN. Is it possible to normalize my data so it lies in [-1, 1] or [0, 1] while avoiding division by zero?
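
A minimal reproduction of the failure, assuming a small NumPy array X in which one column is constant (the array and names are hypothetical):

import numpy as np

X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])                # second column is constant
rng = X.max(axis=0) - X.min(axis=0)       # [2., 0.] -> zero range for column 2
X_norm = (X - X.mean(axis=0)) / rng       # 0/0 in column 2 -> NaN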

Tags: neural-network, normalization

asked Sep 28 '18 at 14:32 by zipline86

  • I just realized that if my max and min are the same value (which is why I get zero in the denominator), then I should just remove those columns. – zipline86, Sep 28 '18 at 16:37
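
A minimal sketch of that idea, assuming the features are the columns of a NumPy array X (names hypothetical):

import numpy as np

X = np.random.rand(100, 5)
X[:, 2] = 7.0                              # make one column constant for the demo

keep = X.max(axis=0) != X.min(axis=0)      # columns whose range is non-zero
X_reduced = X[:, keep]

lo, hi = X_reduced.min(axis=0), X_reduced.max(axis=0)
X_norm = (X_reduced - lo) / (hi - lo)      # safe: every remaining range is > 0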

3 Answers

While you could do this manually, scikit-learn has a handy class called MinMaxScaler, which applies min-max normalization to scale data between 0 and 1.



Assume we have arrays of 200 values for variables s and t:



import numpy as np
from sklearn.preprocessing import MinMaxScaler

mu, sigma = 20, 10  # mean and standard deviation
s = np.random.normal(mu, sigma, 200)
t = np.random.normal(mu, sigma, 200)


Reshape your variables if necessary:



s = np.reshape(s, (-1, 1))
t = np.reshape(t, (-1, 1))


Now we form two new variables, snew and tnew, scaled with MinMaxScaler:



scaler = MinMaxScaler()
snew = scaler.fit_transform(s)
tnew = scaler.fit_transform(t)  # refit so t is scaled by its own min and max


Here is a sample of our new variables:



>>> snew
array([[0.24896606],
       [0.63121206],
       [0.60448469],
       ...,
       [0.49044733],
       [0.28131596],
       [0.32909155]])

>>> tnew
array([[0.91224005],
       [0.74540598],
       [0.3938718 ],
       ...,
       [0.75749275],
       [0.80709325],
       [0.19440844]])
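
Since the question also mentions tanh, note that MinMaxScaler accepts a feature_range argument; a minimal sketch for scaling to [-1, 1] instead of the default [0, 1]:

scaler_tanh = MinMaxScaler(feature_range=(-1, 1))  # for a tanh output layer
s_tanh = scaler_tanh.fit_transform(s)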

answered Sep 29 '18 at 14:11 by Michael Grogan

As others pointed out, you can normalize or standardize your data using the steps below. Other libraries have similar functions, but I think this approach is efficient.



Since you asked about normalization, I'll cover that here. Broadly, normalization means putting all the values in a dataset on a comparable scale; note that scikit-learn's preprocessing.normalize rescales each sample (row) to unit norm, which is not the same as min-max scaling each feature to [0, 1].



To implement normalization, follow the steps below:



from sklearn.datasets import load_iris
from sklearn import preprocessing

iris = load_iris()
print(iris.data.shape)  # (150, 4)

X_data = iris.data
y_labels = iris.target

# rescales each row of X_data to unit (l2) norm
normalized_X_data = preprocessing.normalize(X_data)
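
If instead you want each feature squeezed into [0, 1] (the min-max normalization the question asks about), a minimal sketch on the same data:

from sklearn.preprocessing import MinMaxScaler

# per-feature min-max scaling; sklearn guards against zero ranges internally,
# so constant features do not produce NaN
minmax_X_data = MinMaxScaler().fit_transform(X_data)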

answered 16 mins ago by Full Array

To get values in [0, 1], you should subtract xmin from x, not xmean.



Here is a normalization function generalized to take the new minimum and maximum as parameters (e.g., 0 and 1, or -1 and 1):



def rescale(nums, new_min=0, new_max=1):
    """Rescale values to lie between new_min and new_max."""
    lo, hi = min(nums), max(nums)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [new_min for _ in nums]
    return [(new_max - new_min) / (hi - lo) * (value - hi) + new_max for value in nums]
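
A quick check of the endpoints:

>>> rescale([2, 4, 6], new_min=-1, new_max=1)
[-1.0, 0.0, 1.0]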

edited Sep 29 '18 at 23:00; answered Sep 28 '18 at 14:59 by Brian Spiering