Python Time series: extracting features on a rolling window basis












I have a long univariate time series, and before fitting some machine learning models to it, I want to extract as many features as I can from the time series on a rolling-window basis.

As a quick example, for a window of size 10, I would like to calculate statistics like the mean and standard deviation for the first t=0:9 points in my dataset, and have those two results occupy one row in a new feature table. The next row in the table would hold the mean and standard deviation calculated on points t=1:10, and so on, until the end of the data.

Is there an efficient way to do this in Python?

python time-series feature-extraction feature-engineering

asked yesterday by Coolio2654

1 Answer

Yes, there are easy ways to do this in Python. My favourite is to put the data into a pandas DataFrame, which has a convenient method called rolling() that slides a window of a given size over your data and computes whatever you like on each window.

Let me show you an example - say we start with the following two columns of data:

    In [1]: import pandas as pd
    In [2]: import numpy as np
    In [3]: df = pd.DataFrame({"A": np.random.randint(0, 100, (20,)),
       ...:                    "B": np.random.randn(20)})

Look at the first 10 rows:

    In [4]: df.head(10)
    Out[4]:
        A         B
    0  63 -0.003947
    1  55  0.442597
    2   6  0.684125
    3  17  0.968987
    4  33 -0.018640
    5  50 -0.579558
    6  71  0.563125
    7  31  1.417384
    8   8  0.607813
    9  36  0.186146

We can compute the rolling average over each column and save it back to the dataframe like this:

    In [5]: df[["rolling_a", "rolling_b"]] = df.rolling(5).mean()
    In [6]: df.head(10)
    Out[6]:
        A         B  rolling_a  rolling_b
    0  63 -0.003947        NaN        NaN
    1  55  0.442597        NaN        NaN
    2   6  0.684125        NaN        NaN
    3  17  0.968987        NaN        NaN
    4  33 -0.018640       34.8   0.414624
    5  50 -0.579558       32.2   0.299502
    6  71  0.563125       35.4   0.323608
    7  31  1.417384       40.4   0.470260
    8   8  0.607813       38.6   0.398025
    9  36  0.186146       39.2   0.438982

You might notice that the first 4 rows contain NaN (Not a Number) values. This is because, by default, the rolling() method will not let the mean() method work on a window smaller than the requested size (5 in our example), and the first 4 rows do not yet have 5 observations behind them. There are a lot of options in the rolling() method that you can experiment with; min_periods, for example, sets how many observations a window needs before a value is returned.

You can do the same for a single column of a larger dataframe like this:

    >>> df["rolling_some_column_name"] = df.some_column_name.rolling(5).mean()

You can also apply just about any function to the rolling window - not just mean().
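As a rough sketch of how this could look for the case in the question (window of 10, mean and standard deviation in one feature table), something like the following should work. The series is made-up random data, and the names `series`, `window` and `features` are placeholders rather than anything from the original post. The extra `range` column is only there to illustrate a custom function.

```python
import numpy as np
import pandas as pd

# Hypothetical univariate series; the fixed seed only makes the run repeatable.
rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=50), name="value")

window = 10  # window size used in the question

# One row per window: rolling mean and standard deviation side by side.
features = series.rolling(window).agg(["mean", "std"])

# Any function that reduces a window to a single number can be applied too.
# raw=True hands each window to the function as a NumPy array.
features["range"] = series.rolling(window).apply(
    lambda w: w.max() - w.min(), raw=True
)

# The first (window - 1) rows are incomplete windows and hold NaN. Dropping
# them leaves row 0 describing t=0:9, row 1 describing t=1:10, and so on.
features = features.dropna().reset_index(drop=True)

print(features.head())
```

Dropping the incomplete windows is only one choice. Passing `min_periods` to `rolling()` would instead keep the early rows and compute them on fewer points.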













answered yesterday by n1k31t4





























