Python Time series: extracting features on a rolling window basis

I have a long univariate time series, and before training some machine learning models on it, I want to extract as many features as I can from the time series on a rolling-window basis.

As a quick example, for a window of size 10, I would like to calculate statistics like the mean and standard deviation for the first points t=0:9 in my dataset and have those two results occupy one row in a new feature table. The next row in the table would hold the mean and standard deviation calculated on points t=1:10, and so on until the end of the data.

Is there an efficient way to do this in Python?

asked yesterday

Coolio2654

python time-series feature-extraction feature-engineering

1 Answer

Yes, there are easy ways to do this in Python. My favourite is to put the data into a Pandas DataFrame, which has a convenient method called rolling that slides a window of a given size over your data and computes whatever you like on each window.

Let me show you an example. Say we start with the following two columns of data:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({"A": np.random.randint(0, 100, (20,)),
                           "B": np.random.randn(20)})

Look at the first 10 rows:

In [4]: df.head(10)
Out[4]:
     A         B
0   63 -0.003947
1   55  0.442597
2    6  0.684125
3   17  0.968987
4   33 -0.018640
5   50 -0.579558
6   71  0.563125
7   31  1.417384
8    8  0.607813
9   36  0.186146

We can compute the rolling average over each column and save it back to the dataframe like this:

In [6]: df[["rolling_a", "rolling_b"]] = df.rolling(5).mean()

In [7]: df.head(10)
Out[7]:
     A         B  rolling_a  rolling_b
0   63 -0.003947        NaN        NaN
1   55  0.442597        NaN        NaN
2    6  0.684125        NaN        NaN
3   17  0.968987        NaN        NaN
4   33 -0.018640       34.8   0.414624
5   50 -0.579558       32.2   0.299502
6   71  0.563125       35.4   0.323608
7   31  1.417384       40.4   0.470260
8    8  0.607813       38.6   0.398025
9   36  0.186146       39.2   0.438982

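You might notice that the first 4 rows contain NaN values (Not a Number). This is because, by default, the rolling() method will not let the mean() method work on a window smaller than the requested size (5 in our example), and the first 4 rows do not yet have 5 values behind them. The rolling() method has a lot of options that you can experiment with; min_periods, for example, sets how many values a window needs before a result is produced.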
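To make the NaN behaviour concrete, here is a small sketch that re-types column A from the first 10 rows above into a Series (the names `a`, `strict` and `relaxed` are just for illustration) and compares the default with `min_periods=1`:

```python
import pandas as pd

# Column A from the first 10 rows of the example above, typed in by hand.
a = pd.Series([63, 55, 6, 17, 33, 50, 71, 31, 8, 36])

# Default: a window of 5 needs 5 values, so the first 4 results are NaN.
strict = a.rolling(5).mean()

# min_periods=1: shorter windows at the start are allowed,
# so row 0 averages 1 value, row 1 averages 2 values, and so on.
relaxed = a.rolling(5, min_periods=1).mean()

print(pd.DataFrame({"A": a, "strict": strict, "relaxed": relaxed}))
```

The `strict` column should reproduce `rolling_a` from the table above, while `relaxed` has no NaN values because the early rows are averaged over however many points are available.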

You can do the same for a single column of a large dataframe like this:

>>> df["rolling_some_column_name"] = df.some_column_name.rolling(5).mean()

You can also apply just about any function to the rolling frame - not just mean().
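As a rough sketch of how this could produce the feature table described in the question (window of 10, one row per window), the example below computes several built-in statistics at once with `agg` and adds one custom feature with `apply`. The series `ts`, the helper `value_range` and the column names are made up for illustration and stand in for your real data.

```python
import numpy as np
import pandas as pd

# Hypothetical univariate series standing in for the real data.
rng = np.random.default_rng(0)
ts = pd.Series(rng.normal(size=100), name="value")

window = 10
roll = ts.rolling(window)

# Several built-in statistics at once: one output column per statistic.
features = roll.agg(["mean", "std", "min", "max"])


def value_range(w):
    # With raw=True, w is a NumPy array holding the values of one window.
    return w.max() - w.min()


# A custom feature: any function that maps a window to a single number.
features["range"] = roll.apply(value_range, raw=True)

# Drop the first (window - 1) rows, which do not have a complete window yet.
features = features.dropna().reset_index(drop=True)

print(features.head())
```

After the `dropna()`, row 0 of `features` corresponds to points t=0..9, row 1 to t=1..10, and so on. Passing `raw=True` hands each window to the function as a NumPy array rather than a Series, which is usually faster for simple numeric functions.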

answered yesterday

n1k31t4
