Python Time series: extracting features on a rolling window basis
I have a long univariate time series, and before fitting some machine learning models to it, I want to extract as many features as I can from the time series on a rolling-window basis.
As a quick example, for a window of size 10, I would like to calculate statistics like the mean and standard deviation for the first t=0:9 points in my dataset and have those two results occupy one row in a new feature table. The next row in the table would hold the mean and standard deviation calculated on points t=1:10, and so on until the end of the data.
Is there an efficient way to do this in Python?
python time-series feature-extraction feature-engineering
asked yesterday
Coolio2654
1 Answer
Yes, there are easy ways to do this in Python. My favourite is to put the data into a pandas DataFrame, which has a convenient method called rolling() that slides a fixed-size window over your data and computes whatever you like on each window.
Let me show you an example - say we start with the following two columns of data:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({"A": np.random.randint(0, 100, (20,)),
   ...:                    "B": np.random.randn(20)})
Look at the first 10 rows:
In [4]: df.head(10)
Out[4]:
    A         B
0  63 -0.003947
1  55  0.442597
2   6  0.684125
3  17  0.968987
4  33 -0.018640
5  50 -0.579558
6  71  0.563125
7  31  1.417384
8   8  0.607813
9  36  0.186146
We can compute the rolling average over each column and save it back to the dataframe like this:
In [6]: df[["rolling_a", "rolling_b"]] = df.rolling(5).mean()
In [7]: df.head(10)
Out[7]:
    A         B  rolling_a  rolling_b
0  63 -0.003947        NaN        NaN
1  55  0.442597        NaN        NaN
2   6  0.684125        NaN        NaN
3  17  0.968987        NaN        NaN
4  33 -0.018640       34.8   0.414624
5  50 -0.579558       32.2   0.299502
6  71  0.563125       35.4   0.323608
7  31  1.417384       40.4   0.470260
8   8  0.607813       38.6   0.398025
9  36  0.186146       39.2   0.438982
You might notice that the first 4 rows contain NaN values (Not a Number). This is because, by default, the rolling() method will not let the mean() method work on a window holding fewer than 5 observations (in our example). The min_periods argument relaxes this requirement, and there are a lot of other options in the rolling() method that you can experiment with.
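As a minimal sketch of one of those options, min_periods sets how many observations a window must contain before a value is returned. The series s below is made-up illustration data, not the data from the example above.

```python
import pandas as pd

# Hypothetical toy series, used only for illustration.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Default: a window of 3 needs 3 observations, so the first 2 results are NaN.
strict = s.rolling(3).mean()

# min_periods=1: partial windows at the start are averaged over what is available.
relaxed = s.rolling(3, min_periods=1).mean()

# Alternatively, keep the strict behaviour and drop the incomplete windows.
complete_only = strict.dropna()

print(pd.DataFrame({"strict": strict, "relaxed": relaxed}))
```

Whether partial windows are acceptable depends on the downstream model; dropping them keeps every feature computed over the same number of points.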
You can do the same for a single column of a larger DataFrame like this:
>>> df["rolling_some_column_name"] = df.some_column_name.rolling(5).mean()
You can also apply just about any function to the rolling window - not just mean(). For example, std() gives the rolling standard deviation, and apply() accepts a custom function.
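To make this concrete for the question's use case (several statistics per window of size 10, one row per window), here is a minimal sketch. The series name ts, the random seed, and the custom helper value_range are assumptions made for illustration; they are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical univariate series standing in for the real data.
rng = np.random.default_rng(0)
ts = pd.Series(rng.normal(size=50), name="value")

window = 10
roll = ts.rolling(window)

# Built-in aggregations: one column per statistic.
features = roll.agg(["mean", "std", "min", "max"])


# A custom statistic via apply(); raw=True passes each window as a NumPy array.
def value_range(x):
    return x.max() - x.min()


features["range"] = roll.apply(value_range, raw=True)

# Each row is labelled by the last point of its window, so row 9 covers t=0..9.
# Dropping the NaN rows leaves exactly one row per complete window.
features = features.dropna()

print(features.head())
```

Note that the rolling std() uses pandas' default of ddof=1 (the sample standard deviation). The built-in aggregations are generally much faster than apply() with a Python function, so it is worth preferring them where they exist.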
answered yesterday
n1k31t4