Tuning a sequence to sequence model
I have written a variable-length sequence-to-sequence autoencoder in Keras, using this tutorial as a guideline: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html. The idea of applying it to time series forecasting came from the winning entry of this Kaggle competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting
The model is trained on 1831 different time series of products, all from the same product category. Each series has a minimum length of 156 data points and a maximum of 208 (so 3-4 years of historical data). The data points in each series are daily figures aggregated up to the Monday of each week. Some of the series are intermittent and some are not.
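For reference, the weekly aggregation can be done along these lines (a minimal pandas sketch; the file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical daily sales with a DatetimeIndex and one column per product.
daily = pd.read_csv("daily_sales.csv", index_col="date", parse_dates=True)

# Roll daily values up into weeks labelled by the Monday that starts each week.
weekly = daily.resample("W-MON", label="left", closed="left").sum()
```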
The forecast horizon for each time series is 26 points into the future. The data is log(x + 1) transformed and then normalised by subtracting the mean of the dataset and dividing by the standard deviation of the dataset.
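In code, that preprocessing and its inverse look roughly like this (a sketch; `train_values` stands in for an array of all raw training values):

```python
import numpy as np

# Statistics computed once over the whole (log-transformed) training set.
log_values = np.log1p(train_values)       # log(x + 1)
mu, sigma = log_values.mean(), log_values.std()

def transform(x):
    return (np.log1p(x) - mu) / sigma     # normalise after the log transform

def inverse_transform(z):
    return np.expm1(z * sigma + mu)       # undo the scaling, then the log(x + 1)
```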
The features fed into the model are (a sketch of how they might be assembled follows the list):
- The time series itself
- Lags of the time series at 13, 26, and 52 steps
- A yearly autocorrelation scalar per series, tiled to the length of the series
- A quarterly autocorrelation scalar per series, tiled to the length of the series
- A product popularity scalar per series, tiled to the length of the series
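Roughly, per series (the function name is hypothetical, and lags are zero-padded where history is missing):

```python
import numpy as np

def build_features(series, lags, year_ac, quarter_ac, popularity):
    """Stack a normalised series with its lags and its tiled per-series scalars.

    Returns an array of shape (T, 1 + len(lags) + 3).
    """
    T = len(series)
    cols = [np.asarray(series, dtype=float)]
    for lag in lags:
        lagged = np.zeros(T)
        lagged[lag:] = series[:-lag]       # shift right; zero-pad the start
        cols.append(lagged)
    # Tile each scalar to the length of the series, as described above.
    for scalar in (year_ac, quarter_ac, popularity):
        cols.append(np.full(T, scalar))
    return np.stack(cols, axis=-1)         # shape (T, n_features)
```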
The accuracy of the model is evaluated with two separate measures, MASE and a variant of SMAPE (SMAPE3). Both metrics should be as close to zero as possible. The forecasts need to be made per series, so I determine overall success by taking the median of all the per-series MASE scores, and by computing one SMAPE3 score over all forecast and actual values pooled into a single sum across series, rather than averaging a per-series score.
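The aggregation works along these lines (a sketch; the exact SMAPE3 formula and the MASE scale with m = 52 are assumptions, a yearly seasonal naive scale being one common choice for weekly data):

```python
import numpy as np

def mase(actual, forecast, insample, m=52):
    """MASE scaled by the in-sample seasonal naive error (m=52 assumed here)."""
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale

def pooled_smape(actuals, forecasts):
    """One SMAPE over all series pooled together, not an average per series."""
    a = np.concatenate([np.ravel(x) for x in actuals])
    f = np.concatenate([np.ravel(x) for x in forecasts])
    return np.mean(2.0 * np.abs(f - a) / (np.abs(a) + np.abs(f) + 1e-8))

# Median MASE across series; one pooled SMAPE3-style score over everything.
# mase_median = np.median([mase(a, f, h) for a, f, h in zip(actuals, forecasts, histories)])
# smape3 = pooled_smape(actuals, forecasts)
```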
My model has a single encoder layer with 64 units and a single decoder layer, also with 64 units. It uses the Adam optimiser with all Keras defaults and mean absolute error as the loss function, as I believe this most closely reflects the error metrics the model is eventually evaluated with. The model doesn't perform as well as I would like, and I am looking for ways to improve it. Currently, as measured on my validation set, I have the following two scores:
Vol SMAPE3: 0.46 and MASE median: 0.75
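For concreteness, the encoder-decoder is wired up along these lines (a minimal sketch following the Keras blog tutorial; shapes and teacher-forcing details are simplified, and `n_features` assumes the feature list above):

```python
from keras.layers import Dense, Input, LSTM
from keras.models import Model

n_features = 7   # series + three lags + three tiled scalars, per the list above
latent_dim = 64

# Encoder: consume the (variable-length) history, keep only the final state.
encoder_inputs = Input(shape=(None, n_features))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generate the 26-step forecast, initialised from the encoder state.
decoder_inputs = Input(shape=(None, n_features))
decoder_seq = LSTM(latent_dim, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(1)(decoder_seq)    # one forecast value per step

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="adam", loss="mae")  # Adam defaults, MAE as the loss
```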
The model is trained over 100 epochs with a batch size of 256. Below are the training loss and some example plots of series together with the model's predictions: [loss curve and example forecast plots were attached here]
To me, the model doesn't seem to capture the variation in the series very well; the forecasts almost look like a linear fit. It appears to capture only the overall level of each series. To improve it, I would like the model to capture more of the variation in the series rather than producing the smooth kind of predictions it gives at the moment. Are there any suggestions for how to achieve this?
Tags: python, keras, time-series, regression, forecast