Tuning a sequence to sequence model












0












$begingroup$


I have written a variable length sequence to seqeunce autoencoder in keras using this tutorial as a guideline: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html. The idea to apply it to the problem of time series forecasting came from the winning entry of this kaggle competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting



The model is trained on 1831 different time series of products but they all come from the same product category. Each series has a minimum length of 156 data points and a maximum of 208 (So 3-4 years of historical data). The data points in each series are upscaled daily data aggregated to the Monday of each week. Some of the series are intermittent some are not



The forecast horizon for each time series is 26 points into the future. The data is log(x + 1) transformed and then normalised by subtracting the mean of the dataset and then multiplying by the standard deviation of the dataset.



The features fed into the model are:




  1. The time series

  2. Lags of the the time series using 13, 26, 13, & 52

  3. A yearly autocorrelation scaler per series which is tiled to be the length of each time series

  4. A quarterly autocorrelation scaler per series which is tiled to be the length of each time series

  5. A product popularity scaler per series which is tiled to the length of each time series.


The accuracy of the model is evaluated with two separate measures MASE and a variant of SMAPE:



SMAPE wikipedia



Both of these metrics should be as close to zero as possible. The forecasts need to be made per series and thus I determine overall success by taking the median of all the MASE scores and calculating one SMAPE3 score by taking all the forecast and actual values and computing them in one sum for all the series instead of taking an average per series.



My model has a single encoder layer with 64 neurons and a single decoder layer also with 64 neurons. It uses Adam optimiser using all keras defaults and the mean absolute error as the loss function, as I believe this most closely reflects the error metrics it is evaluated with eventually. The model doesn't perform as well as I want it to and I am looking for ways to imporve it. Currently as measured on my validation set I have the two following scores:



Vol SMAPE3: 0.46 and MASE median: 0.75



The model is trained over 100 epochs with a batch size of 256. Below is the loss and some example plots of series and predictions the model makes:



loss



best pred



worst pred



To me the model doesn't seem to capture the variation in the series very well and almost looks like a linear learning. It appears to only really capture the level of the different series well. To improve it I would like to try and add the ability for the model to capture more of the variations in the series and predictions rather than the smooth kind of predictions it gives at the moment. Are there any suggestions for how to achieve this?









share









$endgroup$

















    0












    $begingroup$


    I have written a variable length sequence to seqeunce autoencoder in keras using this tutorial as a guideline: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html. The idea to apply it to the problem of time series forecasting came from the winning entry of this kaggle competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting



    The model is trained on 1831 different time series of products but they all come from the same product category. Each series has a minimum length of 156 data points and a maximum of 208 (So 3-4 years of historical data). The data points in each series are upscaled daily data aggregated to the Monday of each week. Some of the series are intermittent some are not



    The forecast horizon for each time series is 26 points into the future. The data is log(x + 1) transformed and then normalised by subtracting the mean of the dataset and then multiplying by the standard deviation of the dataset.



    The features fed into the model are:




    1. The time series

    2. Lags of the the time series using 13, 26, 13, & 52

    3. A yearly autocorrelation scaler per series which is tiled to be the length of each time series

    4. A quarterly autocorrelation scaler per series which is tiled to be the length of each time series

    5. A product popularity scaler per series which is tiled to the length of each time series.


    The accuracy of the model is evaluated with two separate measures MASE and a variant of SMAPE:



    SMAPE wikipedia



    Both of these metrics should be as close to zero as possible. The forecasts need to be made per series and thus I determine overall success by taking the median of all the MASE scores and calculating one SMAPE3 score by taking all the forecast and actual values and computing them in one sum for all the series instead of taking an average per series.



    My model has a single encoder layer with 64 neurons and a single decoder layer also with 64 neurons. It uses Adam optimiser using all keras defaults and the mean absolute error as the loss function, as I believe this most closely reflects the error metrics it is evaluated with eventually. The model doesn't perform as well as I want it to and I am looking for ways to imporve it. Currently as measured on my validation set I have the two following scores:



    Vol SMAPE3: 0.46 and MASE median: 0.75



    The model is trained over 100 epochs with a batch size of 256. Below is the loss and some example plots of series and predictions the model makes:



    loss



    best pred



    worst pred



    To me the model doesn't seem to capture the variation in the series very well and almost looks like a linear learning. It appears to only really capture the level of the different series well. To improve it I would like to try and add the ability for the model to capture more of the variations in the series and predictions rather than the smooth kind of predictions it gives at the moment. Are there any suggestions for how to achieve this?









    share









    $endgroup$















      0












      0








      0





      $begingroup$


      I have written a variable length sequence to seqeunce autoencoder in keras using this tutorial as a guideline: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html. The idea to apply it to the problem of time series forecasting came from the winning entry of this kaggle competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting



      The model is trained on 1831 different time series of products but they all come from the same product category. Each series has a minimum length of 156 data points and a maximum of 208 (So 3-4 years of historical data). The data points in each series are upscaled daily data aggregated to the Monday of each week. Some of the series are intermittent some are not



      The forecast horizon for each time series is 26 points into the future. The data is log(x + 1) transformed and then normalised by subtracting the mean of the dataset and then multiplying by the standard deviation of the dataset.



      The features fed into the model are:




      1. The time series

      2. Lags of the the time series using 13, 26, 13, & 52

      3. A yearly autocorrelation scaler per series which is tiled to be the length of each time series

      4. A quarterly autocorrelation scaler per series which is tiled to be the length of each time series

      5. A product popularity scaler per series which is tiled to the length of each time series.


      The accuracy of the model is evaluated with two separate measures MASE and a variant of SMAPE:



      SMAPE wikipedia



      Both of these metrics should be as close to zero as possible. The forecasts need to be made per series and thus I determine overall success by taking the median of all the MASE scores and calculating one SMAPE3 score by taking all the forecast and actual values and computing them in one sum for all the series instead of taking an average per series.



      My model has a single encoder layer with 64 neurons and a single decoder layer also with 64 neurons. It uses Adam optimiser using all keras defaults and the mean absolute error as the loss function, as I believe this most closely reflects the error metrics it is evaluated with eventually. The model doesn't perform as well as I want it to and I am looking for ways to imporve it. Currently as measured on my validation set I have the two following scores:



      Vol SMAPE3: 0.46 and MASE median: 0.75



      The model is trained over 100 epochs with a batch size of 256. Below is the loss and some example plots of series and predictions the model makes:



      loss



      best pred



      worst pred



      To me the model doesn't seem to capture the variation in the series very well and almost looks like a linear learning. It appears to only really capture the level of the different series well. To improve it I would like to try and add the ability for the model to capture more of the variations in the series and predictions rather than the smooth kind of predictions it gives at the moment. Are there any suggestions for how to achieve this?









      share









      $endgroup$




      I have written a variable length sequence to seqeunce autoencoder in keras using this tutorial as a guideline: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html. The idea to apply it to the problem of time series forecasting came from the winning entry of this kaggle competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting



      The model is trained on 1831 different time series of products but they all come from the same product category. Each series has a minimum length of 156 data points and a maximum of 208 (So 3-4 years of historical data). The data points in each series are upscaled daily data aggregated to the Monday of each week. Some of the series are intermittent some are not



      The forecast horizon for each time series is 26 points into the future. The data is log(x + 1) transformed and then normalised by subtracting the mean of the dataset and then multiplying by the standard deviation of the dataset.



      The features fed into the model are:




      1. The time series

      2. Lags of the the time series using 13, 26, 13, & 52

      3. A yearly autocorrelation scaler per series which is tiled to be the length of each time series

      4. A quarterly autocorrelation scaler per series which is tiled to be the length of each time series

      5. A product popularity scaler per series which is tiled to the length of each time series.


      The accuracy of the model is evaluated with two separate measures MASE and a variant of SMAPE:



      SMAPE wikipedia



      Both of these metrics should be as close to zero as possible. The forecasts need to be made per series and thus I determine overall success by taking the median of all the MASE scores and calculating one SMAPE3 score by taking all the forecast and actual values and computing them in one sum for all the series instead of taking an average per series.



      My model has a single encoder layer with 64 neurons and a single decoder layer also with 64 neurons. It uses Adam optimiser using all keras defaults and the mean absolute error as the loss function, as I believe this most closely reflects the error metrics it is evaluated with eventually. The model doesn't perform as well as I want it to and I am looking for ways to imporve it. Currently as measured on my validation set I have the two following scores:



      Vol SMAPE3: 0.46 and MASE median: 0.75



      The model is trained over 100 epochs with a batch size of 256. Below is the loss and some example plots of series and predictions the model makes:



      loss



      best pred



      worst pred



      To me the model doesn't seem to capture the variation in the series very well and almost looks like a linear learning. It appears to only really capture the level of the different series well. To improve it I would like to try and add the ability for the model to capture more of the variations in the series and predictions rather than the smooth kind of predictions it gives at the moment. Are there any suggestions for how to achieve this?







      python keras time-series regression forecast





      share












      share










      share



      share










      asked 3 mins ago









      AesirAesir

      1085




      1085






















          0






          active

          oldest

          votes











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "557"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f44565%2ftuning-a-sequence-to-sequence-model%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          0






          active

          oldest

          votes








          0






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes
















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Data Science Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f44565%2ftuning-a-sequence-to-sequence-model%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          How to label and detect the document text images

          Tabula Rosettana

          Aureus (color)