Feature selection for time series prediction
























$begingroup$


I'm working on an LSTM-based stock market forecasting problem and trying to figure out a way to select input variables.




  1. When calculating the correlation between variables (e.g. the close price of Tesla vs. the close price of Microsoft), would differencing the series first give a more accurate (or more correct) correlation coefficient? I'm finding values in the range 0.7-0.9 for the non-differenced variables, and lower values after differencing.


  2. Once I have a correlation matrix of all my variables, is there a way to figure out which ones would add information to the neural net and which ones would just add noise?
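
For reference, the comparison in point 1 can be reproduced with a minimal sketch like the one below. It assumes NumPy and uses synthetic random walks with a shared drift as stand-ins for real close prices; the function name `level_and_diff_correlation` and all of the data are hypothetical, not actual Tesla or Microsoft quotes.

```python
import numpy as np

def level_and_diff_correlation(x, y):
    """Return (correlation of levels, correlation of first differences)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corr_levels = np.corrcoef(x, y)[0, 1]
    # np.diff computes x[t] - x[t-1], i.e. the first difference of each series
    corr_diffs = np.corrcoef(np.diff(x), np.diff(y))[0, 1]
    return corr_levels, corr_diffs

# Assumption: two independent random walks that share the same upward drift.
# Their daily changes are unrelated, but both series trend upward over time.
rng = np.random.default_rng(0)
n = 1000
price_a = 100 + np.cumsum(0.5 + rng.normal(size=n))
price_b = 100 + np.cumsum(0.5 + rng.normal(size=n))

corr_levels, corr_diffs = level_and_diff_correlation(price_a, price_b)
print(f"levels: {corr_levels:.3f}, first differences: {corr_diffs:.3f}")
```

In this toy setup the level correlation is high only because both series trend in the same direction, while the correlation of the first differences stays close to zero, which matches the pattern of lower values after taking differences.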




















$endgroup$
























      time-series feature-selection correlation
















      asked Aug 12 '18 at 14:07









      MOff






























          1 Answer












          $begingroup$

          You don’t strictly need to select variables before feeding them to the network; a deep neural network (DNN) does much of this implicitly. During training, the DNN gives more importance to relevant variables through the weights it learns. With sigmoid activations, the outputs of the hidden nodes lie between 0 and 1, and many of them saturate close to one of those two values. You can think of these near-0 and near-1 values as a soft way of choosing relevant variables, too.



          By the way, a correlation matrix cannot be used directly to select relevant variables. If you want to reduce the number of variables that are fed to the DNN, you can use PCA. The principal components are the eigenvectors of the covariance matrix, which is the same as the correlation matrix when the variables are standardized.
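
As a rough sketch of the PCA route, assuming NumPy and synthetic data, the snippet below standardizes the inputs, takes the eigenvectors of their correlation matrix, and projects onto the leading ones. The function name `pca_from_correlation` and the toy features are hypothetical; in practice a library implementation such as scikit-learn's `PCA` would usually be the more convenient choice.

```python
import numpy as np

def pca_from_correlation(X, n_components):
    """Project X (n_samples, n_features) onto the leading eigenvectors
    of its correlation matrix. Returns (scores, components, explained)."""
    X = np.asarray(X, dtype=float)
    # Standardize each column so that covariance and correlation coincide
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()
    return Z @ components, components, explained

# Assumption: four toy features built from two latent factors, so two
# principal components should capture almost all of the variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=500),
    base[:, 1],
    base[:, 1] + 0.1 * rng.normal(size=500),
])

scores, components, explained = pca_from_correlation(X, n_components=2)
print("explained variance ratio:", np.round(explained, 3))
```

For a time series model, the projection would need to be fitted on the training window only and then applied unchanged to later data, otherwise information from the future leaks into the inputs.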

















          $endgroup$






















                edited Aug 12 '18 at 16:42

























                answered Aug 12 '18 at 16:37









                pythinker






























