SVR is giving same prediction for all features
























$begingroup$


I'm creating a basic application to predict the 'Closing' value of a stock for day n+1, given the features of the stock on day n, using Python and Scikit-learn.



A sample row in my dataframe (2000 rows in total) looks like this:



     Open     Close    High     Low      Volume
0    537.40   537.10   541.55   530.47   52877.98


This is similar to this video https://www.youtube.com/watch?v=SSu00IRRraY, where the presenter uses 'Dates' and 'Open Price'. In that example, Dates are the features and Open price is the target.



In my example, I don't have a 'Dates' value in my dataset. Instead, I want to use the Open, High, Low and Volume data as the features, because I thought that would make the prediction more accurate.



I was defining my features and targets like so:



features = df.loc[:,df.columns != 'Closing']
targets = df.loc[:,df.columns == 'Closing']


This returns DataFrames that look like this.

features:

      Open     High     Low      Vol from
29    670.02   685.11   661.09   92227.36


targets:

      Close
29    674.57


However, I realised that the data needs to be in a numpy array, so I now get my features and targets like this:



features = df.loc[:,df.columns != 'Closing'].values
targets = df.loc[:,df.columns == 'Closing'].values


So now my features look like this



[6.70020000e+02 6.85110000e+02 6.61090000e+02 9.22273600e+04
6.23944806e+07]
[7.78102000e+03 8.10087000e+03 7.67541000e+03 6.86188500e+04
5.41391322e+08]


and my targets look like this



[  674.57]
[ 8042.64]


I then split up my data using



X_training, X_testing, y_training, y_testing = train_test_split(features, targets, test_size=0.8)


I tried to follow the Scikit-Learn documentation, which resulted in the following



svr_rbf = svm.SVR(kernel='rbf', C=100.0, gamma=0.0004, epsilon= 0.01 )
svr_rbf.fit(X_training, y_training)
predictions = svr_rbf.predict(X_testing)
print(predictions)


I assumed that this would predict the y values for the testing features, which I could then plot against the actual y_testing values to see how similar they are. However, predictions contains the same value for every row of X_testing.



[3763.84681818 3763.84681818 3763.84681818 3763.84681818 3763.84681818


I've tried changing the values of epsilon, C and gamma, but that doesn't change the fact that the predictions are always the same value.



I know that predicting stock prices might not be accurate, but I must have done something wrong to get the same value when applying the model to different test data.

















$endgroup$
























      python regression pandas numpy svr















      asked 13 hours ago









      Ben Williams























          1 Answer












          $begingroup$

          There are a couple of things that I think will help if you change them.



          First, a general one for all model building: I would suggest you scale your data before putting it into the model.



          It might not directly solve the problem of receiving the same predicted value in each step, but you might notice that your predictions lie somewhere in the range of your input values. As you are using unscaled volume, you are making things difficult for the model: it essentially has to work on two very different scales at the same time, which it cannot do very well.



          Have a look at the StandardScaler in sklearn for a way to do that.
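One way to see why scaling matters here: an RBF kernel computes exp(-gamma * squared distance) between rows. When one column is in the millions, the squared distances are enormous, every kernel value is effectively zero, and the prediction collapses to the model's intercept, which would produce the same number for every row. The following is only a minimal sketch on synthetic data (the prices and volumes are randomly generated for illustration and are not the asker's data); it uses a Pipeline so that the scaler is fitted on the training rows only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for Open/High/Low/Volume (made-up values, illustration only).
rng = np.random.default_rng(0)
n = 400
price = rng.uniform(500, 8000, n)
volume = rng.uniform(1e7, 1e9, n)            # several orders of magnitude larger than prices
X = np.column_stack([price, price + 5, price - 5, volume])
y = price + rng.normal(0, 1, n)              # 1-D target, as SVR expects

X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Unscaled features with the hyperparameters from the question.
raw = SVR(kernel='rbf', C=100.0, gamma=0.0004, epsilon=0.01).fit(X_train, y_train)
raw_pred = raw.predict(X_test)

# Same kind of model, but with features standardised inside a pipeline.
scaled = make_pipeline(
    StandardScaler(),
    SVR(kernel='rbf', C=100.0, gamma='scale', epsilon=0.01),
)
scaled.fit(X_train, y_train)
scaled_pred = scaled.predict(X_test)

# Spread (max - min) of the predictions for each model.
print("unscaled spread:", np.ptp(raw_pred))
print("scaled spread:  ", np.ptp(scaled_pred))
```

If the unscaled spread comes out at or near zero while the scaled one does not, that points to feature scale as the cause of the constant predictions.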





          Next, a few suggestions of things to change, specifically because you are working with stock prices:



          I would normally predict the value of the stock market tomorrow, and not the closing price on the same day as the open/high/low/volume values you are using as features. For me, that only makes sense if you have high-frequency (intraday) data.
          Given this, you would need to shift your y values by one step. There is a method on Pandas DataFrames to help with that (shift), but as you don't have a date column and you only need to shift by one timestep anyway, you can just slice the arrays:



          # df.loc[:-1] is label-based and selects no rows on a default 0..n-1 index, so slice the arrays by position instead
          features = df.loc[:, df.columns != 'Closing'].values[:-1]    # leave out last step
          targets = df.loc[:, df.columns == 'Closing'].values[1:]      # start one step later


          You could then even predict the opening price of the following day, or keep the closing data in the features, as that would not introduce temporal bias.
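The one-step shift can also be written with pandas' `shift`. This is a small sketch under assumptions: only the first row comes from the question, the other rows are invented for illustration, and the `NextClosing` column name is hypothetical.

```python
import pandas as pd

# First row is from the question; the remaining rows are made up for illustration.
df = pd.DataFrame({
    'Open':    [537.40, 540.00, 545.10, 550.30],
    'Closing': [537.10, 544.20, 549.00, 551.70],
    'High':    [541.55, 546.00, 551.20, 555.00],
    'Low':     [530.47, 538.10, 543.00, 548.90],
    'Volume':  [52877.98, 48120.10, 61003.55, 57210.40],
})

# The target for day n is the closing value of day n+1 ('NextClosing' is a hypothetical name).
df['NextClosing'] = df['Closing'].shift(-1)

# The last day has no following day, so its target is NaN; drop that row.
df = df.dropna(subset=['NextClosing'])

# Today's closing value stays in the features; only tomorrow's is the target.
features = df.drop(columns=['NextClosing']).values
targets = df['NextClosing'].values           # 1-D array

print(features.shape)
print(targets)
```

Keeping the target as a 1-D array also avoids the column-vector warning that scikit-learn raises when `y` has shape `(n, 1)`.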





          Something that would require more setup would be to look at how your data is shuffled (train_test_split shuffles the rows by default). Again, because you want to use historical values to predict future ones, you need to keep the relevant history together. Have a look at my other answer to this question and the diagram, which explains more about this idea.
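As a rough sketch of keeping history together, scikit-learn offers two options: `train_test_split(..., shuffle=False)` for a single chronological split, and `TimeSeriesSplit` for cross-validation folds that always train on the past and test on the future. The tiny arrays below are placeholders, not real stock data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Placeholder data: 10 time-ordered rows with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Single chronological split: the last 20% of rows become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
print("train targets:", y_train)
print("test targets: ", y_test)

# Cross-validation folds where the test indices always come after the training indices.
tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A splitter such as `tscv` can be passed as the `cv` argument of `GridSearchCV`, so the hyperparameter search respects the time ordering as well.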



          EDIT



          You should also scale y_train and y_test, so that the model predicts within a well-behaved range. Fit a StandardScaler on y_train and reuse that same fitted instance to transform y_test, so as not to introduce bias (it has to be a separate instance from the one fitted on the features, because the shapes differ). Have a look at this short tutorial. Your predictions will then be in the same standardised range (roughly zero mean and unit variance, so mostly within a few units of zero). You can compute errors on that range too. If you really want, you can then scale your predictions back to the original range so they look more realistic, but that isn't really necessary to validate the model. You can simply plot the predictions against the ground truth in the scaled space.



          Check out this thread, which explains a few reasons as to why you should use the same instance of StandardScaler on the test data.
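A minimal sketch of scaling the targets, assuming synthetic data (the features, coefficients and targets below are generated for illustration; `x_scaler` and `y_scaler` are hypothetical names). Each scaler is fitted on the training split only and then reused on the test split; predictions are mapped back with `inverse_transform`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data: targets in the thousands, like the prices in the question.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 4))
X_test = rng.normal(size=(50, 4))
coef = np.array([1500.0, 800.0, -400.0, 100.0])
y_train = 4000 + X_train @ coef
y_test = 4000 + X_test @ coef

# Fit one scaler for the features and a separate one for the targets,
# both on the training split only.
x_scaler = StandardScaler().fit(X_train)
y_scaler = StandardScaler().fit(y_train.reshape(-1, 1))

model = SVR(kernel='rbf', C=100.0, gamma='scale', epsilon=0.01)
model.fit(
    x_scaler.transform(X_train),
    y_scaler.transform(y_train.reshape(-1, 1)).ravel(),  # SVR expects 1-D y
)

# Predict in the scaled space, then map back to the original price range.
pred_scaled = model.predict(x_scaler.transform(X_test))
pred = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()

print(pred[:5])
print(y_test[:5])
```

scikit-learn's `TransformedTargetRegressor` (in `sklearn.compose`) wraps the same fit, transform and inverse-transform steps for the target, which may be more convenient when the model is tuned with `GridSearchCV`.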

















          $endgroup$













          • $begingroup$
            Thank you for the well-written answer. I shifted the dataframe as you suggested and used a scaler on my X_Train and X_test data. My training and testing scores are now around 0.992. However, when I plot y_pred vs y_test, I can see that some of the numbers are far off. Each time I run the file I predict the values for today based on yesterday's features, and these also differ massively (sometimes 3000, sometimes 5000). Would scaling my y data help with this, or is there not much I can do? I'm using GridSearchCV, so changing the parameters shouldn't make a difference.
            $endgroup$
            – Ben Williams
            10 hours ago










          • $begingroup$
            @BenWilliams - check my edit in my answer
            $endgroup$
            – n1k31t4
            9 hours ago










          • $begingroup$
            Thank you very much. My prediction now looks much better and so does my graph.
            $endgroup$
            – Ben Williams
            9 hours ago











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "557"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });






          Ben Williams is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46575%2fsvr-is-giving-same-prediction-for-all-features%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          0












          $begingroup$

          There are a couple of parts that I changing will help.



          First a general one for all model building: I would suggest you scale your data before putting it into the model.



          It might not directly solve the problem of receiving the same predicted value in each step, but you might notice that you predictions lie somewhere in the ranges of your input values - as you are using unscaled volume, that is making things difficult for the model. It is essentially have to work on two different scales at the same time, which is cannot do very well.



          Have a look at the StandardScaler in sklean for a way how to do that.





          Next a few suggestions of things to change, specifically because you are working with stock prices:



          I would normally predict the value of the stock market tomorrow, and not the closing prices on the same data, where you are using open/high/low/volume. For me that only make sense if you were to have high-frequency (intraday) data.
          Given this, you would need to shift your y value by one step. There is a method on Pandas DataFrames to help with that, but as you dont have a date column and you only need to shift by one timestep anyway, you can just do this:



          features = df.loc[:-1, df.columns != 'Closing'].values    # leave out last step
          targets = df.loc[1:, df.columns == 'Closing'].values # start one step later


          You could then even then predict the opening price of the following day, or keep closing data in the features data, as that would not introduce temporal bias.





          Something that would require more setup, would be to look at shuffling your data. Again, because you want to use historical values to predict future ones, you need to keep the relevant hsitory together. Have a look at my other answer to this question and the diagram, which explains more about this idea.



          EDIT



          You should also scale y_train and y_test, so that the model knows to predict within that range. Do this using the same StandardScaler instance, as not to introduce bias. Have a look at this short tutorial. Your predictions will then be within the same range (e.g. [-1, +1]). You can compute errors on that range too. If you really want, you can then scale your predictions back to the original range so they look more realistic, but that isn't really necessary to validate the model. You can simply plot the predictions against ground truth in the scaled space.



          Check out this thread, which explains a few reasons as to why you should use the same instance of StandardScaler on the test data.






          share|improve this answer











          $endgroup$













          • $begingroup$
            Thank you for the well written answer. I shifted the dataframe as you suggested and used scaler on my X_Train and X_test data. My training and testing scores are now around 0.992, however when I plot y_pred vs y_test I can see that some of the numbers are far off. Each time I run the file I predict the values for today based on yesterdays features, and these also differ massively (sometimes 3000, sometimes 5000). Would scaling my Y data help with this or is there not much I can do? I'm using GridSearchCV so changing the parameters shouldnt make a difference
            $endgroup$
            – Ben Williams
            10 hours ago










          • $begingroup$
            @BenWilliams - check my edit in my answer
            $endgroup$
            – n1k31t4
            9 hours ago










          • $begingroup$
            Thank you very much. My prediction now looks much better and so does my graph.
            $endgroup$
            – Ben Williams
            9 hours ago
















          0












          $begingroup$

          There are a couple of parts that I changing will help.



          First a general one for all model building: I would suggest you scale your data before putting it into the model.



          It might not directly solve the problem of receiving the same predicted value in each step, but you might notice that you predictions lie somewhere in the ranges of your input values - as you are using unscaled volume, that is making things difficult for the model. It is essentially have to work on two different scales at the same time, which is cannot do very well.



          Have a look at the StandardScaler in sklean for a way how to do that.





          Next a few suggestions of things to change, specifically because you are working with stock prices:



          I would normally predict the value of the stock market tomorrow, and not the closing prices on the same data, where you are using open/high/low/volume. For me that only make sense if you were to have high-frequency (intraday) data.
          Given this, you would need to shift your y value by one step. There is a method on Pandas DataFrames to help with that, but as you dont have a date column and you only need to shift by one timestep anyway, you can just do this:



          features = df.loc[:-1, df.columns != 'Closing'].values    # leave out last step
          targets = df.loc[1:, df.columns == 'Closing'].values # start one step later


          You could then even then predict the opening price of the following day, or keep closing data in the features data, as that would not introduce temporal bias.





          Something that would require more setup, would be to look at shuffling your data. Again, because you want to use historical values to predict future ones, you need to keep the relevant hsitory together. Have a look at my other answer to this question and the diagram, which explains more about this idea.



          EDIT



          You should also scale y_train and y_test, so that the model knows to predict within that range. Do this using the same StandardScaler instance, as not to introduce bias. Have a look at this short tutorial. Your predictions will then be within the same range (e.g. [-1, +1]). You can compute errors on that range too. If you really want, you can then scale your predictions back to the original range so they look more realistic, but that isn't really necessary to validate the model. You can simply plot the predictions against ground truth in the scaled space.



          Check out this thread, which explains a few reasons as to why you should use the same instance of StandardScaler on the test data.






          share|improve this answer











          $endgroup$













          • $begingroup$
            Thank you for the well written answer. I shifted the dataframe as you suggested and used scaler on my X_Train and X_test data. My training and testing scores are now around 0.992, however when I plot y_pred vs y_test I can see that some of the numbers are far off. Each time I run the file I predict the values for today based on yesterdays features, and these also differ massively (sometimes 3000, sometimes 5000). Would scaling my Y data help with this or is there not much I can do? I'm using GridSearchCV so changing the parameters shouldnt make a difference
            $endgroup$
            – Ben Williams
            10 hours ago










          • $begingroup$
            @BenWilliams - check my edit in my answer
            $endgroup$
            – n1k31t4
            9 hours ago










          • $begingroup$
            Thank you very much. My prediction now looks much better and so does my graph.
            $endgroup$
            – Ben Williams
            9 hours ago














          0












          0








          0





          $begingroup$

          There are a couple of parts that I changing will help.



          First a general one for all model building: I would suggest you scale your data before putting it into the model.



          It might not directly solve the problem of receiving the same predicted value in each step, but you might notice that you predictions lie somewhere in the ranges of your input values - as you are using unscaled volume, that is making things difficult for the model. It is essentially have to work on two different scales at the same time, which is cannot do very well.



          Have a look at the StandardScaler in sklean for a way how to do that.





          Next a few suggestions of things to change, specifically because you are working with stock prices:



          I would normally predict the value of the stock market tomorrow, and not the closing prices on the same data, where you are using open/high/low/volume. For me that only make sense if you were to have high-frequency (intraday) data.
          Given this, you would need to shift your y value by one step. There is a method on Pandas DataFrames to help with that, but as you dont have a date column and you only need to shift by one timestep anyway, you can just do this:



          features = df.loc[:-1, df.columns != 'Closing'].values    # leave out last step
          targets = df.loc[1:, df.columns == 'Closing'].values # start one step later


          You could then even then predict the opening price of the following day, or keep closing data in the features data, as that would not introduce temporal bias.





          Something that would require more setup, would be to look at shuffling your data. Again, because you want to use historical values to predict future ones, you need to keep the relevant hsitory together. Have a look at my other answer to this question and the diagram, which explains more about this idea.



          EDIT



          You should also scale y_train and y_test, so that the model knows to predict within that range. Do this using the same StandardScaler instance, as not to introduce bias. Have a look at this short tutorial. Your predictions will then be within the same range (e.g. [-1, +1]). You can compute errors on that range too. If you really want, you can then scale your predictions back to the original range so they look more realistic, but that isn't really necessary to validate the model. You can simply plot the predictions against ground truth in the scaled space.



          Check out this thread, which explains a few reasons as to why you should use the same instance of StandardScaler on the test data.






          share|improve this answer











          $endgroup$



          There are a couple of parts that I changing will help.



          First a general one for all model building: I would suggest you scale your data before putting it into the model.



          It might not directly solve the problem of receiving the same predicted value in each step, but you might notice that you predictions lie somewhere in the ranges of your input values - as you are using unscaled volume, that is making things difficult for the model. It is essentially have to work on two different scales at the same time, which is cannot do very well.



          Have a look at the StandardScaler in sklean for a way how to do that.





          Next a few suggestions of things to change, specifically because you are working with stock prices:



          I would normally predict the value of the stock market tomorrow, and not the closing prices on the same data, where you are using open/high/low/volume. For me that only make sense if you were to have high-frequency (intraday) data.
          Given this, you would need to shift your y value by one step. There is a method on Pandas DataFrames to help with that, but as you dont have a date column and you only need to shift by one timestep anyway, you can just do this:



          features = df.loc[:-1, df.columns != 'Closing'].values    # leave out last step
          targets = df.loc[1:, df.columns == 'Closing'].values # start one step later


          You could then even then predict the opening price of the following day, or keep closing data in the features data, as that would not introduce temporal bias.





          Something that would require more setup, would be to look at shuffling your data. Again, because you want to use historical values to predict future ones, you need to keep the relevant hsitory together. Have a look at my other answer to this question and the diagram, which explains more about this idea.



          EDIT



          You should also scale y_train and y_test, so that the model learns to predict within that range. Fit a StandardScaler on the training targets and use that same instance to transform the test targets, so as not to introduce bias. Have a look at this short tutorial. Your predictions will then be in the same scaled range (standardised values centred on zero, mostly within a few standard deviations). You can compute errors in that range too. If you really want, you can then scale your predictions back to the original range so they look more realistic, but that isn't really necessary to validate the model. You can simply plot the predictions against the ground truth in the scaled space.



          Check out this thread, which explains a few reasons as to why you should use the same instance of StandardScaler on the test data.
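A minimal sketch of the target scaling described in the edit, on synthetic data. It assumes a separate scaler for the targets, fitted on the training targets only and reused for the test targets; the variable names are my own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic placeholder data: one feature and a target on a price-like scale.
rng = np.random.default_rng(0)
X = rng.uniform(500, 8000, size=(200, 1))
y = X[:, 0] * 1.01 + rng.normal(0, 10, size=200)

# Chronological split: first 160 rows for training, last 40 for testing.
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

# Fit both scalers on the training data only, then reuse them on the test data.
x_scaler = StandardScaler().fit(X_train)
y_scaler = StandardScaler().fit(y_train.reshape(-1, 1))

X_train_s = x_scaler.transform(X_train)
X_test_s = x_scaler.transform(X_test)
y_train_s = y_scaler.transform(y_train.reshape(-1, 1)).ravel()
y_test_s = y_scaler.transform(y_test.reshape(-1, 1)).ravel()

model = SVR(kernel="rbf", C=100.0, epsilon=0.01)
model.fit(X_train_s, y_train_s)

# Predictions come out in the scaled space; map them back only if needed.
pred_s = model.predict(X_test_s)
pred = y_scaler.inverse_transform(pred_s.reshape(-1, 1)).ravel()

print("scaled MAE:", np.abs(pred_s - y_test_s).mean())
print("first predictions in original units:", pred[:3])
```

scikit-learn's `TransformedTargetRegressor` (in `sklearn.compose`) wraps the same fit/transform/inverse-transform steps for the target if you prefer not to manage the second scaler by hand.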















          edited 9 hours ago

























          answered 12 hours ago









          n1k31t4












          • Thank you for the well-written answer. I shifted the dataframe as you suggested and used a scaler on my X_Train and X_test data. My training and testing scores are now around 0.992; however, when I plot y_pred vs y_test I can see that some of the numbers are far off. Each time I run the file I predict the values for today based on yesterday's features, and these also differ massively (sometimes 3000, sometimes 5000). Would scaling my Y data help with this or is there not much I can do? I'm using GridSearchCV, so changing the parameters shouldn't make a difference.
            – Ben Williams
            10 hours ago










          • @BenWilliams - check my edit in my answer
            – n1k31t4
            9 hours ago










          • Thank you very much. My prediction now looks much better and so does my graph.
            – Ben Williams
            9 hours ago









































