SVR is giving same prediction for all features
I'm creating a basic application in Python and Scikit-learn to predict the 'Closing' value of a stock on day n+1, given the features of the stock on day n.
A sample row in my dataframe (2000 rows) looks like this:
Open Close High Low Volume
0 537.40 537.10 541.55 530.47 52877.98
This is similar to this video: https://www.youtube.com/watch?v=SSu00IRRraY, where the author uses 'Dates' as the features and 'Open Price' as the target.
In my example I don't have a 'Dates' column in my dataset; instead I want to use the Open, High, Low and Volume data as the features, because I thought that would make the predictions more accurate.
I was defining my features and targets like so:
features = df.loc[:,df.columns != 'Closing']
targets = df.loc[:,df.columns == 'Closing']
which returned DataFrames looking like this:
features:
Open High Low Vol from
29 670.02 685.11 661.09 92227.36
targets:
Close
29 674.57
However, I realised that the data needs to be in a NumPy array, so I now get my features and targets like this:
features = df.loc[:,df.columns != 'Closing'].values
targets = df.loc[:,df.columns == 'Closing'].values
So now my features look like this:
[6.70020000e+02 6.85110000e+02 6.61090000e+02 9.22273600e+04
6.23944806e+07]
[7.78102000e+03 8.10087000e+03 7.67541000e+03 6.86188500e+04
5.41391322e+08]
and my targets look like this:
[ 674.57]
[ 8042.64]
I then split up my data using:
X_training, X_testing, y_training, y_testing = train_test_split(features, targets, test_size=0.8)
I tried to follow the Scikit-learn documentation, which resulted in the following:
svr_rbf = svm.SVR(kernel='rbf', C=100.0, gamma=0.0004, epsilon=0.01)
svr_rbf.fit(X_training, y_training)
predictions = svr_rbf.predict(X_testing)
print(predictions)
I assumed that this would predict the y values for the testing features, which I could then plot against the actual y_testing values to see how similar they are. However, predictions contains the same value for every row of X_testing:
[3763.84681818 3763.84681818 3763.84681818 3763.84681818 3763.84681818
I've tried changing the values of epsilon, C and gamma, but that doesn't change the fact that the predictions are always the same value.
I know that it might not be possible to predict stock prices accurately, but I must have done something wrong to get the same value when applying the model to many different test samples.
python regression pandas numpy svr
asked 13 hours ago by Ben Williams
1 Answer
There are a couple of things that, if you change them, will help.
First, a general one for all model building: I would suggest scaling your data before putting it into the model.
It might not directly solve the problem of getting the same predicted value for every sample, but you might notice that your predictions lie somewhere in the range of your input values. Because you are using unscaled volume, the model essentially has to work on two very different scales at the same time, which it cannot do very well.
Have a look at the StandardScaler in sklearn for one way to do that.
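For example, a minimal sketch (reusing the variable names from the question; the scaler is fit on the training data only, then reused on the test data):
from sklearn import svm
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_training)  # fit on the training data only
X_test_scaled = scaler.transform(X_testing)        # apply the same statistics to the test data

svr_rbf = svm.SVR(kernel='rbf', C=100.0, gamma=0.0004, epsilon=0.01)
svr_rbf.fit(X_train_scaled, y_training.ravel())    # ravel() flattens the (n, 1) target array
predictions = svr_rbf.predict(X_test_scaled)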
Next, a few suggestions of things to change, specifically because you are working with stock prices:
I would normally predict the value of the stock for tomorrow, not the closing price on the same day as the open/high/low/volume features. The latter only makes sense to me if you have high-frequency (intraday) data.
Given this, you would need to shift your y values by one step. There is a shift method on Pandas DataFrames to help with that, but as you don't have a date column and only need to shift by one timestep anyway, you can just do this:
features = df.loc[:, df.columns != 'Closing'].iloc[:-1].values  # leave out the last row
targets = df.loc[:, df.columns == 'Closing'].iloc[1:].values    # start one row later
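Equivalently, here is a small sketch using the shift method mentioned above (the 'Target' column name is just illustrative):
df2 = df.copy()
df2['Target'] = df2['Closing'].shift(-1)   # next day's close becomes today's target
df2 = df2.dropna(subset=['Target'])        # the last row has no next day, so drop it
features = df2.drop(columns=['Closing', 'Target']).values
targets = df2['Target'].values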
You could then even predict the opening price of the following day, or keep the closing price in the features, as that would not introduce temporal bias.
Something that would require more setup would be to look at how you shuffle your data. Again, because you want to use historical values to predict future ones, you need to keep the relevant history together. Have a look at my other answer to this question and the diagram there, which explains more about this idea.
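As a minimal sketch of one way to respect that ordering during validation, sklearn's TimeSeriesSplit always trains on earlier samples and validates on later ones:
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(features):
    X_tr, X_te = features[train_idx], features[test_idx]
    y_tr, y_te = targets[train_idx], targets[test_idx]
    # fit and evaluate here; training indices always precede test indices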
EDIT
You should also scale y_train and y_test, so that the model knows to predict within that range. Do this using the same StandardScaler instance, so as not to introduce bias. Have a look at this short tutorial. Your predictions will then be within the same range (e.g. [-1, +1]). You can compute errors on that range too. If you really want, you can then scale your predictions back to the original range so they look more realistic, but that isn't really necessary to validate the model. You can simply plot the predictions against the ground truth in the scaled space.
Check out this thread, which explains a few reasons why you should use the same instance of StandardScaler on the test data.
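For example, a minimal sketch using a second scaler for the target, fit on the training targets and reused (the same instance) on the test targets:
from sklearn.preprocessing import StandardScaler

y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_training)  # fit on the training targets only
y_test_scaled = y_scaler.transform(y_testing)        # reuse the same instance on the test targets

# After predicting in the scaled space, you can optionally map back:
# predictions = y_scaler.inverse_transform(predictions_scaled.reshape(-1, 1))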
answered 12 hours ago by n1k31t4 (edited 9 hours ago)
Thank you for the well-written answer. I shifted the dataframe as you suggested and used the scaler on my X_train and X_test data. My training and testing scores are now around 0.992; however, when I plot y_pred vs y_test I can see that some of the numbers are far off. Each time I run the file I predict the values for today based on yesterday's features, and these also differ massively (sometimes 3000, sometimes 5000). Would scaling my Y data help with this, or is there not much I can do? I'm using GridSearchCV, so changing the parameters shouldn't make a difference. – Ben Williams, 10 hours ago
@BenWilliams - check my edit in my answer. – n1k31t4, 9 hours ago
Thank you very much. My prediction now looks much better and so does my graph. – Ben Williams, 9 hours ago