SVR is giving same prediction for all features
I'm creating a basic application in Python and Scikit-learn to predict the 'Closing' value of a stock on day n+1, given the features of the stock on day n.
A sample row in my dataframe (2000 rows) looks like this:
Open Close High Low Volume
0 537.40 537.10 541.55 530.47 52877.98
This is similar to this video: https://www.youtube.com/watch?v=SSu00IRRraY, where the author uses 'Dates' as the features and 'Open Price' as the target.
In my example I don't have a 'Dates' column in my dataset; instead I want to use the Open, High, Low and Volume data as the features, because I thought that would make the predictions more accurate.
I was defining my features and targets like so:
features = df.loc[:,df.columns != 'Closing']
targets = df.loc[:,df.columns == 'Closing']
which returned DataFrames looking like this:
features:
Open High Low Vol from
29 670.02 685.11 661.09 92227.36
targets:
Close
29 674.57
However, I realised that the data needs to be in a NumPy array, so I now get my features and targets like this:
features = df.loc[:,df.columns != 'Closing'].values
targets = df.loc[:,df.columns == 'Closing'].values
So now my features look like this:
[6.70020000e+02 6.85110000e+02 6.61090000e+02 9.22273600e+04
6.23944806e+07]
[7.78102000e+03 8.10087000e+03 7.67541000e+03 6.86188500e+04
5.41391322e+08]
and my targets look like this:
[ 674.57]
[ 8042.64]
I then split up my data using:
X_training, X_testing, y_training, y_testing = train_test_split(features, targets, test_size=0.8)
I tried to follow the Scikit-learn documentation, which resulted in the following:
svr_rbf = svm.SVR(kernel='rbf', C=100.0, gamma=0.0004, epsilon=0.01)
svr_rbf.fit(X_training, y_training)
predictions = svr_rbf.predict(X_testing)
print(predictions)
I assumed that this would predict the y values for the testing features, which I could then plot against the actual y_testing values to see how similar they are. However, predictions contains the same value for every row of X_testing:
[3763.84681818 3763.84681818 3763.84681818 3763.84681818 3763.84681818
I've tried changing the values of epsilon, C and gamma, but that doesn't change the fact that the predictions are always the same value.
I know that it might not be possible to predict stock prices accurately, but I must have done something wrong to get the same value when applying the model to many different test samples.
python regression pandas numpy svr
asked 13 hours ago by Ben Williams
1 Answer
There are a couple of things that, if you change them, will help.
First, a general one for all model building: I would suggest scaling your data before putting it into the model.
It might not directly solve the problem of getting the same predicted value for every sample, but you might notice that your predictions lie somewhere in the range of your input values. Because you are using unscaled volume, the model essentially has to work on two very different scales at the same time, which it cannot do very well.
Have a look at the StandardScaler in sklearn for one way to do that.
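For example, a minimal sketch (reusing the variable names from the question; the scaler is fit on the training data only, then reused on the test data):
from sklearn import svm
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_training)  # fit on the training data only
X_test_scaled = scaler.transform(X_testing)        # apply the same statistics to the test data

svr_rbf = svm.SVR(kernel='rbf', C=100.0, gamma=0.0004, epsilon=0.01)
svr_rbf.fit(X_train_scaled, y_training.ravel())    # ravel() flattens the (n, 1) target array
predictions = svr_rbf.predict(X_test_scaled)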
Next, a few suggestions of things to change, specifically because you are working with stock prices:
I would normally predict the value of the stock for tomorrow, not the closing price on the same day as the open/high/low/volume features. The latter only makes sense to me if you have high-frequency (intraday) data.
Given this, you would need to shift your y values by one step. There is a shift method on Pandas DataFrames to help with that, but as you don't have a date column and only need to shift by one timestep anyway, you can just do this:
features = df.loc[:, df.columns != 'Closing'].iloc[:-1].values  # leave out the last row
targets = df.loc[:, df.columns == 'Closing'].iloc[1:].values    # start one row later
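Equivalently, here is a small sketch using the shift method mentioned above (the 'Target' column name is just illustrative):
df2 = df.copy()
df2['Target'] = df2['Closing'].shift(-1)   # next day's close becomes today's target
df2 = df2.dropna(subset=['Target'])        # the last row has no next day, so drop it
features = df2.drop(columns=['Closing', 'Target']).values
targets = df2['Target'].values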
You could then even predict the opening price of the following day, or keep the closing price in the features, as that would not introduce temporal bias.
Something that would require more setup would be to look at how you shuffle your data. Again, because you want to use historical values to predict future ones, you need to keep the relevant history together. Have a look at my other answer to this question and the diagram there, which explains more about this idea.
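As a minimal sketch of one way to respect that ordering during validation, sklearn's TimeSeriesSplit always trains on earlier samples and validates on later ones:
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(features):
    X_tr, X_te = features[train_idx], features[test_idx]
    y_tr, y_te = targets[train_idx], targets[test_idx]
    # fit and evaluate here; training indices always precede test indices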
EDIT
You should also scale y_train and y_test, so that the model knows to predict within that range. Do this using the same StandardScaler instance, so as not to introduce bias. Have a look at this short tutorial. Your predictions will then be within the same range (e.g. [-1, +1]). You can compute errors on that range too. If you really want, you can then scale your predictions back to the original range so they look more realistic, but that isn't really necessary to validate the model. You can simply plot the predictions against the ground truth in the scaled space.
Check out this thread, which explains a few reasons why you should use the same instance of StandardScaler on the test data.
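For example, a minimal sketch using a second scaler for the target, fit on the training targets and reused (the same instance) on the test targets:
from sklearn.preprocessing import StandardScaler

y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_training)  # fit on the training targets only
y_test_scaled = y_scaler.transform(y_testing)        # reuse the same instance on the test targets

# After predicting in the scaled space, you can optionally map back:
# predictions = y_scaler.inverse_transform(predictions_scaled.reshape(-1, 1))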
answered 12 hours ago by n1k31t4 (edited 9 hours ago)
Thank you for the well-written answer. I shifted the dataframe as you suggested and used the scaler on my X_train and X_test data. My training and testing scores are now around 0.992; however, when I plot y_pred vs y_test I can see that some of the numbers are far off. Each time I run the file I predict the values for today based on yesterday's features, and these also differ massively (sometimes 3000, sometimes 5000). Would scaling my Y data help with this, or is there not much I can do? I'm using GridSearchCV, so changing the parameters shouldn't make a difference. – Ben Williams, 10 hours ago
@BenWilliams - check my edit in my answer. – n1k31t4, 9 hours ago
Thank you very much. My prediction now looks much better and so does my graph. – Ben Williams, 9 hours ago