Training an XGBoost model with more data having different characteristics
I have trained my model on ECG data consisting of 8528 ECG files, each 30 s long with a sample rate of 300 Hz, so each file has 9000 values in the CSV. The data has four classes: A, N, O and ~.
This model works fine.
Now I want to retrain the model with a few more ECG files. These are about 42 s long with a sample rate of 360 Hz, so the CSV for each of these signals has 15127 values.
I trained the model with an A file of length 15127, then tried to predict that same file with the trained model, but it predicts class N.
Can anyone help me visualize how XGBoost will learn when files with longer signals like this are added? Will such a file be treated as an outlier?
If I train the model with more A files of length 15127, will the model be able to predict them correctly?
Let me know if any more information is required.
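For reference, the sample counts follow from duration × sample rate: 30 s × 300 Hz = 9000 samples, while 42 s × 360 Hz = 15120 (the stated 15127 corresponds to about 42.02 s). A model trained on 9000 raw-sample columns treats column j as "sample j at 300 Hz", so a 360 Hz record does not line up with it column for column. Below is a minimal sketch, assuming the goal is to keep the original 300 Hz / 9000-sample layout: it resamples with plain linear interpolation (`np.interp`, a simple stand-in for a proper anti-aliased resampler), then keeps the first 30 s or NaN-pads shorter signals. The function name `to_training_layout` is hypothetical.

```python
import numpy as np

def to_training_layout(signal, fs_in, fs_out=300, n_out=9000):
    """Resample a 1-D ECG signal from fs_in to fs_out (linear interpolation),
    then crop or NaN-pad it to exactly n_out samples.

    Assumption: the trained model expects n_out samples recorded at fs_out Hz.
    """
    signal = np.asarray(signal, dtype=float)
    duration = len(signal) / fs_in                   # seconds of signal available
    t_in = np.arange(len(signal)) / fs_in            # original sample times
    n_new = int(np.floor(duration * fs_out))         # number of samples after resampling
    t_out = np.arange(n_new) / fs_out                # new sample times
    resampled = np.interp(t_out, t_in, signal)       # no anti-aliasing filter here
    out = np.full(n_out, np.nan)                     # NaN is treated as missing by XGBoost
    k = min(n_out, n_new)
    out[:k] = resampled[:k]                          # keep only the first n_out samples
    return out
```

Cropping to the first 30 s throws away the rest of a 42 s record; splitting long records into several 30 s windows is another option. For a properly filtered 360 Hz to 300 Hz conversion, `scipy.signal.resample_poly(x, 5, 6)` could replace the `np.interp` step.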
Here is a sample of the training code:
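import xgboost as xgb

# train_data: one row per ECG record, one column per sample
# train_labels: integer class ids 0..3 (multi:softmax requires labels in [0, num_class))
dtrain = xgb.DMatrix(train_data, label=train_labels)
# cross_validate(dtrain)
print('training model.....')
param = {'num_class': 4, 'objective': 'multi:softmax', 'eval_metric': ['merror'],
         'max_depth': 7, 'eta': 0.04, 'subsample': 0.8, 'min_child_weight': 0.5,
         'max_delta_step': 7, 'gamma': 2, 'lambda': 10, 'colsample_bytree': 0.5}
# 420 boosting rounds, printing the training merror every 20 rounds
model = xgb.train(param, dtrain, 420, [(dtrain, 'train')], verbose_eval=20)
print('saving model....')
model.save_model('xgb_model.bin')
print('done!')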
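With `'objective': 'multi:softmax'` and `'num_class': 4`, XGBoost expects labels to be integers from 0 to 3, and `predict` returns the predicted class index (as a float) for each row. The question does not show how A, N, O and ~ are encoded, so the order in `CLASSES` below is an assumption, and `encode_labels` / `decode_predictions` are hypothetical helper names. A small sketch for encoding labels before training and turning predictions back into class letters:

```python
import numpy as np

# Assumed mapping: the original code does not show how A, N, O, ~ were encoded.
CLASSES = ['A', 'N', 'O', '~']
CLASS_TO_ID = {c: i for i, c in enumerate(CLASSES)}

def encode_labels(labels):
    """Map class letters to the integer ids 0..num_class-1 that multi:softmax expects."""
    return np.array([CLASS_TO_ID[c] for c in labels], dtype=int)

def decode_predictions(preds):
    """Map multi:softmax outputs (class indices returned as floats) back to letters."""
    return [CLASSES[int(p)] for p in preds]
```

A typical call would be `decode_predictions(model.predict(xgb.DMatrix(X_new)))`, where `X_new` should have the same column layout as the training data.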
python classification predictive-modeling decision-trees xgboost
So the second file has only A's in it? I haven't worked with ECG data. Could you share what the training data looks like? I guess you have 360 rows and 43 variables (42 seconds + target). Am I right?
– lsmor
13 hours ago
@Ismor: No, you have it the other way around. Think of an ECG as a signal recording (an audio signal is another example). This one has a sample rate of 360 Hz and is 42 seconds long, so one file has 360*42 values. Here is what it looks like: csv file. The signal is converted to values and stored in a CSV file. I have 8529 such records (rows), one row per signal.
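The layout described here is one row per record and one column per sample. Putting the original 9000-sample rows and the new 15127-sample rows into one training matrix needs a common width. A minimal sketch using a hypothetical helper `stack_records`: shorter rows are right-padded with `NaN`, which `xgb.DMatrix` treats as missing by default, and longer rows can optionally be cropped to a fixed `width`.

```python
import numpy as np

def stack_records(records, width=None):
    """Stack 1-D records of varying length into a 2-D float array, one row per record.

    Rows shorter than `width` are right-padded with NaN (missing for XGBoost);
    rows longer than `width` are cropped. By default `width` is the longest record.
    """
    if width is None:
        width = max(len(r) for r in records)
    X = np.full((len(records), width), np.nan)
    for i, r in enumerate(records):
        r = np.asarray(r, dtype=float)[:width]   # crop if longer than width
        X[i, :len(r)] = r                        # remaining cells stay NaN
    return X
```

Padding only makes the shapes compatible. Column j still means sample j, which is t = j/300 s in the original records and t = j/360 s in the new ones, and columns 9000 onward are missing for every original record. So the trees are probably not comparing like with like unless the signals are first resampled to a common rate.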
– Jhon Patric
13 hours ago
asked 15 hours ago
Jhon Patric
185