Training xgboost model with more data having different characteristic














I have trained my model on ECG data consisting of 8528 ECG files. Each recording is 30 s long and sampled at 300 Hz, so each one has 9000 values in the CSV file. The data has four classes: A, N, O and ~.



This model is working fine.



Now I want to retrain the model with a few more ECG files. These are 42 s long and sampled at 360 Hz, so each of these signals has 15127 values in the CSV file.
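To see why the two kinds of rows do not line up: XGBoost treats every CSV column as a separate feature, so column k of one recording is only ever compared with column k of the other recordings. A minimal sketch in plain Python, assuming each row starts at t = 0 and is uniformly sampled, of which moment a given column holds in each dataset:

```python
def n_samples(duration_s, fs):
    """Number of samples in a recording of duration_s seconds sampled at fs Hz."""
    return int(round(duration_s * fs))

def column_time(col, fs):
    """Time in seconds held by column `col` of a row sampled at fs Hz (row starts at t = 0)."""
    return col / fs

print(n_samples(30, 300))        # 9000, the length of the original rows
print(n_samples(42, 360))        # 15120; the new rows have 15127, i.e. about 42.02 s
print(column_time(3000, 300))    # 10.0 s into an original recording
print(column_time(3000, 360))    # about 8.33 s into a new recording
```

So a split on column 3000 mixes samples taken at different times, and the columns from 9000 upward exist only in the new, longer recordings.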



I trained the model with a class-A file of length 15127, then tried to predict that same file with the trained model, but it predicts class N.
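One plausible reason the model still says N for the very file it was trained on is regularization. XGBoost's documentation ("Introduction to Boosted Trees") gives the optimal leaf value as w* = -G / (H + lambda) and the gain of a split as 0.5 * [G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - (G_L + G_R)^2/(H_L + H_R + lambda)] - gamma, where G and H are the summed gradients and hessians of the rows in a node. The sketch below plugs in the question's `eta`, `lambda` and `gamma`; the gradient and hessian of the single A row are assumed values for illustration, not numbers read out of XGBoost.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf value w* = -G / (H + lambda) from XGBoost's regularized objective."""
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Loss reduction of splitting a node into left/right children, minus the gamma penalty."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

# Assumed, illustrative statistics for the single A row in the A-class tree:
# gradient p - y with p = 0.25 and y = 1, and a hessian of 0.375.
g_row, h_row = -0.75, 0.375
eta, lam, gamma = 0.04, 10, 2          # taken from the question's param dict

# Change to the A-class score per round, even if the row gets a leaf to itself
step = eta * leaf_weight(g_row, h_row, lam)
print(round(step, 4))                   # 0.0029

# Splitting that row away from an assumed sibling group with G = 0, H = 10
print(split_gain(g_row, h_row, 0.0, 10.0, lam, gamma))  # negative, so the split is not made
```

With these assumed numbers a leaf holding only that row would also have H = 0.375, below `min_child_weight: 0.5`, so it would not be allowed in the first place. More A recordings of the new type raise G and H for such leaves, which is why more examples usually help.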



Can anyone explain (or visualize) how XGBoost learns when such a file with a longer signal is added? Is such a file treated as an outlier?
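On the outlier question: to train on both kinds of recordings at once, the shorter rows must be padded to 15127 columns somehow. A minimal NumPy sketch, assuming NaN padding (NaN is what `xgb.DMatrix` treats as missing by default); the `pad_rows` helper is hypothetical and the data is random:

```python
import numpy as np

def pad_rows(signals, width=None):
    """Stack 1-D signals of different lengths into one 2-D array, padding with NaN."""
    width = width or max(len(s) for s in signals)
    out = np.full((len(signals), width), np.nan)
    for i, s in enumerate(signals):
        n = min(len(s), width)
        out[i, :n] = s[:n]
    return out

# Hypothetical toy data: two "old" rows of 9000 samples and one "new" row of 15127
rng = np.random.default_rng(0)
rows = [rng.normal(size=9000), rng.normal(size=9000), rng.normal(size=15127)]
X = pad_rows(rows)
print(X.shape)                              # (3, 15127)
print(np.isnan(X[:, 9000:]).sum(axis=1))    # 6127, 6127 and 0 missing values per row
```

Columns 9000 to 15126 are then missing for every original recording, so any split on them is learned from the few new rows alone, while the original rows just follow each split's learned default direction. Padding with zeros instead could let the trees key on the padding itself, which encodes recording length rather than heart rhythm.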



If I train the model with more class-A files of length 15127, will it be able to predict them correctly?
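More 15127-sample A recordings do not remove the mismatch by themselves, because column k still refers to a different time in the two datasets. A common alternative (not part of the question) is to bring every recording to one sampling rate and length before training. A minimal sketch using linear interpolation with `np.interp`, assuming a 300 Hz / 30 s target to match the original data; the `to_common_grid` name is hypothetical, and for real ECG an anti-aliasing resampler such as `scipy.signal.resample_poly(x, 5, 6)` (360 Hz to 300 Hz) would be the safer choice:

```python
import numpy as np

def to_common_grid(signal, fs_in, fs_out=300, duration_s=30):
    """Resample a 1-D signal from fs_in Hz to fs_out Hz, then crop or zero-pad to duration_s seconds.

    Uses linear interpolation without low-pass filtering, so it is only a rough sketch.
    """
    signal = np.asarray(signal, dtype=float)
    t_in = np.arange(len(signal)) / fs_in        # times of the original samples
    n_out = int(round(duration_s * fs_out))      # 9000 for 30 s at 300 Hz
    t_out = np.arange(n_out) / fs_out            # times of the target samples
    # Target times past the end of a short recording are filled with 0.0
    return np.interp(t_out, t_in, signal, right=0.0)

# Hypothetical 42 s recording at 360 Hz (15127 samples, as in the question)
new_row = np.sin(2 * np.pi * np.arange(15127) / 360)
print(to_common_grid(new_row, fs_in=360).shape)   # (9000,)
```

Original 300 Hz rows pass through unchanged, so both sets end up with 9000 columns that refer to the same instants. Cropping keeps only the first 30 s of each new recording.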



Let me know if any more information is required.



Here is the sample training code:



import xgboost as xgb

# train_data: one row per ECG recording; train_labels: integer class ids in [0, 4)
dtrain = xgb.DMatrix(train_data, label=train_labels)
# cross_validate(dtrain)
print('training model.....')
param = {'num_class': 4, 'objective': 'multi:softmax', 'eval_metric': ['merror'],
         'max_depth': 7, 'eta': 0.04, 'subsample': 0.8, 'min_child_weight': 0.5,
         'max_delta_step': 7, 'gamma': 2, 'lambda': 10, 'colsample_bytree': 0.5}
# 420 boosting rounds; print the training error every 20 rounds
model = xgb.train(param, dtrain, num_boost_round=420,
                  evals=[(dtrain, 'train')], verbose_eval=20)
print('saving model....')
model.save_model('xgb_model.bin')
print('done!')





















  • So the second file has only A's in it? I haven't worked with ECG data. Could you share what the training data looks like? I guess you have 360 rows and 43 variables (42 seconds + target). Am I right?
    – lsmor, 13 hours ago










  • @lsmor: No, you are reading it the other way round. Think of an ECG as a signal recording (an audio signal is another example). This one has a sample rate of 360 Hz and is 42 seconds long, so one file has 360 × 42 samples. The signal is converted to values and stored in a CSV file. I have 8529 such records (rows), one row per signal.
    – Jhon Patric, 13 hours ago

















python classification predictive-modeling decision-trees xgboost







asked 15 hours ago by Jhon Patric











