Training an XGBoost model with additional data that has different characteristics

I have trained my model on ECG data consisting of 8528 ECG files, each 30 s long and sampled at 300 Hz, so each recording has 9000 samples in the CSV file. The data has four classes: A, N, O and ~.

This model works fine.

Now I want to retrain the model with a few more ECG files, which are about 42 s long and sampled at 360 Hz, so each of these signals has 15127 samples in the CSV file.
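The sample counts follow from duration × sampling rate. A minimal Python sketch (the helper names `n_samples` and `column_time` are illustrative, not from the post) makes the arithmetic explicit and shows why the two datasets do not line up column by column: the same column index refers to a different point in time at 300 Hz and at 360 Hz.

```python
def n_samples(duration_s: float, fs_hz: float) -> int:
    """Number of samples in a recording of the given duration and sampling rate."""
    return round(duration_s * fs_hz)


def column_time(index: int, fs_hz: float) -> float:
    """Time in seconds that column `index` of a row represents."""
    return index / fs_hz


print(n_samples(30, 300))      # 9000 samples for the original recordings
print(n_samples(42, 360))      # 15120 samples for exactly 42 s at 360 Hz
print(round(15127 / 360, 2))   # 42.02, i.e. the new files are slightly over 42 s
print(column_time(3000, 300))  # 10.0 s into an original recording
print(column_time(3000, 360))  # about 8.33 s into a new recording
```

So feature 3000 means "10 s in" for the old data but "about 8.33 s in" for the new data; a tree split on that column compares unrelated parts of the two signals.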

I trained the model with an added class-A file of length 15127, then tried to predict that same file with the trained model, but it predicts class N.
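One plausible reason the model still predicts N is that a single added row among roughly 8529 contributes very little to the boosting objective. `xgb.DMatrix` accepts a per-row `weight` argument; the sketch below (hypothetical helper `domain_weights`, with an arbitrary factor chosen only for illustration) builds such weights so the new recordings count more. Up-weighting does not fix the sampling-rate mismatch, so it would only make sense once all signals share a common format.

```python
import numpy as np


def domain_weights(is_new, new_weight: float = 5.0) -> np.ndarray:
    """Per-row training weights: 1.0 for original rows, `new_weight` for new ones.

    `new_weight` is an illustrative value, not a tuned one.
    """
    is_new = np.asarray(is_new, dtype=bool)
    return np.where(is_new, new_weight, 1.0)


# Hypothetical example: 8528 original recordings followed by 3 new ones.
is_new = np.r_[np.zeros(8528, dtype=bool), np.ones(3, dtype=bool)]
w = domain_weights(is_new)
print(w.shape, w[:2], w[-3:])

# The weights would then be passed when building the training matrix, e.g.
# dtrain = xgb.DMatrix(train_data, label=train_labels, weight=w)
```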

Can anyone explain how XGBoost learns when such longer signals are added? Is such a file treated as an outlier?
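XGBoost needs a rectangular feature matrix, so rows of different lengths must be padded or cut somehow. If shorter rows are padded with NaN, `xgb.DMatrix` treats those cells as missing values (its default `missing` is NaN). The sketch below (hypothetical helper `pad_to_matrix`) shows the consequence: columns 9000 to 15126 are missing for every original recording, so only the few new files carry information there, while the first 9000 columns of the new files are sampled at a different rate. Arguably the new files then look less like outliers and more like a different feature space.

```python
import numpy as np


def pad_to_matrix(signals, width=None, fill=np.nan):
    """Stack variable-length 1-D signals into a (n_rows, width) float matrix.

    Shorter rows are padded with `fill`; longer rows are truncated to `width`.
    """
    if width is None:
        width = max(len(s) for s in signals)
    out = np.full((len(signals), width), fill, dtype=float)
    for i, s in enumerate(signals):
        n = min(len(s), width)
        out[i, :n] = np.asarray(s, dtype=float)[:n]
    return out


# One original-style row (9000 samples) and one new-style row (15127 samples).
X = pad_to_matrix([np.zeros(9000), np.ones(15127)])
print(X.shape)                    # (2, 15127)
print(int(np.isnan(X[0]).sum()))  # 6127 missing cells in the original-style row
print(int(np.isnan(X[1]).sum()))  # 0 missing cells in the new-style row
```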

If I train the model with more class-A files of length 15127, will it be able to predict them correctly?
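More class-A recordings in the 360 Hz format might help somewhat, but a more direct option is to convert every recording to the original format before both training and prediction: resample to 300 Hz, then cut or pad to 9000 samples. The sketch below uses plain linear interpolation (`np.interp`) to stay dependency-free; it applies no anti-aliasing filter, so for real ECG a proper resampler such as `scipy.signal.resample_poly(x, 5, 6)` (360 Hz × 5/6 = 300 Hz) is likely the better choice. Keeping only the first 30 s is one assumption among several; splitting each recording into 30 s windows is another.

```python
import numpy as np


def resample_linear(x, fs_in: float, fs_out: float) -> np.ndarray:
    """Resample a 1-D signal from fs_in to fs_out by linear interpolation (no anti-aliasing)."""
    x = np.asarray(x, dtype=float)
    n_out = int(round(len(x) * fs_out / fs_in))
    t_in = np.arange(len(x)) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, x)


def fix_length(x, n: int, fill: float = 0.0) -> np.ndarray:
    """Truncate to n samples, or right-pad with `fill` if shorter."""
    x = np.asarray(x, dtype=float)
    if len(x) >= n:
        return x[:n]
    return np.concatenate([x, np.full(n - len(x), fill)])


def to_original_format(x, fs_in: float, fs_out: float = 300.0, n: int = 9000) -> np.ndarray:
    """Bring a recording to the training format: 300 Hz, 9000 samples (30 s)."""
    return fix_length(resample_linear(x, fs_in, fs_out), n)


# Synthetic stand-in for a new recording: a 1 Hz sine, 15127 samples at 360 Hz.
t = np.arange(15127) / 360.0
new_rec = np.sin(2 * np.pi * t)
row = to_original_format(new_rec, fs_in=360.0)
print(row.shape)  # (9000,)
```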

Let me know if any other information is required.

Here is a sample of the training code:

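import xgboost as xgb

# train_data: 2-D array with one ECG recording per row (all rows must have the same length)
# train_labels: integer class ids 0..3, as required by objective 'multi:softmax'
dtrain = xgb.DMatrix(train_data, label=train_labels)

# cross_validate(dtrain)

print('training model.....')

param = {'num_class': 4, 'objective': 'multi:softmax', 'eval_metric': ['merror'], 'max_depth': 7, 'eta': 0.04, 'subsample': 0.8, 'min_child_weight': 0.5, 'max_delta_step': 7, 'gamma': 2, 'lambda': 10, 'colsample_bytree': 0.5}

model = xgb.train(param, dtrain, num_boost_round=420, evals=[(dtrain, 'train')], verbose_eval=20)

print('saving model....')

model.save_model('xgb_model.bin')

print('done!')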
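With `objective='multi:softmax'`, prediction returns class indices (stored as floats) rather than probabilities. Switching to `multi:softprob` returns per-class probabilities, which makes it easier to see how close a misclassified recording came to class A. The sketch below decodes either form; the index-to-letter order in `CLASSES` is an assumption and must match however `train_labels` was encoded.

```python
import numpy as np

# Assumed mapping from integer label to class letter; adjust to match train_labels.
CLASSES = ["A", "N", "O", "~"]


def decode_softmax(pred) -> list:
    """Decode multi:softmax output (class indices stored as floats)."""
    return [CLASSES[int(p)] for p in np.asarray(pred).ravel()]


def decode_softprob(probs) -> list:
    """Decode multi:softprob output into (label, probability of that label) pairs.

    Accepts an (n_rows, n_classes) array or a flat array of length n_rows * n_classes.
    """
    probs = np.asarray(probs, dtype=float).reshape(-1, len(CLASSES))
    idx = probs.argmax(axis=1)
    return [(CLASSES[i], float(probs[r, i])) for r, i in enumerate(idx)]


print(decode_softmax([1.0, 0.0]))
print(decode_softprob([[0.30, 0.55, 0.10, 0.05]]))
```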

asked 15 hours ago

Jhon Patric

185

So the second file has only A's in it? I haven't worked with ECG. Could you share what the training data looks like? I guess you have 360 rows and 43 variables (42 seconds + target). Am I right?
– lsmor
13 hours ago

@lsmor: No, it's the other way around. You can think of an ECG as a signal recording (another example of a signal recording is an audio signal). This one has a sample rate of 360 Hz and is 42 seconds long, so one file has 360 × 42 samples. Here is what it looks like: csv file. The signal is converted to values and stored in a CSV file. I have 8529 such records (rows), one row per signal.
– Jhon Patric
13 hours ago
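Because each CSV row is one recording, rows from the two sources have different numbers of values. A minimal loader using the standard `csv` module (hypothetical helper `load_recordings`; it assumes each row contains only sample values, with labels stored elsewhere) keeps each row as its own array so lengths can be inspected before building a feature matrix.

```python
import csv
import io
from collections import Counter

import numpy as np


def load_recordings(f) -> list:
    """Read one recording per CSV row into a list of 1-D float arrays.

    `f` is any text file object; empty cells and blank rows are skipped.
    """
    rows = []
    for record in csv.reader(f):
        values = [float(v) for v in record if v.strip()]
        if values:
            rows.append(np.asarray(values, dtype=float))
    return rows


# Tiny synthetic example: two short "recordings" of different length.
sample = io.StringIO("0.1,0.2,0.3\n0.5,0.4,0.3,0.2,0.1\n")
recs = load_recordings(sample)
print(Counter(len(r) for r in recs))  # how many rows have each length
```

For a real file (name hypothetical), `with open("ecg.csv", newline="") as f: recs = load_recordings(f)` would give the same per-row arrays.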

python classification predictive-modeling decision-trees xgboost