Rule of thumb for a good number of features when dealing with grouped data

I have a classification problem on clinical data with multiple samples per patient, so the samples from the same patient are not independent of each other.

I know it is not possible to know the optimal number of features a priori, but there are some rules of thumb that work in many cases.

My question is: are those rules also valid in my case? In particular, should I relate the number of features to the number of instances or to the number of groups?

Thanks

asked Nov 9 '18 at 11:07

Davide Visentin

Do you mean different things by "samples" and "features", or do they refer to the same thing? Your question would benefit from some additional details, e.g. what you are trying to accomplish, and what is measured and how.
– oW_♦
Nov 9 '18 at 16:29

By "samples" I mean "instances". I have 33 numeric features and around 6k instances. The instances belong to 14 different patients (each patient has around 400-500 instances). I know how to perform cross-validation correctly while taking into account that there are multiple instances per patient (scikit-learn provides tools for exactly this case), but I'd like to know whether there are previous studies on the relation between the number of features, the number of groups (in my case, the number of patients) and the number of instances in a case like this one.
– Davide Visentin
Nov 10 '18 at 19:00

3 Answers

This is a really hard question to answer. I recommend doing some reading to get a feeling for what can be done in general and, in particular, for your task.

This paper is a must. But if you prefer a more practical approach, have a look at these two interesting sources:

a) ML Mastery, which also provides further reading

b) Kaggle

Good luck!

answered Nov 9 '18 at 11:58

TitoOrt

While the question itself is very vague, your answer is very generic and only addresses feature selection in general. Also, link-only answers are discouraged.
– oW_♦
Nov 9 '18 at 16:31

I am sorry to say that I am not aware of a simple "rule of thumb", as this varies a lot with the nature of the problem. Below, however, you can find some guidelines you can use to determine the "optimal" number of features for your problem.

First of all, you should use some dimensionality reduction technique to reduce the number of columns you use as input. Dimensionality reduction techniques fall into 2 categories: feature transformation and feature selection.

Feature transformation techniques restructure the feature space and produce a new set of features based on the old ones. A very popular technique for dimensionality reduction is Principal Component Analysis (PCA), which uses an orthogonal transformation to produce a set of linearly uncorrelated variables from the initial set of variables.
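As a rough illustration of the PCA step, here is a minimal scikit-learn sketch on synthetic data (the random matrix, the 95% explained-variance threshold and the scaling step are assumptions, not part of the question). It standardizes the 33 features, keeps just enough components to explain 95% of the variance, and checks that the resulting components are uncorrelated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 600 instances with 33 numeric features,
# where feature 1 is almost a copy of feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 33))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=600)

# PCA is scale-sensitive, so standardize each feature first.
X_std = StandardScaler().fit_transform(X)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance exceeds that fraction (here 95%).
pca = PCA(n_components=0.95, svd_solver="full")
X_pca = pca.fit_transform(X_std)

print("components kept:", pca.n_components_)
print("variance explained:", pca.explained_variance_ratio_.sum())

# The transformed columns are linearly uncorrelated with each other.
corr = np.corrcoef(X_pca, rowvar=False)
off_diagonal = corr - np.diag(np.diag(corr))
print("max |off-diagonal correlation|:", np.abs(off_diagonal).max())
```

Note that each component is a linear combination of all original features, so PCA reduces the number of model inputs but not the number of measurements you need to collect. With grouped data, fit the scaler and PCA on the training patients only.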

Feature selection techniques select, from the set of existing features, those with the highest "importance"/influence on the output variable. Some popular techniques are the Fisher score (which assigns weights to the features based on an "importance" criterion), Recursive Feature Elimination (which usually gives quite good results when combined with an SVM classifier), etc.
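A minimal sketch of both ideas follows, assuming scikit-learn and NumPy. As far as I know scikit-learn does not ship a Fisher score, so `fisher_score` below is a hypothetical helper implementing the usual definition (between-class scatter of the feature means divided by within-class variance); RFE uses the real `sklearn.feature_selection.RFE` with a linear SVM. The synthetic dataset and `n_features_to_select=5` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def fisher_score(X, y, eps=1e-12):
    """Hypothetical helper: per-feature Fisher score.

    F_j = sum_k n_k (mean_kj - mean_j)^2 / sum_k n_k var_kj
    Higher scores mean the class means are far apart relative to
    the spread inside each class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]
        n_k = Xk.shape[0]
        between += n_k * (Xk.mean(axis=0) - overall_mean) ** 2
        within += n_k * Xk.var(axis=0)
    return between / (within + eps)


# Synthetic data (assumption): 33 features, only 5 of them informative.
X, y = make_classification(n_samples=600, n_features=33, n_informative=5,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)

scores = fisher_score(X, y)
top_fisher = np.argsort(scores)[::-1][:5]
print("top 5 features by Fisher score:", top_fisher)

# RFE repeatedly fits the linear SVM and drops the feature with the
# smallest weight magnitude until 5 features remain.
rfe = RFE(estimator=LinearSVC(dual=False), n_features_to_select=5, step=1)
rfe.fit(X, y)
print("features kept by RFE:", np.flatnonzero(rfe.support_))
```

With grouped data, compute the scores or run RFE on the training patients only; otherwise the selection leaks information from the patients you later evaluate on.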

The following material might help you select a dimensionality reduction/feature selection approach:

A review article on feature selection for classification

A good summary of dimensionality reduction techniques

Now, the next step after selecting the right method and the right classification algorithm is to find the optimal number of features for your problem. A good idea is to repeat the classification iteratively, adding one extra feature each time, and observe the classification error. Provided the feature selection technique works well, you can expect to observe something like this:

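[Figure: classification error on the training and validation sets versus the number of features; a blue dotted line marks the validation minimum]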

The blue dotted line marks the point where the classification error on the validation set reaches its minimum. This point indicates the optimal number of features for your problem. Beyond it, the validation error starts increasing while the training error keeps decreasing, which is an indication of overfitting.
(Most probably the curves you get from your real data will not be that smooth: there might be some fluctuations and the pattern will be less clear, but more or less this will be the general pattern.)

Keep in mind that once the optimal number of features is determined, a separate test set should be used to evaluate the final model (since you used the validation set to choose one of the model's parameters, you cannot also use it for the evaluation).
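The procedure above can be sketched as follows, with one change for grouped data: instead of a single validation set, this uses `GroupKFold` cross-validation so that all instances of a patient stay in the same fold, and the final test set consists of held-out patients (`GroupShuffleSplit`). The synthetic patients, the ANOVA F-test ranking (`SelectKBest(f_classif)`, used here as a stand-in for the Fisher score) and the logistic regression classifier are all assumptions. Putting the selection step inside the pipeline means features are re-ranked on each training fold.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, GroupShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic grouped data (assumption): 14 patients, 40 instances each, 33 features.
rng = np.random.default_rng(0)
n_patients, per_patient, n_features = 14, 40, 33
groups = np.repeat(np.arange(n_patients), per_patient)
y = rng.integers(0, 2, size=groups.size)
X = rng.normal(size=(groups.size, n_features))
X += rng.normal(scale=2.0, size=(n_patients, n_features))[groups]  # patient baselines
X[:, :3] += 1.5 * y[:, None]  # only the first 3 features carry signal

# Hold out 3 whole patients as the final test set.
splitter = GroupShuffleSplit(n_splits=1, test_size=3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))


def make_model(k):
    # Selection sits inside the pipeline, so it only sees training folds.
    return make_pipeline(StandardScaler(),
                         SelectKBest(f_classif, k=k),
                         LogisticRegression(max_iter=1000))


# Cross-validated error as a function of the number of features.
cv = GroupKFold(n_splits=5)
errors = []
for k in range(1, n_features + 1):
    acc = cross_val_score(make_model(k), X[train_idx], y[train_idx],
                          groups=groups[train_idx], cv=cv)
    errors.append(1.0 - acc.mean())

best_k = int(np.argmin(errors)) + 1
print("best number of features:", best_k)

# Refit on all training patients; evaluate once on the unseen patients.
final_model = make_model(best_k).fit(X[train_idx], y[train_idx])
test_accuracy = final_model.score(X[test_idx], y[test_idx])
print("test accuracy on held-out patients:", test_accuracy)
```

Plotting `errors` against `range(1, n_features + 1)` gives the validation curve described above; with only a handful of patients per fold, expect it to be noisy.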

answered Nov 12 '18 at 11:52

missrg


You may want to define the problem a bit more. I think the most vital piece of information for answering this question is whether you are trying to classify patients or conditions within patients (i.e. "Does the patient have disease X?" vs. "Is the patient in state X?").

If you are building a model to determine whether or not a patient is in state X, then I think feature selection is not really what you should be thinking about. I would probably treat this as a batch effect problem. This framing makes sense when you want to use as many samples as you can, and therefore have multiple samples from each patient, but each patient may have a different baseline or a different amount of variation in their measurements. Changes within a patient will then be obscured unless the features are normalized within each batch.

Normally, batch effects refer to differences between batches produced by different lab equipment. In this case, however, I think you can treat the patients as batches. You can check for batch effects by doing PCA and plotting PC1 vs. PC2 with the samples colored by patient. If the samples cluster together by color, try correcting for batch effects by standardizing the features for each patient separately. Then redo the PCA and see whether the batch effects are removed.
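A minimal sketch of that check, assuming a pandas DataFrame with one row per instance and a `patient` column (the column name and the synthetic data are assumptions). Instead of drawing the PC1 vs. PC2 scatter plot, it summarizes how strongly the samples cluster by patient with a silhouette score on the first two principal components (a substitute for eyeballing the plot): a clearly positive value suggests patient-level clustering, a value near zero or below suggests little. It then standardizes each feature within each patient and repeats the check.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic data (assumption): 14 patients, 40 instances each, 33 features,
# with a strong patient-specific baseline added to every feature.
rng = np.random.default_rng(0)
n_patients, per_patient, n_features = 14, 40, 33
patient = np.repeat(np.arange(n_patients), per_patient)
values = rng.normal(size=(patient.size, n_features))
values += rng.normal(scale=3.0, size=(n_patients, n_features))[patient]

features = [f"f{i}" for i in range(n_features)]
df = pd.DataFrame(values, columns=features)
df["patient"] = patient


def per_patient_zscore(frame, cols, group_col="patient"):
    """Standardize each feature separately within each patient."""
    grouped = frame.groupby(group_col)[cols]
    return (frame[cols] - grouped.transform("mean")) / grouped.transform("std")


def patient_clustering(X, labels):
    """Silhouette of the patient labels on PC1/PC2 (higher = stronger batch effect)."""
    pcs = PCA(n_components=2).fit_transform(X)
    return silhouette_score(pcs, labels)


before = patient_clustering(df[features].to_numpy(), df["patient"])
corrected = per_patient_zscore(df, features)
after = patient_clustering(corrected.to_numpy(), df["patient"])
print(f"silhouette by patient before: {before:.2f}, after: {after:.2f}")
```

One caveat: per-patient standardization needs a reasonable number of samples from each new patient at prediction time, and it removes between-patient differences, which is why it only suits the "is the patient in state X" case.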

At that point, you can just build your classification model and use feature selection or regularization as you normally would.
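For the regularization route, one common option (an assumption here; the text does not name a specific method) is L1-penalized logistic regression, which drives some coefficients exactly to zero and so performs feature selection as part of training. The sketch below picks the penalty strength with patient-grouped cross-validation; the synthetic data and the `C` grid are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 14 patients x 40 instances, 33 features,
# of which only the first 3 are related to the label.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(14), 40)
y = rng.integers(0, 2, size=groups.size)
X = rng.normal(size=(groups.size, 33))
X[:, :3] += 1.0 * y[:, None]

# L1 penalty: smaller C means stronger regularization and fewer non-zero weights.
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l1", solver="saga", max_iter=5000))
search = GridSearchCV(model,
                      param_grid={"logisticregression__C": [0.01, 0.1, 1.0]},
                      cv=GroupKFold(n_splits=5))
search.fit(X, y, groups=groups)  # folds never split a patient

coef = search.best_estimator_[-1].coef_.ravel()
print("best C:", search.best_params_["logisticregression__C"])
print("features with non-zero weight:", np.flatnonzero(coef))
```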

If you are classifying the patients (i.e. whether a patient has disease X or not), it's clear that the differences between patients are exactly what you need to build this model. I doubt there is a rule of thumb for how many features to use depending on the number of groups or the number of samples within each group. You could try cross-validation with random sampling per patient.
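One way to read "random sampling per patient" is to make the cross-validation splits at the patient level, so that all samples from a patient land in the same fold. The sketch below (synthetic data, a 1-nearest-neighbour classifier and all sizes are assumptions) shows why this matters when the label belongs to the patient: the labels here are unrelated to the features, yet ordinary shuffled K-fold still looks very accurate because the model simply recognizes each patient's baseline, while `LeaveOneGroupOut` should drop to roughly chance level.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (assumption): 14 patients x 40 instances, 33 features.
# Each patient has its own baseline, and the label is a per-patient
# diagnosis that is NOT related to the features at all.
rng = np.random.default_rng(0)
n_patients, per_patient, n_features = 14, 40, 33
groups = np.repeat(np.arange(n_patients), per_patient)
patient_label = np.arange(n_patients) % 2  # 7 "disease", 7 "healthy"
y = patient_label[groups]
X = rng.normal(size=(groups.size, n_features))
X += rng.normal(scale=3.0, size=(n_patients, n_features))[groups]

clf = KNeighborsClassifier(n_neighbors=1)

# Leaky: instances of the same patient end up in both train and test folds.
leaky = cross_val_score(clf, X, y,
                        cv=KFold(n_splits=5, shuffle=True, random_state=0)).mean()

# Grouped: each fold tests on one patient never seen during training.
grouped = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut()).mean()

print(f"shuffled K-fold accuracy:     {leaky:.2f}")
print(f"leave-one-patient-out accuracy: {grouped:.2f}")
```

With only 14 patients the leave-one-patient-out estimate is itself noisy; `StratifiedGroupKFold` is an alternative if you also want the class balance preserved across folds.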

answered yesterday

fractalnature
