Rule of thumb for a good number of features when dealing with grouped data












0












$begingroup$


I have a classification problem on clinical data with multiple samples per patient, so the samples from the same patient are not independent of each other.

I know that it is not possible to know the optimal number of features a priori, but there are some rules of thumb that work in many cases.

My question is: are those rules also valid in my case? In particular, should I relate the number of features to the number of instances or to the number of groups?



Thanks



















$endgroup$












  • $begingroup$
    Do you mean different things by "samples" and "features", or do they refer to the same thing? Your question would benefit from some additional details, e.g. what you are trying to accomplish, what is measured, and how.
    $endgroup$
    – oW_
    Nov 9 '18 at 16:29






  • $begingroup$
    By "samples" I mean "instances". I have 33 numeric features and around 6k instances. The instances belong to 14 different patients (each patient has around 400-500 instances). I know how to perform cross-validation correctly by taking into account that there are multiple instances per patient (scikit-learn provides tools for exactly this case), but I would like to know whether there are previous studies on the relation between the number of features, the number of groups (in my case, patients) and the number of instances in a case like this one.
    $endgroup$
    – Davide Visentin
    Nov 10 '18 at 19:00























classification feature-selection
















asked Nov 9 '18 at 11:07









Davide Visentin






















3 Answers


















0












$begingroup$

This is a really hard question to answer. I recommend doing some reading to get a feel for what can be done in general and, in particular, for your task.

This paper is a must. But if you prefer a more practical approach, have a look at these two sources:

a) ML Mastery, which also provides further reading



b) Kaggle



Good luck!















$endgroup$













  • $begingroup$
    While the question is very vague itself, your answer is very generic and only addresses feature selection in general. Also link-only answers are discouraged.
    $endgroup$
    – oW_
    Nov 9 '18 at 16:31



















0












$begingroup$

I am sorry to say that I am not aware of a simple "rule of thumb", as this varies a lot with the nature of the problem. Below are some guidelines you can use to determine the "optimal" number of features for your problem.

First of all, you should use some dimensionality reduction to reduce the number of columns used as input. Dimensionality reduction techniques fall into two categories: feature transformation and feature selection.




  • Feature transformation techniques restructure the feature space and produce a new set of features based on the old ones. A very popular technique is Principal Component Analysis (PCA), which uses an orthogonal transformation to produce a set of linearly uncorrelated variables from the initial set of variables.


  • Feature selection techniques select, from the set of existing features, those with the highest "importance" or influence on the output variable. Some popular techniques are the Fisher score (which assigns weights to the features based on an "importance" criterion) and Recursive Feature Elimination (which usually gives quite good results when combined with an SVM classifier).
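The two families above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic (generated with `make_classification`), and the choice of 10 components or 10 selected features is arbitrary, not a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 300 instances, 33 numeric features.
X, y = make_classification(
    n_samples=300, n_features=33, n_informative=8, n_redundant=5, random_state=0
)

# Feature transformation: project onto 10 linearly uncorrelated components.
pca = PCA(n_components=10, svd_solver="full")
X_pca = pca.fit_transform(X)

# Feature selection: recursive feature elimination around a linear SVM,
# which keeps 10 of the original columns and drops the rest.
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10)
X_rfe = rfe.fit_transform(X, y)
selected_columns = np.flatnonzero(rfe.support_)

print(X_pca.shape, X_rfe.shape)
print("columns kept by RFE:", selected_columns)
```

Note the difference in what comes out: the PCA columns are new variables (mixtures of all 33 inputs), while the RFE columns are a subset of the original ones and keep their meaning.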



The following material might help you select a dimensionality reduction or feature selection approach.




  • A review article for feature selection for classification

  • A quite good summary of dimensionality reduction techniques


The next step, after selecting the method and the classification algorithm, is to find the optimal number of features for your problem. A good idea is to repeat the classification, adding one extra feature each time, and observe the classification error. Provided the feature selection technique works well, you should observe something like this:



[Figure: training and validation classification error versus number of features]



The blue dotted line shows the point where the classification error on the validation set reaches its minimum. This point indicates the optimal number of features for your problem. Beyond it, the validation error starts increasing while the training error keeps decreasing, which is an indication of overfitting.
(Most probably, the curves you get from your real data will not be that smooth: there may be some fluctuations and the pattern will be less clear, but this will be the general pattern.)



Keep in mind that after the optimal number of features is determined, a separate test set should be used to evaluate the final model (since you used the validation set to tune one of the model's parameters, you cannot also use it for the evaluation).
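A minimal sketch of this procedure, adapted to grouped data, might look like the following. Several things here are assumptions made for illustration: the data are synthetic, the `patients` array is a made-up group label, `SelectKBest` with the ANOVA F-score stands in for whichever selection method you actually choose, and logistic regression stands in for the classifier. Both the validation folds and the final test set are split by patient, so no patient contributes to both sides of a split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, GroupShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 1400 instances, 33 features, 14 hypothetical patients.
X, y = make_classification(
    n_samples=1400, n_features=33, n_informative=6, random_state=0
)
patients = np.repeat(np.arange(14), 100)

# Hold out 4 whole patients as the final test set.
outer = GroupShuffleSplit(n_splits=1, test_size=4, random_state=0)
dev_idx, test_idx = next(outer.split(X, y, groups=patients))


def make_model(k):
    # Selection sits inside the pipeline, so it is refit on every training fold.
    return make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=k),
        LogisticRegression(max_iter=1000),
    )


# Validation error as a function of the number of selected features.
cv = GroupKFold(n_splits=5)
val_error = {}
for k in range(1, X.shape[1] + 1):
    scores = cross_val_score(
        make_model(k), X[dev_idx], y[dev_idx], groups=patients[dev_idx], cv=cv
    )
    val_error[k] = 1.0 - scores.mean()

best_k = min(val_error, key=val_error.get)

# Refit on all development patients and evaluate once on the held-out patients.
final_model = make_model(best_k).fit(X[dev_idx], y[dev_idx])
test_error = 1.0 - final_model.score(X[test_idx], y[test_idx])
print(f"best k = {best_k}, validation error = {val_error[best_k]:.3f}, "
      f"test error = {test_error:.3f}")
```

With only 14 patients the validation curve is likely to be noisy, so it may be wiser to treat the minimum as a rough region than as an exact number.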















$endgroup$





















    0












    $begingroup$

    You may want to define the problem a bit more. I think the most vital piece of information for answering this question is whether you are trying to classify patients or a condition within patients (i.e. "Does the patient have disease X?" vs. "Is the patient in state X?").



    If you are building a model to determine whether or not a patient is in state X, then I think feature selection is not really what you should be thinking about. I would probably treat this as a batch effect problem. This makes sense if you want to use as many samples as you can and therefore have multiple samples from each patient, but each patient may have a different baseline or different variation in their measurements. Changes within a patient will therefore be obscured unless the features are normalized within each batch.



    Normally, batch effects refer to differences between batches produced by different lab equipment. In this case, however, I think you could treat the patients as batches. You can therefore check for batch effects by doing PCA and looking at a plot of PC1 vs. PC2 with the samples colored by patient. If the samples cluster together by color, then you should try correcting for batch effects by standardizing the features for each patient separately. Then redo the PCA and see whether the batch effects are removed.
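The check described above can be sketched numerically, without the plot. Everything in this example is an assumption made for illustration: the data are synthetic, with a made-up baseline and spread per patient, and the helper names (`standardize_per_patient`, `first_two_pcs`, `between_patient_share`) are hypothetical. Instead of eyeballing colors, it measures how much of the PC1/PC2 variance is explained by patient means, before and after per-patient standardization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_per_patient, n_features = 14, 50, 5
patients = np.repeat(np.arange(n_patients), n_per_patient)

# Synthetic data: each hypothetical patient has their own baseline and spread.
baselines = rng.normal(0.0, 5.0, size=(n_patients, n_features))
spreads = rng.uniform(0.5, 2.0, size=(n_patients, n_features))
X = baselines[patients] + spreads[patients] * rng.normal(
    size=(patients.size, n_features)
)


def standardize_per_patient(X, groups):
    """Z-score every feature separately within each group (patient)."""
    X_std = np.empty_like(X, dtype=float)
    for g in np.unique(groups):
        rows = groups == g
        X_std[rows] = (X[rows] - X[rows].mean(axis=0)) / X[rows].std(axis=0)
    return X_std


def first_two_pcs(X):
    """Scores on the first two principal components, via SVD of centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


def between_patient_share(scores, groups):
    """Fraction of the variance in `scores` explained by patient means."""
    centered = scores - scores.mean(axis=0)
    total = (centered ** 2).sum()
    between = sum(
        (groups == g).sum() * (centered[groups == g].mean(axis=0) ** 2).sum()
        for g in np.unique(groups)
    )
    return between / total


before = between_patient_share(first_two_pcs(X), patients)
after = between_patient_share(
    first_two_pcs(standardize_per_patient(X, patients)), patients
)
print(f"between-patient share of PC1/PC2 variance: "
      f"before={before:.3f}, after={after:.3f}")
```

A share close to 1 corresponds to samples clustering by patient in the PC1 vs. PC2 plot. After per-patient standardization every patient has mean zero on every feature, so the share drops to essentially zero by construction. That is also why this correction only suits the "state within a patient" setting: it removes between-patient differences entirely.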



    At that point, you can just build your classification model and use feature selection or regularization as you normally would.



    In the case that you are classifying the patients (i.e. the patient has disease X or not), it's clear that the difference between patients is actually what you need to build this model. I doubt that there is a rule of thumb for how many features you should use depending on the number of groups or samples within a group. You could try doing cross-validation with random sampling per patient.
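One reading of the last suggestion is to split the cross-validation folds by patient, so that all instances of a patient stay on the same side of each split. The sketch below compares that with ordinary instance-level folds. It rests on assumptions made for illustration: the data are synthetic, the label is assigned per patient, the features contain a patient-specific baseline that is unrelated to the label, and a 1-nearest-neighbor classifier is used only because it makes the leakage easy to see.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_patients, n_per_patient, n_features = 14, 50, 5
patients = np.repeat(np.arange(n_patients), n_per_patient)

# Patient-level label (hypothetical "has disease X"), alternating across patients.
y = (patients % 2).astype(int)

# Features carry a patient-specific baseline that is unrelated to the label.
baselines = rng.normal(0.0, 3.0, size=(n_patients, n_features))
X = baselines[patients] + rng.normal(0.0, 0.3, size=(patients.size, n_features))

model = KNeighborsClassifier(n_neighbors=1)

# Instance-level folds: the same patient appears in both training and test data.
naive = cross_val_score(
    model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)
)

# Patient-level folds: all instances of a patient stay together.
grouped_cv = GroupKFold(n_splits=7)
grouped = cross_val_score(model, X, y, groups=patients, cv=grouped_cv)

print(f"instance-level CV accuracy: {naive.mean():.2f}")
print(f"patient-level CV accuracy:  {grouped.mean():.2f}")
```

In this setup the instance-level score mostly reflects the model recognizing patients it has already seen, not the disease. For a patient-level label, this suggests that the number of patients, not the number of instances, is what limits how many features can be supported, although this sketch is not a formal rule.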
















    New contributor




    fractalnature is a new contributor to this site.






    $endgroup$







































      answered Nov 9 '18 at 11:58









      TitoOrt

































          answered Nov 12 '18 at 11:52









          missrg



































              $begingroup$

              You may want to define the problem a bit more. I think the most vital piece of information for answering this question is whether you are trying to classify patients or a condition within patients (i.e. "Does the patient have disease X?" vs. "Is the patient in state X?").



              If you are building a model to determine whether or not a patient is in state X, then I think feature selection is not really what you should be thinking about. I would probably treat this as a batch effect problem. That makes sense when you want to use as many samples as you can and therefore have multiple samples from each patient, while each patient may have a different baseline or a different amount of variation in their measurements. Changes within a patient will then be obscured unless the features are normalized within each batch.



              Normally, batch effects refer to differences between batches produced by different lab equipment. In this case, however, I think you could treat the patients as batches. You can therefore check for batch effects by doing PCA and looking at a plot of PC1 vs. PC2 with the samples colored by patient. If the samples cluster together by color, you should try correcting for batch effects by standardizing the features for each patient separately. Then redo the PCA and see whether the batch effects have been removed.
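              A minimal sketch of that check, assuming pandas and scikit-learn. The data is synthetic, with a different baseline and scale per patient built in on purpose, and `between_patient_share` is a hypothetical helper that replaces the visual inspection with a number: the fraction of the PC1/PC2 variance that is explained by the per-patient means.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_patients, n_per_patient, n_features = 14, 50, 33

# Synthetic stand-in data (assumption): each patient has its own baseline
# (offset) and its own measurement variability (scale) for every feature.
patient = np.repeat(np.arange(n_patients), n_per_patient)
offset = rng.normal(0, 3, size=(n_patients, n_features))[patient]
scale = rng.uniform(0.5, 2.0, size=(n_patients, n_features))[patient]
X = pd.DataFrame(offset + scale * rng.normal(size=(len(patient), n_features)))


def between_patient_share(data, patient_ids):
    """Fraction of the PC1/PC2 variance explained by per-patient means."""
    pcs = pd.DataFrame(PCA(n_components=2, random_state=0).fit_transform(data))
    patient_means = pcs.groupby(patient_ids).transform("mean")
    return float(patient_means.var(ddof=0).sum() / pcs.var(ddof=0).sum())


# Standardize (z-score) every feature separately within each patient.
grouped = X.groupby(patient)
X_std = (X - grouped.transform("mean")) / grouped.transform("std")

before = between_patient_share(X, patient)
after = between_patient_share(X_std, patient)
print(before, after)
```

              A share close to 1 corresponds to samples clustering by color in the PC1 vs. PC2 plot; after the per-patient standardization it should drop to roughly zero. Note that this correction needs several samples from a patient before it can be applied, which matters if the model is meant to be used on patients it has never seen.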



              At that point, you can just build your classification model and use feature selection or regularization as you normally would.



              In the case that you are classifying the patients (i.e. the patient has disease X or not), it's clear that the difference between patients is actually what you need to build this model. I doubt that there is a rule of thumb for how many features you should use depending on the number of groups or the number of samples within each group. You could try doing cross-validation with random sampling per patient.












              $endgroup$


























                  edited yesterday






























                  answered yesterday









                  fractalnature

































