Dimension reduction for data with categorical features [on hold]
























I am trying to reduce the dimensionality of a dataset. My data contains a large number of categorical features, which cause problems for the dimensionality reduction techniques I am using (such as computing the variance of each variable).

Do I need to convert every categorical variable to dummy variables before reducing the dimensions of the dataset, or is there another way?



put on hold as unclear what you're asking by Toros91, Sean Owen 3 hours ago


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.












































      data-cleaning categorical-data dimensionality-reduction














      edited 15 hours ago







      Puneet Shekhawat






















      asked yesterday









      Puneet Shekhawat
























          1 Answer






























          If you want to apply dimensionality reduction techniques that only operate on numeric features, then you will need to convert your categorical features to a numeric format.

          There are several ways of doing this, such as one-hot (dummy) encoding or ordinal encoding, but it might be worth your while to investigate target encoding (also called mean encoding).
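To make the target encoding suggestion concrete, here is a minimal sketch in plain Python. It replaces each category with the mean of the target observed for that category, so one categorical column becomes one numeric column instead of one dummy column per level. The function name `target_encode` and the toy data are made up for illustration, and the technique assumes a supervised setting where a target variable is available.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Per-category mean of the target.
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return [means[cat] for cat in categories], means


# Hypothetical toy data: one categorical feature and a binary target.
city = ["paris", "rome", "paris", "oslo", "rome", "paris"]
bought = [1, 0, 1, 0, 1, 0]

encoded, mapping = target_encode(city, bought)
print(mapping)
print(encoded)
```

In practice the means would normally be computed on the training split only, often with some smoothing, since encoding with the target of the same rows can leak information into the features.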













          • I know about dummy encoding. The problem is that the dataset has a huge number of features that are not int or float, so I was asking whether there is a way to convert all of the categorical data in one go, or whether I have to one-hot encode each feature one at a time. Alternatively, is there a way to preprocess the data in the format it is available in to reduce the dimension, so that there are fewer categorical features to worry about? (Pardon me if this is a stupid question, but I am new to machine learning and looking for an easy way around this; I couldn't find a solution using Google.)
            – Puneet Shekhawat
            15 hours ago












          • Ah, I misunderstood your question. It depends on the encoder / toolkit you use. For instance, category_encoders in Python lets you specify which columns to encode, and then encodes them all in one go.
            – bradS
            15 hours ago
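As a rough illustration of encoding every categorical column in one call, here is a sketch that uses pandas rather than category_encoders (a swapped-in toolkit, chosen only because `select_dtypes` and `get_dummies` are widely available). The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame: two categorical columns and one numeric column.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Select every non-numeric column automatically instead of listing them by hand.
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

# A single call one-hot encodes all selected columns;
# numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=cat_cols)

print(cat_cols)
print(encoded.shape)
```

Note that one-hot encoding adds one column per category level, so with many high-cardinality features the encoded frame can become much wider than the original before any dimensionality reduction is applied.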



































































          answered 21 hours ago









          bradS















