Dimension reduction for data with categorical features [on hold]
$begingroup$
I am trying to reduce the dimensionality of a dataset. It contains a large number of categorical features, which cause problems for the dimensionality reduction techniques I am using (for example, computing the variance of each variable).
Do I need to convert every categorical variable to dummy variables before reducing the dimensionality of the dataset, or is there another way?
data-cleaning categorical-data dimensionality-reduction
$endgroup$
put on hold as unclear what you're asking by Toros91, Sean Owen♦ 3 hours ago
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
asked yesterday by Puneet Shekhawat (new contributor), edited 15 hours ago
1 Answer
$begingroup$
If you're interested in applying dimensionality reduction techniques which only operate on numeric features, then you will need to convert your categorical features to a numeric format.
There are multiple ways of doing this (one-hot/dummy encoding and ordinal encoding are common ones), but it might be worth your while to investigate target encoding (also called mean encoding), which replaces each category with a single number instead of adding one column per level.
$endgroup$
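For readers who have not met target encoding before, here is a minimal sketch of the idea in plain Python. It is not taken from any library: the function name `target_encode` and the toy data are made up for illustration, and it assumes a numeric target is available. Each category is replaced by the mean of the target over the rows that have that category, so one categorical column stays one numeric column.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target value observed for it.

    Hypothetical helper for illustration only; returns the encoded column
    and the category -> mean mapping that produced it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return [means[cat] for cat in categories], means


# Toy data (assumed): one categorical feature and a binary target.
colors = ["red", "blue", "red", "green", "blue", "red"]
bought = [1, 0, 1, 0, 1, 0]

encoded, mapping = target_encode(colors, bought)
print(mapping)
print(encoded)
```

In practice the means should be computed on the training data only and then applied to validation/test data, otherwise the target leaks into the features. Library implementations usually also smooth the means for rare categories, which this sketch does not do.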
$begingroup$
I know about dummy encoding. It's just that the dataset has a huge number of features that are not int or float, so I was asking whether there is a way to convert all of the categorical data in one go, or whether I have to one-hot encode each feature one at a time. Alternatively, can I preprocess the data in its current format to reduce the dimensionality, so that there are fewer categorical features to worry about? (Pardon me if this is a stupid question, but I am new to machine learning and looking for an easy way around; I couldn't find a solution using Google.)
$endgroup$
– Puneet Shekhawat
15 hours ago
$begingroup$
Ah - I misunderstood your question. It depends on the encoder / toolkit you use. For instance, category_encoders in Python allows you to specify which columns to perform the encoding on, and then performs the encoding all in one go.
$endgroup$
– bradS
15 hours ago
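To make "all in one go" concrete, here is a small sketch in plain Python that one-hot encodes every non-numeric column in a single pass. It only illustrates the idea and is not how category_encoders is implemented; the function name `one_hot_encode`, the row-of-dicts layout, and the sample data are all assumptions made for the example.

```python
def one_hot_encode(rows, columns=None):
    """One-hot encode the given columns of a list of dict rows.

    Hypothetical helper for illustration. If `columns` is None, every
    column whose values are all strings is treated as categorical.
    """
    if columns is None:
        columns = [c for c in rows[0] if all(isinstance(r[c], str) for r in rows)]
    # Collect the sorted set of levels seen in each categorical column.
    levels = {c: sorted({r[c] for r in rows}) for c in columns}
    encoded = []
    for r in rows:
        # Numeric (non-encoded) columns are copied through unchanged.
        out = {k: v for k, v in r.items() if k not in levels}
        for c, values in levels.items():
            for v in values:
                out[f"{c}_{v}"] = 1 if r[c] == v else 0
        encoded.append(out)
    return encoded


# Toy data (assumed): one numeric and two categorical columns.
rows = [
    {"age": 31, "city": "Pune", "plan": "basic"},
    {"age": 45, "city": "Delhi", "plan": "pro"},
    {"age": 27, "city": "Pune", "plan": "pro"},
]

encoded = one_hot_encode(rows)
for row in encoded:
    print(row)
```

If the data is already in a pandas DataFrame, `pandas.get_dummies(df, columns=[...])` performs the same kind of multi-column encoding in one call. Note that one-hot encoding increases the number of columns, so it is a preparation step for dimensionality reduction, not a reduction in itself.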
Answer posted 21 hours ago by bradS