Feature selection vs Feature extraction. Which to use when?












Feature extraction and feature selection both reduce the dimensionality of the data, but feature extraction also makes the data more separable, if I am right.

Which technique would be preferred over the other, and when?

I was thinking that, since feature selection does not modify the original data and its properties, you would use feature selection when it is important that the features you train on remain unchanged. But I can't imagine why you would want something like this.
Tags: feature-selection, feature-extraction, dimensionality-reduction
asked Mar 13 '18 at 5:32 – Sid
4 Answers
Adding to the answer given by Toros:

These three are quite similar, with subtle differences (concise and easy to remember):

• Feature extraction and feature engineering: transformation of raw data into features suitable for modeling.
• Feature transformation: transformation of data to improve the accuracy of the algorithm.
• Feature selection: removing unnecessary features.

To add some examples of each:

Feature extraction and engineering (we can extract something from them):

• Text (n-grams, word2vec, tf-idf, etc.)
• Images (CNNs, texts, Q&A)
• Geospatial data (latitude, longitude, etc.)
• Date and time (day, month, week, year, ...)
• Time series, web data, etc.
• Dimensionality reduction techniques
• ... and many others

Feature transformation (transforming features so they make sense):

• Normalization and changing the distribution (scaling)
• Interactions
• Filling in missing values (median filling, etc.)
• ... and many others

Feature selection (building your model on the selected features):

• Statistical approaches
• Selection by modeling
• Grid search
• Cross-validation
• ... and many others

Hope this helps. Do look at the links shared by others; they are quite nice.

answered Mar 13 '18 at 10:00, edited Oct 25 '18 at 9:38 – Aditya
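As a rough illustration of how the extraction, transformation, and selection steps above can fit together, here is a minimal sketch using scikit-learn (the toy corpus, the labels, and `k=3` are made up for the example):

```python
# Toy pipeline: feature extraction -> transformation -> selection.
from sklearn.feature_extraction.text import TfidfVectorizer  # extraction: raw text -> tf-idf features
from sklearn.preprocessing import StandardScaler             # transformation: rescaling
from sklearn.feature_selection import SelectKBest, chi2      # selection: keep the most relevant columns

corpus = ["cheap pills now", "meeting at noon", "cheap meds now", "lunch at noon"]
y = [1, 0, 1, 0]  # made-up labels: 1 = spam, 0 = ham

# Feature extraction: turn the raw text into a numeric tf-idf matrix.
X = TfidfVectorizer().fit_transform(corpus)

# Feature transformation: rescale the columns (with_mean=False keeps the matrix sparse).
X_scaled = StandardScaler(with_mean=False).fit_transform(X)

# Feature selection: keep only the 3 columns most associated with the target.
X_sel = SelectKBest(chi2, k=3).fit_transform(X_scaled, y)

print(X.shape, X_sel.shape)  # selection shrinks the column count to 3
```

Note that selection runs last here, on features that were first extracted and transformed.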
















• What's the order in which these should be processed, in addition to data cleaning and data splitting? Which of the five is the first step?
  – technazi, Oct 20 '18 at 19:39










• Data splitting is done at the very end, once you have made sure the data is ready for modelling. And IMHO there is no fixed ordering for the things mentioned above, because they overlap quite often (feature extraction, feature engineering, feature transformation). But feature selection is surely done after splitting the data into train and validation, provided you are measuring your model's metric (or something equivalent) on a validation dataset; with cross-validation or the like, you can iteratively drop columns and see which ones are important.
  – Aditya, Oct 21 '18 at 2:00
I think they are two different things.

Let's start with feature selection:

This technique is used for selecting the features that explain most of the target variable (i.e., have a correlation with the target variable). This test is run just before the model is applied to the data.

To explain it better, let us go through an example: there are 10 features and 1 target variable; 9 features explain 90% of the target variable, while all 10 features together explain 91% of it. The extra variable is not making much of a difference, so you tend to remove it before modelling (it is subjective to the business as well). This can also be called predictor importance.

Now let's talk about feature extraction,

which is used in unsupervised learning: extraction of contours from images, extraction of bi-grams from a text, extraction of phonemes from recordings of spoken text. When you don't know anything about the data (no data dictionary, too many features, i.e. the data is not in an understandable format), you try applying this technique to get some features that explain most of the data. Feature extraction involves a transformation of the features, which often is not reversible because some information is lost in the process of dimensionality reduction.

You can apply feature extraction on the given data to extract features, and then apply feature selection with respect to the target variable to select the subset that can help in making a good model with good results.

You can go through these Link-1, Link-2 for a better understanding.

We can implement them in R, Python, or SPSS.

Let me know if you need any more clarification.
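A hypothetical sketch of this kind of correlation-based selection (the feature names and data are made up; pandas and NumPy are assumed):

```python
# Rank features by absolute correlation with the target, then drop the weakest.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "noise"])

# The target depends on f1 and f2 only; "noise" contributes nothing.
target = 2 * df["f1"] - df["f2"] + rng.normal(scale=0.1, size=200)

corr = df.corrwith(target).abs().sort_values(ascending=False)
selected = corr.index[:-1].tolist()  # drop the least correlated feature

print(corr)
print("selected:", selected)
```

With data like this, the uninformative "noise" column ends up last in the ranking and is the one removed before modelling.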






The two are very different: feature selection indeed reduces dimensions, but feature extraction adds dimensions which are computed from other features.

For panel or time series data, one usually has a datetime variable, and one does not want to train the dependent variable on the date itself, as those exact dates do not occur in the future. So you should eliminate the datetime: feature elimination.

On the other hand, whether a day is a weekday or a weekend day may be very relevant, so we need to compute the weekday status from the datetime: feature extraction.
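A small sketch of this datetime example with pandas (the column names and dates are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2018-03-10", "2018-03-12", "2018-03-17"]),
    "sales": [120, 80, 140],
})

# Feature extraction: derive a weekend indicator from the datetime
# (dayofweek is 0 for Monday through 6 for Sunday).
df["is_weekend"] = df["date"].dt.dayofweek >= 5

# Feature elimination: drop the raw datetime before training,
# since those exact dates never occur in the future.
df = df.drop(columns=["date"])

print(df)
```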






As Aditya said, there are three feature-related terms that are sometimes confused with each other. I will try to give a summary explanation of each of them:

• Feature extraction: generation of features from data that are in a format that is difficult to analyse directly or not directly comparable (e.g. images, time series, etc.). In the example of a time series, some simple features could be: length of the series, period, mean value, standard deviation, etc.

• Feature transformation: transformation of existing features in order to create new ones based on the old ones. A very popular technique for dimensionality reduction is Principal Component Analysis (PCA), which uses an orthogonal transformation to produce a set of linearly uncorrelated variables from the initial set of variables.

• Feature selection: selection of the features with the highest "importance" or influence on the target variable, from a set of existing features. This can be done with various techniques: e.g. linear regression, decision trees, or calculation of "importance" weights (e.g. Fisher score, ReliefF).


If the only thing you want to achieve is dimensionality reduction in an existing dataset, you can use either feature transformation or feature selection methods. But if you need a physical interpretation of the features you identify as "important", or you are trying to limit the amount of data that needs to be collected for your analysis (feature transformation requires all of the initial features), then only feature selection will work.
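This transformation-vs-selection contrast can be sketched with scikit-learn on toy data (the dataset, `n_components`, and `k` are made up for the example):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # target driven by the first two columns

# Feature transformation: PCA mixes all 5 columns into 2 new, uncorrelated ones,
# so it still needs every original feature to be collected.
X_pca = PCA(n_components=2).fit_transform(X)

# Feature selection: keep 2 of the original columns, chosen by an F-test
# against the target; the surviving features keep their physical meaning.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

print(X_pca.shape, X_sel.shape)  # same reduced shape, obtained very differently
```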



You can find more details on feature selection and dimensionality reduction in the following links:

• A summary of Dimension Reduction methods
• Classification and Feature Selection: A Review
• Relevant question and answers in Stack Overflow







              share|improve this answer











              $endgroup$













                Your Answer





                StackExchange.ifUsing("editor", function () {
                return StackExchange.using("mathjaxEditing", function () {
                StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
                StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
                });
                });
                }, "mathjax-editing");

                StackExchange.ready(function() {
                var channelOptions = {
                tags: "".split(" "),
                id: "557"
                };
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function() {
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled) {
                StackExchange.using("snippets", function() {
                createEditor();
                });
                }
                else {
                createEditor();
                }
                });

                function createEditor() {
                StackExchange.prepareEditor({
                heartbeatType: 'answer',
                autoActivateHeartbeat: false,
                convertImagesToLinks: false,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: null,
                bindNavPrevention: true,
                postfix: "",
                imageUploader: {
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                },
                onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                });


                }
                });














                draft saved

                draft discarded


















                StackExchange.ready(
                function () {
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f29006%2ffeature-selection-vs-feature-extraction-which-to-use-when%23new-answer', 'question_page');
                }
                );

                Post as a guest















                Required, but never shown

























                4 Answers
                4






                active

                oldest

                votes








                4 Answers
                4






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                12












                $begingroup$

                Adding to The answer given by Toros,



                These(see below bullets) three are quite similar but with a subtle differences-:(concise and easy to remember)




                • feature extraction and feature engineering: transformation of raw data into features suitable for modeling;


                • feature transformation: transformation of data to improve the accuracy of the algorithm;


                • feature selection: removing unnecessary features.



                Just to add an Example of the same,




                Feature Extraction and Engineering(we can extract something from them)





                • Texts(ngrams, word2vec, tf-idf etc)

                • Images(CNN'S, texts, q&a)

                • Geospatial data(lat, long etc)

                • Date and time(day,month,week,year..)

                • Time series, web, etc...

                • Dimensional Reduction Techniques..

                • .....(And Many Others)



                Feature transformations(transforming them to make sense)





                • Normalization and changing distribution(Scaling)

                • Interactions

                • Filling in the missing values(median filling etc)

                • .....(And Many Others)



                Feature selection(building your model on these selected features)





                • Statistical approaches

                • Selection by modeling

                • Grid search

                • Cross Validation

                • .....(And Many Others)


                Hope this helps...



                Do look at the links shared by others.
                They are Quite Nice...






                share|improve this answer











                $endgroup$













                • $begingroup$
                  nice way of answering +1 for that.
                  $endgroup$
                  – Toros91
                  Mar 14 '18 at 1:48










                • $begingroup$
                  Kudos to this community.. Learning a lot from it..
                  $endgroup$
                  – Aditya
                  Mar 14 '18 at 2:11






                • 1




                  $begingroup$
                  True that man, I've been a member since October, 2017. I've learned a lot of things. Hope it be the same for you as well. I've been reading your answers, they are good .BTW sorry for the thing which you had gone through on SO. I couldn't see the whole thing but as Neil Slater said good that you kept your cool all the way till the end. Keep it up! We still have a long way to go. :)
                  $endgroup$
                  – Toros91
                  Mar 14 '18 at 2:19










                • $begingroup$
                  What's the order in which these should be processed? In addition to data cleaning and data splitting. Which out of the 5 is the first step?
                  $endgroup$
                  – technazi
                  Oct 20 '18 at 19:39










                • $begingroup$
                  Data splitting is done at the very end when you make sure that the data is ready to be sent for Modelling...And imho there's no such ordering for the above mentioned things because they overlap quite a few times(feature extraction, feature engineering, Feature transformation.) but Feature Selection is surely done after splitting the data into train as validation provided that you are using your models metric or something equivalent on a validation dataset (to measure it's performance)for Cross Validation or something equivalent,You can iteratively start dropping columns and see imp colsorimp
                  $endgroup$
                  – Aditya
                  Oct 21 '18 at 2:00


















                12












                $begingroup$

                Adding to The answer given by Toros,



                These(see below bullets) three are quite similar but with a subtle differences-:(concise and easy to remember)




                • feature extraction and feature engineering: transformation of raw data into features suitable for modeling;


                • feature transformation: transformation of data to improve the accuracy of the algorithm;


                • feature selection: removing unnecessary features.



                Just to add an Example of the same,




                Feature Extraction and Engineering(we can extract something from them)





                • Texts(ngrams, word2vec, tf-idf etc)

                • Images(CNN'S, texts, q&a)

                • Geospatial data(lat, long etc)

                • Date and time(day,month,week,year..)

                • Time series, web, etc...

                • Dimensional Reduction Techniques..

                • .....(And Many Others)



                Feature transformations(transforming them to make sense)





                • Normalization and changing distribution(Scaling)

                • Interactions

                • Filling in the missing values(median filling etc)

                • .....(And Many Others)



                Feature selection(building your model on these selected features)





                • Statistical approaches

                • Selection by modeling

                • Grid search

                • Cross Validation

                • .....(And Many Others)


                Hope this helps...



                Do look at the links shared by others.
                They are Quite Nice...






                share|improve this answer











                $endgroup$













                • $begingroup$
                  nice way of answering +1 for that.
                  $endgroup$
                  – Toros91
                  Mar 14 '18 at 1:48










                • $begingroup$
                  Kudos to this community.. Learning a lot from it..
                  $endgroup$
                  – Aditya
                  Mar 14 '18 at 2:11






                • 1




                  $begingroup$
                  True that man, I've been a member since October, 2017. I've learned a lot of things. Hope it be the same for you as well. I've been reading your answers, they are good .BTW sorry for the thing which you had gone through on SO. I couldn't see the whole thing but as Neil Slater said good that you kept your cool all the way till the end. Keep it up! We still have a long way to go. :)
                  $endgroup$
                  – Toros91
                  Mar 14 '18 at 2:19










                • $begingroup$
                  What's the order in which these should be processed? In addition to data cleaning and data splitting. Which out of the 5 is the first step?
                  $endgroup$
                  – technazi
                  Oct 20 '18 at 19:39










                • $begingroup$
                  Data splitting is done at the very end when you make sure that the data is ready to be sent for Modelling...And imho there's no such ordering for the above mentioned things because they overlap quite a few times(feature extraction, feature engineering, Feature transformation.) but Feature Selection is surely done after splitting the data into train as validation provided that you are using your models metric or something equivalent on a validation dataset (to measure it's performance)for Cross Validation or something equivalent,You can iteratively start dropping columns and see imp colsorimp
                  $endgroup$
                  – Aditya
                  Oct 21 '18 at 2:00
















                12












                12








                12





                $begingroup$

                Adding to The answer given by Toros,



                These(see below bullets) three are quite similar but with a subtle differences-:(concise and easy to remember)




                • feature extraction and feature engineering: transformation of raw data into features suitable for modeling;


                • feature transformation: transformation of data to improve the accuracy of the algorithm;


                • feature selection: removing unnecessary features.



                Just to add an Example of the same,




                Feature Extraction and Engineering(we can extract something from them)





                • Texts(ngrams, word2vec, tf-idf etc)

                • Images(CNN'S, texts, q&a)

                • Geospatial data(lat, long etc)

                • Date and time(day,month,week,year..)

                • Time series, web, etc...

                • Dimensional Reduction Techniques..

                • .....(And Many Others)



                Feature transformations(transforming them to make sense)





                • Normalization and changing distribution(Scaling)

                • Interactions

                • Filling in the missing values(median filling etc)

                • .....(And Many Others)



                Feature selection(building your model on these selected features)





                • Statistical approaches

                • Selection by modeling

                • Grid search

                • Cross Validation

                • .....(And Many Others)


                Hope this helps...



                Do look at the links shared by others.
                They are Quite Nice...






                share|improve this answer











                $endgroup$



                Adding to The answer given by Toros,



                These(see below bullets) three are quite similar but with a subtle differences-:(concise and easy to remember)




                • feature extraction and feature engineering: transformation of raw data into features suitable for modeling;


                • feature transformation: transformation of data to improve the accuracy of the algorithm;


                • feature selection: removing unnecessary features.



                Just to add an Example of the same,




                Feature Extraction and Engineering(we can extract something from them)





                • Texts(ngrams, word2vec, tf-idf etc)

                • Images(CNN'S, texts, q&a)

                • Geospatial data(lat, long etc)

                • Date and time(day,month,week,year..)

                • Time series, web, etc...

                • Dimensional Reduction Techniques..

                • .....(And Many Others)



                Feature transformations(transforming them to make sense)





                • Normalization and changing distribution(Scaling)

                • Interactions

                • Filling in the missing values(median filling etc)

                • .....(And Many Others)



                Feature selection(building your model on these selected features)





                • Statistical approaches

                • Selection by modeling

                • Grid search

                • Cross Validation

                • .....(And Many Others)


                Hope this helps...



                Do look at the links shared by others.
                They are Quite Nice...







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Oct 25 '18 at 9:38

























                answered Mar 13 '18 at 10:00









                AdityaAditya

                1,4101525




                1,4101525












                • $begingroup$
                  nice way of answering +1 for that.
                  $endgroup$
                  – Toros91
                  Mar 14 '18 at 1:48










                • $begingroup$
                  Kudos to this community.. Learning a lot from it..
                  $endgroup$
                  – Aditya
                  Mar 14 '18 at 2:11






                • 1




                  $begingroup$
                  True that man, I've been a member since October, 2017. I've learned a lot of things. Hope it be the same for you as well. I've been reading your answers, they are good .BTW sorry for the thing which you had gone through on SO. I couldn't see the whole thing but as Neil Slater said good that you kept your cool all the way till the end. Keep it up! We still have a long way to go. :)
                  $endgroup$
                  – Toros91
                  Mar 14 '18 at 2:19










                • $begingroup$
                  What's the order in which these should be processed? In addition to data cleaning and data splitting. Which out of the 5 is the first step?
                  $endgroup$
                  – technazi
                  Oct 20 '18 at 19:39










                • $begingroup$
                  Data splitting is done at the very end when you make sure that the data is ready to be sent for Modelling...And imho there's no such ordering for the above mentioned things because they overlap quite a few times(feature extraction, feature engineering, Feature transformation.) but Feature Selection is surely done after splitting the data into train as validation provided that you are using your models metric or something equivalent on a validation dataset (to measure it's performance)for Cross Validation or something equivalent,You can iteratively start dropping columns and see imp colsorimp
                  $endgroup$
                  – Aditya
                  Oct 21 '18 at 2:00




















                • $begingroup$
                  nice way of answering +1 for that.
                  $endgroup$
                  – Toros91
                  Mar 14 '18 at 1:48










                • $begingroup$
                  Kudos to this community.. Learning a lot from it..
                  $endgroup$
                  – Aditya
                  Mar 14 '18 at 2:11






                • 1




                  $begingroup$
                  True that man, I've been a member since October, 2017. I've learned a lot of things. Hope it be the same for you as well. I've been reading your answers, they are good .BTW sorry for the thing which you had gone through on SO. I couldn't see the whole thing but as Neil Slater said good that you kept your cool all the way till the end. Keep it up! We still have a long way to go. :)
                  $endgroup$
                  – Toros91
                  Mar 14 '18 at 2:19










                • $begingroup$
                  What's the order in which these should be processed? In addition to data cleaning and data splitting. Which out of the 5 is the first step?
                  $endgroup$
                  – technazi
                  Oct 20 '18 at 19:39










                • $begingroup$
                  Data splitting is done at the very end when you make sure that the data is ready to be sent for Modelling...And imho there's no such ordering for the above mentioned things because they overlap quite a few times(feature extraction, feature engineering, Feature transformation.) but Feature Selection is surely done after splitting the data into train as validation provided that you are using your models metric or something equivalent on a validation dataset (to measure it's performance)for Cross Validation or something equivalent,You can iteratively start dropping columns and see imp colsorimp
                  $endgroup$
                  – Aditya
                  Oct 21 '18 at 2:00


















                $begingroup$
                nice way of answering +1 for that.
                $endgroup$
                – Toros91
                Mar 14 '18 at 1:48




                $begingroup$
                nice way of answering +1 for that.
                $endgroup$
                – Toros91
                Mar 14 '18 at 1:48












                $begingroup$
                Kudos to this community.. Learning a lot from it..
                $endgroup$
                – Aditya
                Mar 14 '18 at 2:11




                $begingroup$
                Kudos to this community.. Learning a lot from it..
                $endgroup$
                – Aditya
                Mar 14 '18 at 2:11




                1




                1




                $begingroup$
                True that man, I've been a member since October, 2017. I've learned a lot of things. Hope it be the same for you as well. I've been reading your answers, they are good .BTW sorry for the thing which you had gone through on SO. I couldn't see the whole thing but as Neil Slater said good that you kept your cool all the way till the end. Keep it up! We still have a long way to go. :)
                $endgroup$
                – Toros91
                Mar 14 '18 at 2:19




                $begingroup$
                What's the order in which these should be processed? In addition to data cleaning and data splitting. Which out of the 5 is the first step?
                $endgroup$
                – technazi
                Oct 20 '18 at 19:39




                3












                $begingroup$

                I think they are two different things.

                Let's start with feature selection:

                This technique is used for selecting the features that explain most of the target variable (i.e. have a correlation with the target variable). This test is run just before the model is applied to the data.

                To explain it better, let us go through an example: there are 10 features and 1 target variable; 9 features explain 90% of the target variable, and all 10 features together explain 91%. So the one extra variable is not making much of a difference, and you tend to remove it before modelling (it is subjective to the business as well). This can also be called predictor importance.

                Now let's talk about feature extraction, which is used in unsupervised learning: extraction of contours from images, extraction of bi-grams from a text, extraction of phonemes from recordings of spoken text. When you don't know anything about the data (no data dictionary, or too many features, so the data is not in an understandable format), you try applying this technique to get some features that explain most of the data. Feature extraction involves a transformation of the features, which is often not reversible because some information is lost in the process of dimensionality reduction.

                You can apply feature extraction on the given data to extract features, and then apply feature selection with respect to the target variable to select a subset that can help in making a good model with good results.

                You can go through these Link-1, Link-2 for better understanding.

                We can implement them in R, Python, or SPSS.

                Let me know if you need any more clarification.






                $endgroup$


















                    answered Mar 13 '18 at 6:15, edited Mar 13 '18 at 6:45
                    – Toros91
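A hedged illustration of the predictor-importance idea from this answer, in Python (the answer notes R and SPSS work too). The synthetic dataset and the random-forest ranking are my choices for the sketch, not the answer's prescribed method: rank the 10 features by importance and drop the weakest one before modelling.

```python
# Sketch: rank predictors by importance and drop the least useful one.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 10 features, 1 target, only 3 informative features.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# A tree ensemble gives one common notion of "predictor importance".
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)   # weakest feature first

# Drop the single weakest predictor, keeping the other 9 unchanged.
X_reduced = np.delete(X, ranking[0], axis=1)
print(X_reduced.shape)      # (200, 9)
```

Whether the dropped feature is truly dispensable is still a business call, as the answer says; importance scores only suggest candidates for removal.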























                        3












                        $begingroup$

                        The two are very different: feature selection indeed reduces dimensions, but feature extraction adds dimensions that are computed from other features.

                        For panel or time-series data, one usually has a datetime variable, and one does not want to train the model on the dates themselves, as those exact dates do not occur in the future. So you should eliminate the datetime column: feature elimination.

                        On the other hand, whether a day is a weekday or a weekend day may be very relevant, so we need to compute the weekday status from the datetime: feature extraction.






                        $endgroup$


















                            answered Mar 13 '18 at 14:41
                            – vinnief
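A minimal pandas sketch of this answer's datetime example (the column names and sample dates are made up for illustration): extract a weekend flag from the timestamp, then eliminate the raw datetime before modelling.

```python
# Sketch: feature extraction (weekend flag) + feature elimination (datetime).
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-03-10", "2018-03-12", "2018-03-17"]),
    "sales": [120, 80, 150],
})

# Feature extraction: compute weekday status from the datetime
# (dayofweek: Monday=0 ... Sunday=6, so >= 5 means weekend).
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5

# Feature elimination: drop the raw datetime before training.
df = df.drop(columns=["timestamp"])

print(df)   # sales and is_weekend columns only
```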























                                3












                                $begingroup$

                                As Aditya said, there are three feature-related terms that are sometimes confused with each other. I will try to give a summary explanation of each:


                                • Feature extraction: generation of features from data that is in a format that is difficult to analyse directly or not directly comparable (e.g. images, time series). For a time series, some simple features could be, for example: length of the series, period, mean value, standard deviation, etc.

                                • Feature transformation: transformation of existing features in order to create new ones based on the old ones. A very popular technique for dimensionality reduction is Principal Component Analysis (PCA), which uses an orthogonal transformation to produce a set of linearly uncorrelated variables from the initial set of variables.

                                • Feature selection: selection of the features with the highest "importance"/influence on the target variable, from a set of existing features. This can be done with various techniques: e.g. linear regression, decision trees, or calculation of "importance" weights (e.g. Fisher score, ReliefF).

                                If the only thing you want to achieve is dimensionality reduction in an existing dataset, you can use either feature transformation or feature selection methods. But if you need the physical interpretation of the features you identify as "important", or you are trying to limit the amount of data that needs to be collected for your analysis (feature transformation requires the full initial set of features), then only feature selection will work.

                                You can find more details on feature selection and dimensionality reduction in the following links:


                                • A summary of Dimension Reduction methods

                                • Classification and Feature Selection: A Review

                                • Relevant question and answers in Stack Overflow







                                $endgroup$


















                                    answered Mar 13 '18 at 14:05, edited Mar 13 '18 at 14:41 by Aditya
                                    – missrg
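A short illustration of the feature-transformation bullet above, assuming scikit-learn's PCA and the Iris dataset (both my choices for the sketch): the four correlated measurements are transformed into two linearly uncorrelated components.

```python
# Sketch: feature transformation via PCA for dimensionality reduction.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data            # 150 samples, 4 original features

# Orthogonal transformation to 2 uncorrelated components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (150, 2)
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept
```

Note that the new components are linear combinations of all four inputs, which is why, as the answer says, transformation still requires collecting every original feature, and the components lose the physical interpretation of the originals.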





























