What is dimensionality reduction? What is the difference between feature selection and extraction?












56












$begingroup$


From Wikipedia:




dimensionality reduction or dimension reduction is the process of
reducing the number of random variables under consideration, and
can be divided into feature selection and feature extraction.




What is the difference between feature selection and feature extraction?



What is an example of dimensionality reduction in a Natural Language Processing task?





















$endgroup$

















      feature-selection feature-extraction dimensionality-reduction














      edited Nov 8 '15 at 15:46









      DaL











      asked May 18 '14 at 6:26









      alvas























          11 Answers


















          50












          $begingroup$

          Simply put:




          • feature selection: you select a subset of the original feature set; while

          • feature extraction: you build a new set of features from the original feature set.


          Examples of feature extraction: extracting contours from images, digrams (bigrams) from a text, or phonemes from a recording of spoken text.



          Feature extraction involves a transformation of the features, which is often not reversible because some information is lost in the process of dimensionality reduction.
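As a rough illustration of the two bullets above, here is a minimal Python sketch on a made-up table. The column names, the values, and the helper functions `select_features` and `extract_bmi` are hypothetical and only serve the example.

```python
# Toy data set: each row is a sample, each column an original feature.
# Column names and values are made up for illustration.
feature_names = ["height_cm", "weight_kg", "age_years"]
rows = [
    [170.0, 65.0, 30.0],
    [160.0, 72.0, 45.0],
    [180.0, 80.0, 25.0],
]


def select_features(rows, names, keep):
    """Feature selection: keep a subset of the ORIGINAL columns, unchanged."""
    idx = [names.index(name) for name in keep]
    return [[row[i] for i in idx] for row in rows]


def extract_bmi(rows, names):
    """Feature extraction: build a NEW feature (body-mass index) from two originals."""
    h = names.index("height_cm")
    w = names.index("weight_kg")
    return [[row[w] / (row[h] / 100.0) ** 2] for row in rows]


selected = select_features(rows, feature_names, ["height_cm", "age_years"])
extracted = extract_bmi(rows, feature_names)

print(selected)   # two of the three original columns
print(extracted)  # one new column that did not exist before
```

Both results have fewer columns than the input. The selected columns keep their original meaning, while height and weight cannot be recovered from the BMI column alone, which is the loss of information mentioned above.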















          $endgroup$









          • 2




            $begingroup$
            Both of these fall into the category of feature engineering, as they involve manually creating or selecting features. Dimensionality reduction typically involves a change of basis or some other mathematical re-representation of the data.
            $endgroup$
            – ragingSloth
            Jun 16 '14 at 21:05






          • 1




            $begingroup$
            @ragingSloth, I think the first one is definitely feature selection, not feature engineering, while the image and text processing examples do indeed seem to be feature engineering.
            $endgroup$
            – Alexey Grigorev
            Oct 2 '15 at 7:26










          • $begingroup$
            As I understand it, for some feature extraction methods you can still reconstruct the original dimensions approximately. With feature selection there is no reconstruction, because the dimensions judged useless have been removed.
            $endgroup$
            – Babak
            Jan 15 at 13:03



















          16












          $begingroup$

          Dimensionality reduction typically means choosing a basis or mathematical representation in which you can describe most, but not all, of the variance in your data, thereby retaining the relevant information while reducing the amount of information needed to represent it. There are a variety of techniques for doing this, including but not limited to PCA, ICA, and matrix factorization. These take existing data and reduce it to a small number of components, so that you can represent most of the information in your dataset with fewer features. Note that unsupervised methods such as PCA keep the directions of highest variance, which are often, but not always, the most discriminative ones.
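To make "describe most but not all of the variance" concrete, here is a small sketch of PCA on two strongly correlated columns, reduced to one. It is not from the answer: the data values are invented, and the closed-form 2x2 eigen-decomposition is used only to keep the example dependency-free. In practice you would normally reach for a library routine instead.

```python
import math

# Toy 2-D data (made-up values) whose two columns are strongly correlated.
data = [(2.0, 1.9), (4.0, 4.2), (6.0, 5.8), (8.0, 8.1), (10.0, 10.0)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
centered = [(x - mean_x, y - mean_y) for x, y in data]

# Sample covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix in closed form.
half_trace = (sxx + syy) / 2.0
radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
lam1, lam2 = half_trace + radius, half_trace - radius

# Unit eigenvector for the largest eigenvalue (the first principal component).
# (sxy, lam1 - sxx) solves (S - lam1 * I) v = 0 whenever sxy != 0.
vx, vy = sxy, lam1 - sxx
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Reduce 2 dimensions to 1 by projecting onto that component.
projected = [x * vx + y * vy for x, y in centered]

# Fraction of the total variance kept by the single new feature.
explained = lam1 / (lam1 + lam2)
print(f"explained variance ratio: {explained:.4f}")
```

The single projected column keeps nearly all of the variance here because the two original columns carry almost the same information; with less correlated columns the ratio would be lower.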



          Feature selection is choosing the features that are highly discriminative, often by hand. This has a lot more to do with feature engineering than with analysis, and requires significantly more work on the part of the data scientist. It requires an understanding of which aspects of your dataset are important for the predictions you are making, and which are not. Feature extraction usually involves generating new features that are composites of existing features. Both of these techniques fall into the category of feature engineering. Generally, feature engineering is important if you want to obtain the best results, as it makes explicit information that is only implicit in your raw dataset and increases your signal-to-noise ratio.

















          $endgroup$









          • 2




            $begingroup$
            I mostly agree, with one clarification: feature selection need not be done by hand; it can be automatic. See for instance the Lasso method (en.wikipedia.org/wiki/Least_squares#Lasso_method).
            $endgroup$
            – jrouquie
            Sep 29 '14 at 9:00










          • $begingroup$
            I agree with your Dimensionality Reduction clause but differ a bit on Feature Engineering usage - which from what I've seen is only Feature Extraction: Feature Selection is considered separately. It's just a difference in terminology.
            $endgroup$
            – javadba
            Dec 3 '17 at 23:30



















          7












          $begingroup$

          As in @damienfrancois's answer, feature selection is about selecting a subset of features. In NLP that would mean selecting a set of specific words (typically, each word is a feature whose value is the frequency of the word or some other weight based on TF-IDF or similar).
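A minimal sketch of that kind of word selection, assuming a bag-of-words representation and a deliberately simple rule (keep words that occur in at least two documents). The documents and the threshold are invented for the example; real pipelines usually use a score such as TF-IDF or chi-squared instead.

```python
from collections import Counter

# Made-up corpus for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

tokenized = [doc.split() for doc in docs]
vocabulary = sorted({tok for doc in tokenized for tok in doc})


def bag_of_words(tokens, vocab):
    """One count per vocabulary word: each word is one feature."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


# Document frequency: in how many documents does each word occur?
doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

# Feature selection: keep only words seen in at least 2 documents (assumed rule).
selected_vocab = [w for w in vocabulary if doc_freq[w] >= 2]

full = [bag_of_words(doc, vocabulary) for doc in tokenized]
reduced = [bag_of_words(doc, selected_vocab) for doc in tokenized]

print(len(vocabulary), "->", len(selected_vocab), "features")
print(selected_vocab)
```

Every remaining column is still the count of one specific word, so the reduced features stay directly interpretable.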



          Dimensionality reduction is the introduction of a new feature space in which the original features are represented. The new space is of lower dimension than the original space. In the case of text, an example would be the hashing trick, where a piece of text is reduced to a fixed-size vector by hashing its tokens into a limited number of positions (say, addressed by 16 or 32 bits). The remarkable thing is that the geometry of the space is approximately preserved (given enough bits), so relative distances between documents stay roughly the same as in the original space, and you can deploy standard machine learning techniques without having to deal with the unbounded (and huge) number of dimensions found in text.
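Here is a rough sketch of the hashing trick in plain Python. The bucket count, the choice of MD5 as the hash, and the function names are my own assumptions for the example; a stable hash is used because Python's built-in `hash` for strings changes between runs.

```python
import hashlib

N_BUCKETS = 16  # assumed size of the reduced feature space


def bucket(token):
    """Map a token to a fixed position using a stable hash."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % N_BUCKETS


def hash_vector(text):
    """Turn text of any vocabulary size into a vector of N_BUCKETS counts."""
    vec = [0] * N_BUCKETS
    for token in text.lower().split():
        vec[bucket(token)] += 1
    return vec


a = hash_vector("the cat sat on the mat")
b = hash_vector("a completely different and much longer sentence about dogs")

# Both vectors have the same length, whatever words the texts contain.
print(len(a), len(b))
```

No vocabulary has to be stored, and unseen words need no special handling. The price is that different words can collide in the same bucket, which is why the geometry is only approximately preserved and improves with more buckets.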

















          $endgroup$





















            5












            $begingroup$

            Feature selection is about choosing some of the features based on a statistical score, whereas feature extraction uses techniques to derive second-layer information from the data, e.g. the interesting frequencies of a signal obtained with a Fourier transform.
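A small sketch of the Fourier example, under assumptions of my own: a synthetic signal made of two sine waves (5 Hz and 12 Hz), a sampling rate of 64 samples per second, and a naive discrete Fourier transform written out by hand so the example needs no libraries.

```python
import cmath
import math

SAMPLE_RATE = 64  # samples per second (assumed)
N = 64            # one second of signal

# Synthetic signal: a strong 5 Hz component plus a weaker 12 Hz component.
signal = [
    math.sin(2 * math.pi * 5 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 12 * t / SAMPLE_RATE)
    for t in range(N)
]


def dft_magnitudes(x):
    """Magnitude of each frequency bin below the Nyquist frequency (naive O(n^2) DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


mags = dft_magnitudes(signal)

# Feature extraction: 64 raw samples are summarised by the 2 dominant frequencies.
top_bins = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
features = sorted(k * SAMPLE_RATE / N for k in top_bins)
print(features)
```

The two extracted numbers are not a subset of the 64 original samples; they are new features computed from all of them.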



            Dimensionality reduction is all about transforming data into a low-dimensional space in which the data preserves its Euclidean structure but does not suffer from the curse of dimensionality.
            For instance, assume you extract some word features $[x_1,...,x_n]$ from a data set, so that each document can be modeled as a point in n-dimensional space, and n is too large. In this case many algorithms work poorly because distances are distorted in high-dimensional space. You then need to reduce the dimensionality, either by selecting the most informative features or by transforming them into a low-dimensional manifold using dimensionality reduction methods, e.g. PCA, LLE, etc.
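The distance distortion can be seen in a quick experiment. This is only a sketch under assumptions of mine: points drawn uniformly at random from the unit hypercube, a fixed random seed, and "relative contrast" (how much farther the farthest point is than the nearest one) as the measure.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


low = relative_contrast(2)
high = relative_contrast(1000)
print(f"contrast in    2 dimensions: {low:.2f}")
print(f"contrast in 1000 dimensions: {high:.2f}")
```

In two dimensions the nearest neighbour is far closer than the farthest point. In a thousand dimensions all points sit at nearly the same distance, so "nearest" carries little meaning, which is what hurts distance-based algorithms.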















            $endgroup$













            • $begingroup$
              Out of the answers available this one best matches what I've seen in several Data Science and ML Platform teams
              $endgroup$
              – javadba
              Dec 3 '17 at 23:28



















            3












            $begingroup$

            To complete Damien's answer, an example of dimensionality reduction in NLP is a topic model, where you represent the document by a vector indicating the weights of its constituent topics.















            $endgroup$





















              2












              $begingroup$

              For a proper review and definition, you may take a look at Dimension Reduction vs. Variable Selection. Also, in the book Feature Extraction: Foundations and Applications,
              feature extraction is decomposed into two steps: feature construction and feature selection.















              $endgroup$





















                2












                $begingroup$

                A1. What is dimensionality reduction:
                If you think of data in a matrix, where rows are instances and columns are attributes (or features), then dimensionality reduction is mapping this data matrix to a new matrix with fewer columns. For visualization, if you think of each matrix column (attribute) as a dimension in feature space, then dimensionality reduction is the projection of instances from the higher-dimensional space (more columns) to a lower-dimensional subspace (fewer columns).
                (Figure: dimensionality reduction is subspace projection)
                The typical objectives of this transformation are (1) preserving the information in the data matrix while reducing computational complexity, and (2) improving the separability of the different classes in the data.
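The "fewer columns" view can be written as a matrix product: an n x d data matrix times a d x k projection matrix gives an n x k matrix. The sketch below uses made-up numbers and two hypothetical projection matrices, one that is axis-aligned and one that mixes all the original columns.

```python
def project(X, W):
    """Multiply an n x d data matrix X by a d x k matrix W, giving n x k."""
    d, k = len(W), len(W[0])
    return [[sum(row[j] * W[j][c] for j in range(d)) for c in range(k)] for row in X]


# Two instances with four attributes each (made-up values).
X = [
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
]

# Axis-aligned projection: simply keeps original columns 0 and 2.
W_select = [
    [1, 0],
    [0, 0],
    [0, 1],
    [0, 0],
]

# Non-axis-aligned projection: each new column mixes all four originals.
# The two columns are orthonormal, as the components of a PCA would be.
W_mix = [
    [0.5, 0.5],
    [0.5, -0.5],
    [0.5, 0.5],
    [0.5, -0.5],
]

print(project(X, W_select))
print(project(X, W_mix))
```

Seen this way, keeping a subset of the original columns is the special case where the projection matrix only contains zeros and ones, while a general projection produces columns that did not exist in the original data.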



                A2. Dimensionality reduction as feature selection or feature extraction:
                I'll use the ubiquitous Iris dataset, which is arguably the 'hello world' of data science. Briefly, the Iris dataset has 3 classes and 4 attributes (columns). I'll illustrate feature selection and extraction for the task of reducing Iris dataset dimensionality from 4 to 2.



                I plot the pairwise relationships between the attributes of this dataset using a Python library called seaborn. Assuming seaborn's bundled copy of the dataset, the code is: import seaborn as sns; iris = sns.load_dataset("iris"); sns.pairplot(iris, hue="species", markers=["o", "s", "D"]) The figure I get is
                (Figure: Iris pair-plot)
                I can select the pair of attributes (2 dimensions) that gives me the greatest separation between the 3 classes (species) in the Iris dataset. This would be a case of feature selection.



                Next up is feature extraction. Here I project the 4-dimensional feature space of Iris onto a new 2-dimensional subspace that is not axis-aligned with the original space. These are new attributes, typically based on the distribution of the data in the original high-dimensional space.
                The most popular method is Principal Component Analysis, which computes the eigenvectors of the data's covariance matrix in the original space and projects onto the leading ones.
                (Figure: PCA using SVD)
                Obviously, we are not restricted to a linear, global projection onto a subspace based on eigenvectors. We can use non-linear projection methods as well.
                Here is an example of non-linear PCA using neural networks:
                (Figure: non-linear PCA using NN)
                The attributes (dimensions) in the last example are extracted from the original 4 attributes using neural networks. You can experiment with various flavors of PCA on the Iris dataset yourself using this pca methods code.



                Summary:
                While feature extraction methods may appear to be superior in performance to feature selection, the choice depends on the application. The attributes produced by feature extraction typically lose their physical interpretation, which may or may not be an issue for the task at hand. For example, if you are designing a very expensive data collection task with costly sensors and need to economize on the attributes (the number of different sensors), you would want to collect a small pilot sample using all available sensors and then select the ones that are most informative for the big data collection task.















                $endgroup$





















                  0












                  $begingroup$

                  Several great answers on here, in particular, @damienfrancois's answer very succinctly captures the general idea.



                  However, I don't see any examples of feature engineering for relational or time-series data. In that case, data scientists generally extract statistical patterns across relationships and over time. For instance, in order to predict what customers will buy in the future from an e-commerce database, one might extract quantities like the average historical purchase amount or the frequency of prior purchases.
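As a toy version of that e-commerce example, the sketch below turns a purchase log into one row of features per customer. The customers, amounts, day numbers, and the 30-day observation window are all invented for illustration.

```python
from collections import defaultdict

# Purchase log: (customer, day of purchase, amount). Made-up records.
purchases = [
    ("alice", 1, 20.0),
    ("bob", 3, 5.0),
    ("alice", 10, 40.0),
    ("bob", 20, 15.0),
    ("alice", 25, 30.0),
]
WINDOW_DAYS = 30  # assumed length of the observation period

# Group the amounts by customer.
amounts = defaultdict(list)
for customer, _day, amount in purchases:
    amounts[customer].append(amount)

# One feature row per customer, aggregated over the whole history.
features = {
    customer: {
        "avg_amount": sum(vals) / len(vals),
        "purchases_per_day": len(vals) / WINDOW_DAYS,
    }
    for customer, vals in amounts.items()
}

for customer, row in features.items():
    print(customer, row)
```

A log with a variable number of rows per customer becomes a fixed-width table, which is the shape most learning algorithms expect.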



                  I wrote a piece on this topic that goes into much more detail with several examples here: https://www.featurelabs.com/blog/feature-engineering-vs-feature-selection/















                  $endgroup$





















                    0












                    $begingroup$

                    Let me go in reverse order: first feature extraction, and then why feature selection and dimensionality reduction are needed.



                    Feature extraction is mainly used for classification. Classification is the process of deciding which category a particular object belongs to. It has two phases: (i) a training phase, where the properties of the given data or objects are learned using some process (feature extraction), and (ii) a testing phase, where an unknown object is classified using the features learned in the previous (training) phase.



                    Feature extraction, as the name suggests, aims to find the underlying pattern in the given data. This underlying pattern is termed the feature corresponding to that data. Various methods exist for feature extraction; the extracted features are then typically passed to a classifier such as a Support Vector Machine (SVM).



                    Now, feature extraction should generate features that are:




                    • robust

                    • discriminative

                    • optimal set of features


                    Feature selection: a specific set of data can be represented either by a single feature or by a set of features. In the classification process, a system is trained for at least two classes, so the training system will generate either a single feature or a set of features. These features should possess the properties stated above.



                    The problem comes when there is a feature set for each class and some of the features are correlated. Among those correlated features, one or a few are sufficient for representation, and that is where feature selection comes into the picture. Also, these features need to be stored, so the memory requirement grows as the feature set grows.
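One simple way to act on that observation is to drop any feature that is highly correlated with one already kept. The sketch below is only an illustration: the column names and values, the 0.95 threshold, and the greedy keep-the-first strategy are assumptions of mine, not something the answer prescribes.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up columns: the same length measured in two units, plus a weight.
columns = {
    "length_cm": [1.0, 2.0, 3.0, 4.0, 5.0],
    "length_in": [0.39, 0.79, 1.18, 1.57, 1.97],
    "weight": [3.0, 1.0, 4.0, 1.0, 5.0],
}


def drop_correlated(columns, threshold=0.95):
    """Greedily keep a column only if it is not too correlated with any kept one."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


print(drop_correlated(columns))
```

The second length column adds almost no information beyond the first, so it is removed, and only the remaining columns need to be stored.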



                    Then comes dimensionality reduction, which is closely tied to the feature selection process. It aims at the smallest set of features that best describes the data. There are many techniques for this, such as principal component analysis, independent component analysis, and matrix factorization; note that these build new features from the original ones rather than picking a subset of them.















                    $endgroup$





















                      0












                      $begingroup$

                      Extracted from Hands-On Machine Learning with Scikit-Learn & TensorFlow:




                      1. Data cleaning:
                        Fix or remove outliers (optional).
                        Fill in missing values (e.g., with zero, mean, median…) or drop their rows (or columns).

                      2. Feature selection (optional):
                        Drop the attributes that provide no useful information for the task.

                      3. Feature engineering, where appropriate:
                        Discretize continuous features.
                        Decompose features (e.g., categorical, date/time, etc.).
                        Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.).
                        Aggregate features into promising new features.

                      4. Feature scaling: standardize or normalize features.
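A compact sketch of steps 1, 3 and 4 on a single made-up column (step 2, dropping uninformative attributes, is left out because there is only one column). The values, the choice of the median for imputation, and the log transform are assumptions for the example.

```python
import math
import statistics

# One raw feature with a missing value (made-up numbers).
raw = [1.0, None, 10.0, 100.0, 1000.0]

# Step 1, data cleaning: fill the missing value with the median of the observed ones.
observed = [v for v in raw if v is not None]
median = statistics.median(observed)
filled = [median if v is None else v for v in raw]

# Step 3, feature engineering: a log transform tames the heavy right tail.
logged = [math.log10(v) for v in filled]

# Step 4, feature scaling: standardize to zero mean and unit variance.
mean = statistics.fmean(logged)
std = statistics.pstdev(logged)
scaled = [(v - mean) / std for v in logged]

print(filled)
print([round(v, 3) for v in scaled])
```

The order matters: the statistics used for filling and scaling are computed on the data as it stands at that step, and in a real project they would be computed on the training set only and then reused for new data.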














                      – Hadi Askari (new contributor)






                      $endgroup$





















                        -3












                        $begingroup$

                        For example, if you have agricultural land, then selecting one particular area of that land would be feature selection. If you aim to find the affected plants in that area, then you need to observe each plant based on a particular feature that is common to every plant so as to find the abnormalities; this would be feature extraction. In this example, the original agricultural land corresponds to dimensionality reduction.

















                        $endgroup$













                        • $begingroup$
                          No, it has nothing to do with spatial data in particular. It's applicable to temporal, spatio-temporal, and other sorts of data too.
                          $endgroup$
                          – Emre
                          Jun 21 '14 at 6:10











                        Answer with 50 votes ("Simply put: ...") answered May 18 '14 at 7:53 by damienfrancois

                        16












                        $begingroup$

                        Dimensionality reduction typically means choosing a basis or mathematical representation in which you can describe most, but not all, of the variance in your data, thereby retaining the relevant information while reducing the amount of information needed to represent it. There are a variety of techniques for doing this, including but not limited to PCA, ICA, and matrix factorization. These take existing data and reduce it to its most informative components, so that you can represent most of the information in your dataset with fewer features.



                        Feature selection is picking out, often by hand, features that are highly discriminative. This has a lot more to do with feature engineering than analysis, and requires significantly more work on the part of the data scientist. It requires an understanding of which aspects of your dataset matter for the predictions you're making and which don't. Feature extraction usually involves generating new features that are composites of existing features. Both techniques fall into the category of feature engineering. Feature engineering is generally important if you want the best results, as it involves constructing features that expose information not explicit in your raw dataset, increasing your signal-to-noise ratio.

















                        $endgroup$









                        • 2




                          $begingroup$
                          I mostly agree, with one clarification: feature selection need not be done by hand; it can be automatic. See for instance the Lasso method (en.wikipedia.org/wiki/Least_squares#Lasso_method).
                          $endgroup$
                          – jrouquie
                          Sep 29 '14 at 9:00










                        • $begingroup$
                          I agree with your Dimensionality Reduction clause but differ a bit on Feature Engineering usage - which from what I've seen is only Feature Extraction: Feature Selection is considered separately. It's just a difference in terminology.
                          $endgroup$
                          – javadba
                          Dec 3 '17 at 23:30
























                        edited Jun 16 '14 at 21:44

























                        answered Jun 16 '14 at 19:49









                        ragingSloth



















                        7












                        $begingroup$

                        As in @damienfrancois's answer, feature selection is about selecting a subset of features. In NLP that would mean selecting a set of specific words (typically in NLP each word is a feature whose value is the word's frequency or some other weight based on TF-IDF or similar).
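A minimal sketch of word-level feature selection, assuming a made-up three-sentence corpus and a simple scoring rule (total TF-IDF weight per word with idf = log(N/df)). The corpus, the value of `k`, and the helper name `tfidf_score` are illustrative choices, not part of the answer above; other scores (chi-squared, mutual information) are equally common.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real vocabularies run to tens of thousands of words.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
tokenized = [doc.split() for doc in docs]

# Document frequency: in how many documents does each word occur?
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
n_docs = len(tokenized)


def tfidf_score(word):
    """Total TF-IDF weight of `word` over the corpus (raw counts, idf = log(N/df))."""
    total_count = sum(tokens.count(word) for tokens in tokenized)
    return total_count * math.log(n_docs / doc_freq[word])


# Feature selection: keep the k highest-scoring words, ties broken alphabetically.
k = 3
vocabulary = sorted(doc_freq)
selected = sorted(vocabulary, key=lambda w: (-tfidf_score(w), w))[:k]

# Each document is now a k-dimensional count vector over the selected words only.
reduced = [[tokens.count(w) for w in selected] for tokens in tokenized]

print(selected)
print(reduced)
```

Note that the selected features are still original words: nothing is transformed, the other columns are simply dropped. A word such as "the" that occurs in every document gets an idf of zero and is never selected.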



                        Dimensionality reduction is the introduction of a new feature space in which the original features are represented. The new space is of lower dimension than the original space. In the case of text, an example would be the hashing trick, where a piece of text is reduced to a fixed-size vector of a few bits (say 16 or 32) or bytes. The amazing thing is that the geometry of the space is approximately preserved (given enough bits), so relative distances between documents remain roughly the same as in the original space, and you can deploy standard machine learning techniques without having to deal with the unbounded (and huge) number of dimensions found in text.
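One common formulation of the hashing trick can be sketched as follows. This is an illustration under assumptions, not necessarily the exact variant the answer has in mind: tokens are hashed into a fixed number of buckets with a sign bit, MD5 is used only because it is deterministic and in the standard library, and the names `hashed_vector`, `cosine`, and the bucket count are hypothetical.

```python
import hashlib
import math


def hashed_vector(text, n_buckets=32):
    """Map a text with an arbitrary vocabulary to a fixed-length vector."""
    vec = [0.0] * n_buckets
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # The first 8 bytes of the digest choose the bucket.
        index = int.from_bytes(digest[:8], "big") % n_buckets
        # A further byte chooses the sign, so colliding tokens tend to cancel
        # rather than always inflate the same bucket.
        sign = 1.0 if digest[8] % 2 == 0 else -1.0
        vec[index] += sign
    return vec


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


a = hashed_vector("the quick brown fox jumps over the lazy dog")
b = hashed_vector("the quick brown fox jumps over the lazy cat")
c = hashed_vector("stock markets fell sharply on monday")

# Documents sharing most words should usually stay closer than unrelated ones.
print(cosine(a, b), cosine(a, c))
```

The vector length is fixed in advance, so previously unseen words never add dimensions. With few buckets, collisions distort the similarities; increasing `n_buckets` makes them track the original bag-of-words space more closely.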

















                        $endgroup$


























                            edited Oct 20 '15 at 3:28









                            Emre











                            answered Jun 10 '14 at 22:26









                            iliasfl























                                5












                                $begingroup$

                                Feature selection is about choosing some of the features based on a statistical score, whereas feature extraction uses techniques to extract second-level information from the data, e.g. the interesting frequencies of a signal via a Fourier transform.
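The Fourier example can be sketched with a plain discrete Fourier transform. The signal below is synthetic (a 5 Hz sine plus a weaker 12 Hz sine, one second sampled at 64 Hz), and the function names are hypothetical; a real pipeline would normally use an FFT routine from a numerical library instead of this O(n^2) loop.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the DFT bins 0..n/2 of a real-valued signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency(signal, sample_rate):
    """Extracted feature: the frequency (in Hz) of the strongest non-DC bin."""
    mags = dft_magnitudes(signal)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(signal)


# Synthetic signal: 64 raw samples (the original "features").
sample_rate = 64
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(sample_rate)
]

# One extracted feature summarising all 64 samples.
feature = dominant_frequency(signal, sample_rate)
print(feature)
```

The extracted value is not one of the 64 original samples; it is a new quantity computed from all of them, which is what distinguishes extraction from selection.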



                                Dimensionality reduction is all about transforming data into a low-dimensional space in which the data preserves its Euclidean structure but does not suffer from the curse of dimensionality.
                                For instance, assume you extract some word features $[x_1,...,x_n]$ from a data set, where each document can be modeled as a point in n-dimensional space and n is too large (a toy example). In this case many algorithms work poorly because of the distance distortion of high-dimensional space. Now you need to reduce dimensionality, either by selecting the most informative features or by transforming them onto a low-dimensional manifold using dimensionality reduction methods, e.g. PCA, LLE, etc.
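The distance distortion mentioned above can be illustrated with a small experiment, assuming points drawn uniformly at random from the unit hypercube (an assumption made for illustration; real document vectors are sparse and not uniform). The function name `distance_contrast` and all parameter values are hypothetical.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of the farthest to the nearest distance from one query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return max(dists) / min(dists)


low = distance_contrast(2)     # low-dimensional space
high = distance_contrast(500)  # high-dimensional space

# In high dimensions the ratio shrinks towards 1: the nearest and the farthest
# neighbours become almost equally far away.
print(low, high)
```

When the ratio is close to 1, distance-based methods such as nearest-neighbour search have little to distinguish near points from far ones, which is one motivation for reducing dimensionality first.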















                                $endgroup$













                                • $begingroup$
                                  Out of the answers available this one best matches what I've seen in several Data Science and ML Platform teams
                                  $endgroup$
                                  – javadba
                                  Dec 3 '17 at 23:28


























                                answered Dec 10 '15 at 21:49









                                DanielWelke























                                3












                                $begingroup$

                                To complete Damien's answer, an example of dimensionality reduction in NLP is a topic model, where you represent the document by a vector indicating the weights of its constituent topics.















                                $endgroup$




























                                    answered Jun 8 '14 at 7:03









                                    Emre























                                        2












                                        $begingroup$

                                        For a proper review and definition you may take a look at Dimension Reduction vs. Variable Selection. Also, in the book Feature Extraction: Foundations and Applications, feature extraction is decomposed into two steps: feature construction and feature selection.















                                        $endgroup$




























                                            answered Oct 19 '15 at 22:42









                                            Ash























                                                2












                                                $begingroup$

                                                A1. What is dimensionality reduction:
                                                If you think of data as a matrix, where rows are instances and columns are attributes (or features), then dimensionality reduction is mapping this data matrix to a new matrix with fewer columns. For visualization, if you think of each matrix column (attribute) as a dimension in feature space, then dimensionality reduction is the projection of instances from the higher-dimensional space (more columns) to a lower-dimensional subspace (fewer columns).
                                                [Figure: Dimensionality reduction is subspace projection]
                                                The typical objectives of this transformation are (1) preserving the information in the data matrix while reducing computational complexity, and (2) improving the separability of the different classes in the data.



                                                A2. Dimensionality reduction as feature selection or feature extraction:
                                                I'll use the ubiquitous Iris dataset, which is arguably the 'hello world' of data science. Briefly, the Iris dataset has 3 classes and 4 attributes (columns). I'll illustrate feature selection and extraction for the task of reducing Iris dataset dimensionality from 4 to 2.



                                                I compute pair-wise co-variance of this dataset using library in Python called seaborn. The code is: sns.pairplot(iris, hue="species", markers=["o", "s", "D"]) The figure I get is
                                                Iris pair-plot
                                                I can select the pair of attributes (2 dimensions) that provide me the greatest separation between the 3 classes (species) in the Iris dataset. This would be a case of feature-selection.



                                                Next up is feature extraction. Herein, I am projecting the 4-dimensional feature space of Iris to a new 2-dimensional subspace, which is not axis aligned with the original space. These are new attributes. They are typically based on the distribution in the original high dimensional space.
                                                The most popular method is Principal Component Analysis, which computes Eigenvectors in the original space.
                                                PCA using SVD
                                                Obviously, we are not restricted to using only a linear and global projection to a subspace based on Eigenvectors. We can use non-linear projection methods as well.
                                                Here is an example of non-linear PCA using neural networks
                                                non-linear PCA using NN
                                                The attributes (dimensions) in the last example are extracted from the original 4 attributes using neural networks. You can experiment with various flavors of PCA for iris dataset youself using this pca methods code.



                                                Summary:
                                                While feature extraction methods may appear to be superior in performance to feature selection, the choice is predicated by the application. The attributes from feature extraction typically lose physical interpretation, which may or may not be an issue based on the task at hand. For example, if you are designing a very expensive data collection task with costly sensors and need to economize on the attributes (number of different sensors), you'd want to collect a small pilot sample using all available sensors and then select the ones that are most informative for the big data collection task.






                                                share|improve this answer









                                                $endgroup$


















                                                  2












                                                  $begingroup$

                                                  A1. What is dimensionality reduction:
                                                  If you think of data in a matrix, where rows are instances and columns are attributes (or features), then dimensionality reduction is mapping this data matrix to a new matrix with fewer columns. For visualization, if you think of each matrix-column (attribute) as a dimension in feature space, then dimensionality reduction is projection of instances from the higher dimensional space (more columns) to a lower dimensional sub-space (fewer columns).
                                                  Dimensionality reduction is subspace projection
                                                  Typical objective for this transformation is (1) preserving information in the data matrix, while reducing computational complexity; (2) improving separability of different classes in data.



                                                  A2. Dimensionality reduction as feature selection or feature extraction:
                                                  I'll use the ubiquitous Iris dataset, which is arguably the 'hello world' of data science. Briefly, the Iris dataset has 3 classes and 4 attributes (columns). I'll illustrate feature selection and extraction for the task of reducing Iris dataset dimensionality from 4 to 2.



                                                  I compute pair-wise co-variance of this dataset using library in Python called seaborn. The code is: sns.pairplot(iris, hue="species", markers=["o", "s", "D"]) The figure I get is
                                                  Iris pair-plot
                                                  I can select the pair of attributes (2 dimensions) that provide me the greatest separation between the 3 classes (species) in the Iris dataset. This would be a case of feature-selection.



                                                  Next up is feature extraction. Herein, I am projecting the 4-dimensional feature space of Iris to a new 2-dimensional subspace, which is not axis aligned with the original space. These are new attributes. They are typically based on the distribution in the original high dimensional space.
                                                  The most popular method is Principal Component Analysis, which computes Eigenvectors in the original space.
                                                  PCA using SVD
                                                  Obviously, we are not restricted to using only a linear and global projection to a subspace based on Eigenvectors. We can use non-linear projection methods as well.
                                                  Here is an example of non-linear PCA using neural networks
                                                  non-linear PCA using NN
                                                  The attributes (dimensions) in the last example are extracted from the original 4 attributes using neural networks. You can experiment with various flavors of PCA for iris dataset youself using this pca methods code.



                                                  Summary:
                                                  While feature extraction methods may appear to be superior in performance to feature selection, the choice is predicated by the application. The attributes from feature extraction typically lose physical interpretation, which may or may not be an issue based on the task at hand. For example, if you are designing a very expensive data collection task with costly sensors and need to economize on the attributes (number of different sensors), you'd want to collect a small pilot sample using all available sensors and then select the ones that are most informative for the big data collection task.






                                                  share|improve this answer









                                                  $endgroup$
















                                                    2












                                                    2








                                                    2





                                                    $begingroup$

                                                    A1. What is dimensionality reduction:
                                                    If you think of data in a matrix, where rows are instances and columns are attributes (or features), then dimensionality reduction is mapping this data matrix to a new matrix with fewer columns. For visualization, if you think of each matrix-column (attribute) as a dimension in feature space, then dimensionality reduction is projection of instances from the higher dimensional space (more columns) to a lower dimensional sub-space (fewer columns).
                                                    Dimensionality reduction is subspace projection
                                                    Typical objective for this transformation is (1) preserving information in the data matrix, while reducing computational complexity; (2) improving separability of different classes in data.



                                                    A2. Dimensionality reduction as feature selection or feature extraction:
                                                    I'll use the ubiquitous Iris dataset, which is arguably the 'hello world' of data science. Briefly, the Iris dataset has 3 classes and 4 attributes (columns). I'll illustrate feature selection and extraction for the task of reducing Iris dataset dimensionality from 4 to 2.



                                                    I compute pair-wise co-variance of this dataset using library in Python called seaborn. The code is: sns.pairplot(iris, hue="species", markers=["o", "s", "D"]) The figure I get is
                                                    Iris pair-plot
                                                    I can select the pair of attributes (2 dimensions) that provide me the greatest separation between the 3 classes (species) in the Iris dataset. This would be a case of feature-selection.



                                                    Next up is feature extraction. Herein, I am projecting the 4-dimensional feature space of Iris to a new 2-dimensional subspace, which is not axis aligned with the original space. These are new attributes. They are typically based on the distribution in the original high dimensional space.
                                                    The most popular method is Principal Component Analysis, which computes Eigenvectors in the original space.
                                                    PCA using SVD
                                                    Obviously, we are not restricted to using only a linear and global projection to a subspace based on Eigenvectors. We can use non-linear projection methods as well.
                                                    Here is an example of non-linear PCA using neural networks
                                                    non-linear PCA using NN
                                                    The attributes (dimensions) in the last example are extracted from the original 4 attributes using neural networks. You can experiment with various flavors of PCA for iris dataset youself using this pca methods code.



                                                    Summary:
                                                    While feature extraction methods may appear to be superior in performance to feature selection, the choice is predicated by the application. The attributes from feature extraction typically lose physical interpretation, which may or may not be an issue based on the task at hand. For example, if you are designing a very expensive data collection task with costly sensors and need to economize on the attributes (number of different sensors), you'd want to collect a small pilot sample using all available sensors and then select the ones that are most informative for the big data collection task.






                                                    share|improve this answer









                                                    $endgroup$



                                                    A1. What is dimensionality reduction:
                                                    If you think of data in a matrix, where rows are instances and columns are attributes (or features), then dimensionality reduction is mapping this data matrix to a new matrix with fewer columns. For visualization, if you think of each matrix-column (attribute) as a dimension in feature space, then dimensionality reduction is projection of instances from the higher dimensional space (more columns) to a lower dimensional sub-space (fewer columns).
[Figure: Dimensionality reduction is subspace projection]
Typical objectives for this transformation are (1) preserving the information in the data matrix while reducing computational complexity, and (2) improving the separability of the different classes in the data.



                                                    A2. Dimensionality reduction as feature selection or feature extraction:
                                                    I'll use the ubiquitous Iris dataset, which is arguably the 'hello world' of data science. Briefly, the Iris dataset has 3 classes and 4 attributes (columns). I'll illustrate feature selection and extraction for the task of reducing Iris dataset dimensionality from 4 to 2.



I plot the pairwise relationships between the attributes of this dataset using a Python library called seaborn. The code is: sns.pairplot(iris, hue="species", markers=["o", "s", "D"]). The figure I get is:
[Figure: Iris pair-plot]
I can select the pair of attributes (2 dimensions) that gives me the greatest separation between the 3 classes (species) in the Iris dataset. This would be a case of feature selection.
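
Picking the most separating attributes by eye can also be approximated numerically. The following is only a minimal sketch under stated assumptions: it uses a synthetic stand-in for Iris (3 classes, 4 columns, generated with NumPy) rather than the real dataset, and it ranks columns with a Fisher-style score (between-class over within-class variance), which is one possible criterion and not necessarily the one used to read the pair-plot. The names `fisher_scores` and `select_top_k` are hypothetical helpers.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-column ratio of between-class to within-class variation."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        rows = X[y == label]
        class_mean = rows.mean(axis=0)
        between += len(rows) * (class_mean - overall_mean) ** 2
        within += ((rows - class_mean) ** 2).sum(axis=0)
    return between / within

def select_top_k(X, y, k):
    """Indices of the k highest-scoring columns, in ascending column order."""
    scores = fisher_scores(X, y)
    return np.sort(np.argsort(scores)[-k:])

# Synthetic stand-in for Iris (an assumption, not the real data):
# columns 2 and 3 shift with the class label, columns 0 and 1 are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
X = rng.normal(size=(150, 4))
X[:, 2] += 3.0 * y
X[:, 3] += 2.0 * y

selected = select_top_k(X, y, k=2)
X_reduced = X[:, selected]  # same original columns, just fewer of them
```

The point to notice is that `X_reduced` contains untouched original columns, so each retained attribute keeps its physical meaning.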



Next up is feature extraction. Here, I am projecting the 4-dimensional feature space of Iris onto a new 2-dimensional subspace that is not axis-aligned with the original space. These are new attributes. They are typically based on the distribution of the data in the original high-dimensional space.
The most popular method is Principal Component Analysis (PCA), which computes the eigenvectors of the data's covariance matrix in the original space.
[Figure: PCA using SVD]
We are not restricted to a linear, global projection onto a subspace based on eigenvectors. We can use non-linear projection methods as well.
Here is an example of non-linear PCA using neural networks:
[Figure: non-linear PCA using NN]
The attributes (dimensions) in the last example are extracted from the original 4 attributes using neural networks. You can experiment with various flavors of PCA on the Iris dataset yourself using this pca methods code.
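
The linear case described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the linked pca methods code: it assumes synthetic 4-attribute data in place of Iris, and `pca_project` is a hypothetical helper name. It centers the data, takes the SVD, and projects onto the top two right singular vectors, which are the leading eigenvectors of the covariance matrix.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are eigenvectors of the covariance matrix,
    # ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained = singular_values**2 / np.sum(singular_values**2)
    return X_centered @ components.T, components, explained[:n_components]

# Synthetic data (an assumption): 4 observed attributes driven by
# 2 latent directions plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(150, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.05 * rng.normal(size=(150, 4))

Z, components, explained = pca_project(X, n_components=2)
```

Each column of `Z` is a weighted mix of all four original columns, which is why extracted attributes usually lose their direct physical interpretation.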



                                                    Summary:
While feature extraction methods may appear to outperform feature selection, the choice depends on the application. The attributes produced by feature extraction typically lose their physical interpretation, which may or may not be an issue depending on the task at hand. For example, if you are designing a very expensive data collection task with costly sensors and need to economize on the attributes (the number of different sensors), you'd want to collect a small pilot sample using all available sensors and then select the ones that are most informative for the big data collection task.

















                                                    answered Sep 17 '17 at 1:39









Dynamic Stardust























There are several great answers here; in particular, @damienfrancois's answer captures the general idea very succinctly.



However, I don't see any examples of feature engineering for relational or time-series data. In that case, data scientists generally extract statistical patterns across relationships and over time. For instance, to predict what customers will buy in the future from an ecommerce database, one might extract quantities like the average historical purchase amount or the frequency of prior purchases.
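
As a rough illustration of that kind of extraction, here is a minimal sketch using only the Python standard library. The purchase log, the customer names, and the `customer_features` helper are all hypothetical, and a plain purchase count stands in for "frequency" because the toy log has no timestamps.

```python
from collections import defaultdict

# Hypothetical purchase log: (customer_id, amount) rows from a transactions table.
transactions = [
    ("alice", 20.0),
    ("bob", 5.0),
    ("alice", 40.0),
    ("bob", 15.0),
    ("alice", 30.0),
]

def customer_features(transactions):
    """Aggregate many transaction rows into one feature row per customer."""
    amounts = defaultdict(list)
    for customer_id, amount in transactions:
        amounts[customer_id].append(amount)
    return {
        customer_id: {
            "purchase_count": len(values),              # proxy for purchase frequency
            "mean_amount": sum(values) / len(values),   # average historical purchase amount
        }
        for customer_id, values in amounts.items()
    }

features = customer_features(transactions)
```

The result has one entry per customer, so a one-to-many relationship (customer to purchases) has been collapsed into a fixed-width feature vector that a model can consume.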



                                                            I wrote a piece on this topic that goes into much more detail with several examples here: https://www.featurelabs.com/blog/feature-engineering-vs-feature-selection/

















                                                            answered Mar 8 '18 at 17:31









bschreck























Let me go in reverse order: first feature extraction, then why feature selection and dimensionality reduction are needed.



Feature extraction is mainly used for classification. Classification is the process of deciding which category a particular object belongs to. It has two phases: (i) a training phase, where the properties of the given data or objects are learned using some process (feature extraction), and (ii) a testing phase, where an unknown object is classified using the features learned in the training phase.



Feature extraction, as the name suggests, aims to find the underlying pattern in the given data. This underlying pattern is termed the feature corresponding to that data. Various methods exist for feature extraction; the resulting features are then typically passed to a classifier such as a Support Vector Machine (SVM).



Now, feature extraction should generate features that are:

• robust
• discriminative
• an optimal set


Feature selection: A specific set of data can be represented either by a single feature or by a set of features. In classification, a system is trained for at least two classes, so the training system will generate either a single feature or a set of features. These features should possess the properties stated above.



The problem arises when there is a feature set for each class and some of the features are correlated. This implies that, among those correlated features, one or a few are sufficient for representation, and that is where feature selection comes into the picture. Also, these features need to be stored, so the memory requirement increases as the feature set grows.
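
The idea of keeping only one of several correlated features can be sketched as a simple greedy filter. This is a minimal sketch under assumptions, not a standard library routine: the 0.95 threshold, the synthetic data, and the `drop_correlated` helper are all chosen for illustration.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Greedily keep a column only if its absolute correlation with
    every already-kept column is at or below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return kept

# Synthetic data (an assumption): column 2 is almost a rescaled copy of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, 2.0 * a + 0.01 * rng.normal(size=200)])

kept = drop_correlated(X)
X_reduced = X[:, kept]
```

Here the near-duplicate third column is dropped, which reduces both redundancy and the storage needed for the feature set.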



Then comes dimensionality reduction, the broader process of finding a smaller set of features that best describes the data; feature selection is one part of it. Other techniques, such as principal component analysis, independent component analysis, and matrix factorization, instead build new features from combinations of the original ones.
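
A minimal sketch of principal component analysis, one of the techniques named above, reducing two features to one. It assumes power iteration on the covariance matrix is adequate for this toy data; the function name `first_principal_component` and the data are illustrative only.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, projections) for the top principal component.

    Power iteration on the sample covariance matrix; assumes the starting
    vector is not orthogonal to the dominant eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each object is now described by one number instead of d numbers.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections

# Hypothetical data: two strongly related measurements per object.
data = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8]]
direction, scores = first_principal_component(data)
```

Note that the resulting feature is a weighted combination of both original columns rather than one of them, which is what distinguishes this kind of reduction from picking a subset of the existing features.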

















                                                                    answered Jan 12 at 8:09









Chirag Arora
























                                                                        0












                                                                        $begingroup$

Extracted from Hands-On Machine Learning with Scikit-Learn & TensorFlow:




                                                                        1. Data cleaning:
                                                                          Fix or remove outliers (optional).
                                                                          Fill in missing values (e.g., with zero, mean, median…) or drop their rows (or columns).

                                                                        2. Feature selection (optional):
                                                                          Drop the attributes that provide no useful information for the task.

                                                                        3. Feature engineering, where appropriate:
                                                                          Discretize continuous features.
                                                                          Decompose features (e.g., categorical, date/time, etc.).
                                                                          Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.).
                                                                          Aggregate features into promising new features.

                                                                        4. Feature scaling: standardize or normalize features.
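
The four steps above can be sketched with the standard library alone. The records, the column names, and the choices of median filling and a log transform are assumptions picked for illustration; the book's own examples use scikit-learn.

```python
import math
import statistics

# Hypothetical records; one income value is missing.
rows = [
    {"income": 30000.0, "age": 25.0, "country": "X"},
    {"income": None,    "age": 40.0, "country": "X"},
    {"income": 90000.0, "age": 55.0, "country": "X"},
]

# 1. Data cleaning: fill missing incomes with the median of the observed ones.
median_income = statistics.median(r["income"] for r in rows if r["income"] is not None)
for r in rows:
    if r["income"] is None:
        r["income"] = median_income

# 2. Feature selection: drop attributes that take the same value in every row.
constant = [k for k in rows[0] if len({r[k] for r in rows}) == 1]
for r in rows:
    for k in constant:
        del r[k]

# 3. Feature engineering: add a log-transformed copy of a skewed feature.
for r in rows:
    r["log_income"] = math.log(r["income"])

# 4. Feature scaling: standardize to zero mean and unit (population) variance.
def standardize(values):
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

scaled_age = standardize([r["age"] for r in rows])
```

Step 2 removes an existing column without altering the others, while step 3 creates a new column from an existing one.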












                                                                        $endgroup$



























                                                                            answered yesterday









Hadi Askari
























                                                                                -3












                                                                                $begingroup$

For example, if you have a piece of agricultural land, selecting one particular area of that land would be feature selection. If you aim to find the affected plants in that area, then you need to observe each plant based on a particular feature that is common to every plant so as to find the abnormalities; this would be feature extraction. In this example, the original agricultural land corresponds to dimensionality reduction.

















                                                                                $endgroup$













                                                                                • $begingroup$
                                                                                  No, it has nothing to do with spatial data in particular. It's applicable to temporal, spatio-temporal, and other sorts of data too.
                                                                                  $endgroup$
                                                                                  – Emre
                                                                                  Jun 21 '14 at 6:10
























                                                                                edited Jul 9 '14 at 17:18

























                                                                                answered Jun 20 '14 at 17:30









Divya






























