Data scaling before or after PCA












3


I have seen senior data scientists scale their data either before or after applying PCA.

Which order is more correct, and why?







machine-learning feature-scaling
















asked Jul 25 '18 at 13:50









Poete Maudit









  • Closely related: stats.stackexchange.com/questions/53/…
    – Sycorax, Jul 25 '18 at 15:33
























2 Answers


















    11





    I once heard a data scientist state in a conference talk: "Basically, you can do what you want, as long as you know what you are doing."

    This also applies here. The more statistically sound approach is to transform all variables before further steps such as PCA or factor analysis. That way you still know the scale of your variables and can interpret the rescaling in the context of your application. If you have no such interpretation, but have good reasons to rescale the principal components, for example computational issues that arise when some values are too close to zero while others are quite large, then rescaling the components makes sense. However, reversing this process while still being able to interpret the effect of the rescaling in your context becomes almost impossible.
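
To illustrate why the order matters, here is a minimal sketch in plain Python (standard library only). The data, the feature names `height_m` and `income_usd`, and the helpers `standardize` and `first_pc_2d` are made up for this illustration and are not part of the answer. It computes the first principal axis of a two-feature dataset in closed form, once on the raw values and once on z-scored values.

```python
import math
import statistics


def standardize(values):
    """Return z-scores: zero mean, unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def first_pc_2d(xs, ys):
    """Return the unit-length first principal axis of two-feature data.

    Closed-form eigenvector of the 2x2 sample covariance matrix
    [[a, b], [b, c]] belonging to its largest eigenvalue.
    """
    mx, my = statistics.mean(xs), statistics.mean(ys)
    a = statistics.variance(xs)
    c = statistics.variance(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    if b == 0:
        # Uncorrelated features: the axis with the larger variance wins.
        return (1.0, 0.0) if a >= c else (0.0, 1.0)
    lam = (a + c) / 2 + math.hypot((a - c) / 2, b)  # largest eigenvalue
    vx, vy = b, lam - a                             # matching eigenvector
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)


# Hypothetical data: two correlated features on very different scales.
height_m = [1.60, 1.65, 1.70, 1.75, 1.80, 1.85]
income_usd = [30000, 42000, 38000, 55000, 47000, 61000]

raw_pc = first_pc_2d(height_m, income_usd)
scaled_pc = first_pc_2d(standardize(height_m), standardize(income_usd))

print("first PC, raw data:         ", raw_pc)
print("first PC, standardized data:", scaled_pc)
```

On the raw data the first axis points almost entirely along the income column, simply because its numbers are larger. After standardizing, both features receive equal weight (for two correlated, standardized features the first axis always lies at 45 degrees). Rescaling the component scores afterwards would not change these directions, since they are fixed once PCA has been run.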

















    answered Jul 25 '18 at 13:59









    Alex2006













    • Thank you for your answer, @alex. As I can see from the upvotes you got, your answer is right, and this is actually what I had in mind.
      – Poete Maudit, Jul 27 '18 at 12:01


















        0





        "It results more important to balance the classes rather than reduce the dimensionality, at least in terms of accuracy; (ii) The best choice seems to be the application of SMOTE followed by PCA.."



        Link: https://core.ac.uk/download/pdf/61408511.pdf

















        answered 14 hours ago









        tsumaranaina






























