Do Random Forests overfit?

23

I have been reading around about Random Forests, but I cannot really find a definitive answer about the problem of overfitting. According to Breiman's original paper, they should not overfit when the number of trees in the forest is increased, but there seems to be no consensus about this, which is causing me quite some confusion.

Maybe someone more expert than I am can give me a more concrete answer or point me in the right direction to better understand the problem.

machine-learning random-forest

asked Aug 23 '14 at 16:54
markusian


  • 3
    All algorithms will overfit to some degree. It's not about picking something that doesn't overfit; it's about carefully considering the amount of overfitting and the form of the problem you're solving to maximize the more relevant metrics.
    – indico
    Aug 23 '14 at 18:16

  • 1
    ISTR that Breiman had a proof based on the Law of Large Numbers. Has someone discovered a flaw in that proof?
    – JenSCDC
    Aug 28 '14 at 1:18

  • @AndyBlankertz ISTR = internetslang.com/ISTR-meaning-definition.asp ?
    – Hack-R
    Nov 3 '15 at 3:15


















4 Answers
























18












Every ML algorithm with high complexity can overfit. However, the OP is asking whether an RF will not overfit when increasing the number of trees in the forest.

In general, ensemble methods reduce the prediction variance to almost nothing, improving the accuracy of the ensemble. If we define the variance of the expected generalization error of an individual randomized model as

$$\mathrm{Var}(x) = \sigma^2(x),$$

then the variance of the expected generalization error of an ensemble of $M$ such models corresponds to

$$\mathrm{Var}(x) = \rho(x)\,\sigma^2(x) + \frac{1 - \rho(x)}{M}\,\sigma^2(x),$$

where $\rho(x)$ is the Pearson correlation coefficient between the predictions of two randomized models trained on the same data with two independent seeds. If we increase the number of decision trees in the RF (larger $M$), the variance of the ensemble decreases whenever $\rho(x) < 1$. Therefore, the variance of the ensemble is strictly smaller than the variance of an individual model.

In a nutshell, increasing the number of individual randomized models in an ensemble will never increase the generalization error.

answered Oct 20 '14 at 9:31
tashuhka
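
To make the decomposition concrete, here is a tiny numerical illustration; the values of $\sigma^2(x)$ and $\rho(x)$ below are made up purely for the example:

    # Plug assumed values into Var = rho*sigma2 + (1 - rho)/M * sigma2
    sigma2 = 1.0   # variance of a single randomized tree (assumed)
    rho = 0.3      # correlation between two randomized trees (assumed)
    for M in [1, 10, 100, 1000]:
        var_ensemble = rho * sigma2 + (1.0 - rho) / M * sigma2
        print(M, round(var_ensemble, 4))
    # Prints 1.0, 0.37, 0.307, 0.3007: the variance shrinks toward
    # rho*sigma2 as M grows, and never increases when trees are added.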









  • 1




    That's definitely what Leo Breiman and the theory say, but empirically it seems like they definitely do overfit. For example, I currently have a model with a 10-fold CV MSE of 0.02, but when measured against the ground truth the MSE is 0.4. OTOH, if I reduce the tree depth and the number of trees, the model's performance improves significantly.
    – Hack-R
    Feb 18 '16 at 14:41








  • 3




    Reducing the tree depth is a different case, because you are adding regularisation, which decreases the overfitting. Try plotting the MSE as you increase the number of trees while keeping the rest of the parameters unchanged, with MSE on the y-axis and num_trees on the x-axis (see the sketch just below). You will see that when adding more trees the error decreases quickly and then plateaus, but it never increases.
    – tashuhka
    Feb 19 '16 at 13:43
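
A minimal sketch of the plot suggested in the comment above, assuming scikit-learn; the dataset and hyper-parameter values are illustrative, not taken from the original posts:

    # Held-out MSE as a function of the number of trees in a Random Forest.
    # warm_start=True lets us grow the same forest incrementally.
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    rf = RandomForestRegressor(n_estimators=1, warm_start=True, random_state=0)
    tree_counts, test_mse = [], []
    for n in range(1, 201, 5):
        rf.set_params(n_estimators=n)   # add trees to the existing forest
        rf.fit(X_train, y_train)
        tree_counts.append(n)
        test_mse.append(mean_squared_error(y_test, rf.predict(X_test)))

    plt.plot(tree_counts, test_mse)
    plt.xlabel("number of trees")
    plt.ylabel("test MSE")
    plt.show()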



















9












You may want to check Cross Validated, a Stack Exchange site for many things, including machine learning.

In particular, this question (with exactly the same title) has already been answered multiple times. Check these links: https://stats.stackexchange.com/search?q=random+forest+overfit

But I can give you the short answer: yes, it does overfit, and sometimes you need to control the complexity of the trees in your forest, or even prune them when they grow too much; but this depends on the library you use for building the forest. E.g. in randomForest in R you can only control the complexity.

answered Aug 24 '14 at 8:22
Alexey Grigorev
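
As a side note (not part of the original answer): in Python's scikit-learn the analogous complexity controls are exposed as hyper-parameters. A minimal sketch, with arbitrary example values:

    # Limiting tree complexity in a random forest (scikit-learn).
    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier(
        n_estimators=500,      # number of trees
        max_depth=8,           # cap tree depth (acts like pruning/regularisation)
        min_samples_leaf=5,    # require a minimum number of samples per leaf
        max_features="sqrt",   # features considered at each split
        random_state=0,
    )
    # rf.fit(X_train, y_train) with your own data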





















1

STRUCTURED DATASET -> MISLEADING OOB ERRORS

I've found an interesting case of RF overfitting in my work practice: when the data are structured, RF overfits on the OOB observations.

Detail:

I try to predict electricity prices on the electricity spot market for each single hour (each row of the dataset contains the price and the system parameters (load, capacities, etc.) for that single hour).

Electricity prices are created in batches (24 prices created on the electricity market in one fixing, at one moment in time).

So the OOB observations for each tree are random subsets of the set of hours, but if you predict the next 24 hours you do it all at once (at the first moment you obtain all the system parameters, then you predict 24 prices, then there is a fixing which produces those prices). This makes it easier to produce OOB predictions than predictions for the whole next day: OOB observations are not contained in 24-hour blocks but dispersed uniformly, and since there is autocorrelation of the prediction errors, it is easier to predict the price for a single missing hour than for a whole block of missing hours.

Easier to predict, in case of error autocorrelation:
known, known, prediction, known, prediction - OOB case

Harder one:
known, known, known, prediction, prediction - real-world prediction case

I hope it's interesting.

answered Jul 22 '16 at 8:15
Qbik
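
One way to see the effect described above is to compare the (optimistic) OOB estimate with a forward, block-wise evaluation. A minimal sketch, assuming scikit-learn and using invented data whose error is shared within each 24-hour block:

    # OOB R^2 vs. forward time-blocked CV R^2 on block-structured data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import TimeSeriesSplit, cross_val_score

    rng = np.random.default_rng(0)
    n_days = 300
    hour = np.arange(n_days * 24) % 24
    driver = np.repeat(rng.normal(size=n_days), 24)                # per-day "system parameter"
    day_noise = np.repeat(rng.normal(scale=2.0, size=n_days), 24)  # error shared by all hours of a day
    y = 3 * driver + np.sin(2 * np.pi * hour / 24) + day_noise
    X = np.column_stack([driver, hour])

    rf = RandomForestRegressor(n_estimators=200, oob_score=True, n_jobs=-1, random_state=0)
    rf.fit(X, y)
    # OOB rows still have same-day rows in each tree's training sample,
    # so the shared within-day error gets (partly) absorbed.
    print("OOB R^2:           ", round(rf.oob_score_, 3))

    forward_cv = TimeSeriesSplit(n_splits=5)   # always predict later days from earlier days
    scores = cross_val_score(rf, X, y, cv=forward_cv, scoring="r2")
    print("Day-blocked CV R^2:", round(scores.mean(), 3))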





















1

1. The Random Forest does overfit.

2. The Random Forest does not increase its generalization error when more trees are added to the model. The generalization variance goes to zero as more trees are used.

I've made a very simple experiment. I generated synthetic data:

y = 10 * x + noise

I trained two Random Forest models:

• one with full trees

• one with pruned trees

The model with full trees has lower train error but higher test error than the model with pruned trees. The responses of both models:

[figure: responses of the two models]

This is clear evidence of overfitting. Then I took the hyper-parameters of the overfitted model and checked the error while adding one tree at each step. I got the following plot:

[figure: test error while growing the number of trees]

As you can see, the error of the overfitted model does not change when adding more trees, but the model remains overfitted. Here is the link to the experiment I've made.

answered yesterday
pplonski
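
A minimal sketch of the kind of experiment described above, assuming scikit-learn (this is not the author's linked notebook; the sample size, noise level, and depth limit are illustrative):

    # Full vs. depth-limited random forests on y = 10*x + noise.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=2000)
    y = 10 * x + rng.normal(scale=10.0, size=x.size)   # y = 10*x + noise
    X = x.reshape(-1, 1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = {
        "full trees":          RandomForestRegressor(n_estimators=100, random_state=0),
        "depth-limited trees": RandomForestRegressor(n_estimators=100, max_depth=3, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        train_mse = mean_squared_error(y_tr, model.predict(X_tr))
        test_mse = mean_squared_error(y_te, model.predict(X_te))
        # Expected pattern: full trees give a lower train MSE but a higher test MSE.
        print(f"{name}: train MSE = {train_mse:.1f}, test MSE = {test_mse:.1f}")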













