Gini Impurity vs Entropy












53












$begingroup$


Can someone practically explain the rationale behind Gini impurity vs Information gain (based on Entropy)?



Which metric is better to use in different scenarios while using decision trees?










share|improve this question











$endgroup$








  • 5




    $begingroup$
    @Anony-Mousse I guess that was obvious before your comment. The question is not if both have their advantages, but in which scenarios one is better than the other.
    $endgroup$
    – Martin Thoma
    Feb 14 '16 at 10:34










  • $begingroup$
    I have proposed "Information gain" instead of "Entropy", since it is quite closer (IMHO), as marked in the related links. Then, the question was asked in a different form in When to use Gini impurity and when to use information gain?
    $endgroup$
    – Laurent Duval
    Feb 16 '16 at 7:23






  • 1




    $begingroup$
    I have posted here a simple interpretation of the Gini impurity that may be helpful.
    $endgroup$
    – Picaud Vincent
    Nov 6 '17 at 11:33


















53












$begingroup$


Can someone practically explain the rationale behind Gini impurity vs Information gain (based on Entropy)?



Which metric is better to use in different scenarios while using decision trees?










share|improve this question











$endgroup$








  • 5




    $begingroup$
    @Anony-Mousse I guess that was obvious before your comment. The question is not if both have their advantages, but in which scenarios one is better than the other.
    $endgroup$
    – Martin Thoma
    Feb 14 '16 at 10:34










  • $begingroup$
    I have proposed "Information gain" instead of "Entropy", since it is quite closer (IMHO), as marked in the related links. Then, the question was asked in a different form in When to use Gini impurity and when to use information gain?
    $endgroup$
    – Laurent Duval
    Feb 16 '16 at 7:23






  • 1




    $begingroup$
    I have posted here a simple interpretation of the Gini impurity that may be helpful.
    $endgroup$
    – Picaud Vincent
    Nov 6 '17 at 11:33
















53












53








53


37



$begingroup$


Can someone practically explain the rationale behind Gini impurity vs Information gain (based on Entropy)?



Which metric is better to use in different scenarios while using decision trees?










share|improve this question











$endgroup$




Can someone practically explain the rationale behind Gini impurity vs Information gain (based on Entropy)?



Which metric is better to use in different scenarios while using decision trees?







machine-learning decision-trees






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Feb 17 '16 at 8:45









Dawny33

5,47683188




5,47683188










asked Feb 12 '16 at 22:05









Krish MahajanKrish Mahajan

366144




366144








  • 5




    $begingroup$
    @Anony-Mousse I guess that was obvious before your comment. The question is not if both have their advantages, but in which scenarios one is better than the other.
    $endgroup$
    – Martin Thoma
    Feb 14 '16 at 10:34










  • $begingroup$
    I have proposed "Information gain" instead of "Entropy", since it is quite closer (IMHO), as marked in the related links. Then, the question was asked in a different form in When to use Gini impurity and when to use information gain?
    $endgroup$
    – Laurent Duval
    Feb 16 '16 at 7:23






  • 1




    $begingroup$
    I have posted here a simple interpretation of the Gini impurity that may be helpful.
    $endgroup$
    – Picaud Vincent
    Nov 6 '17 at 11:33
















  • 5




    $begingroup$
    @Anony-Mousse I guess that was obvious before your comment. The question is not if both have their advantages, but in which scenarios one is better than the other.
    $endgroup$
    – Martin Thoma
    Feb 14 '16 at 10:34










  • $begingroup$
    I have proposed "Information gain" instead of "Entropy", since it is quite closer (IMHO), as marked in the related links. Then, the question was asked in a different form in When to use Gini impurity and when to use information gain?
    $endgroup$
    – Laurent Duval
    Feb 16 '16 at 7:23






  • 1




    $begingroup$
    I have posted here a simple interpretation of the Gini impurity that may be helpful.
    $endgroup$
    – Picaud Vincent
    Nov 6 '17 at 11:33










5




5




$begingroup$
@Anony-Mousse I guess that was obvious before your comment. The question is not if both have their advantages, but in which scenarios one is better than the other.
$endgroup$
– Martin Thoma
Feb 14 '16 at 10:34




$begingroup$
@Anony-Mousse I guess that was obvious before your comment. The question is not if both have their advantages, but in which scenarios one is better than the other.
$endgroup$
– Martin Thoma
Feb 14 '16 at 10:34












$begingroup$
I have proposed "Information gain" instead of "Entropy", since it is quite closer (IMHO), as marked in the related links. Then, the question was asked in a different form in When to use Gini impurity and when to use information gain?
$endgroup$
– Laurent Duval
Feb 16 '16 at 7:23




$begingroup$
I have proposed "Information gain" instead of "Entropy", since it is quite closer (IMHO), as marked in the related links. Then, the question was asked in a different form in When to use Gini impurity and when to use information gain?
$endgroup$
– Laurent Duval
Feb 16 '16 at 7:23




1




1




$begingroup$
I have posted here a simple interpretation of the Gini impurity that may be helpful.
$endgroup$
– Picaud Vincent
Nov 6 '17 at 11:33






$begingroup$
I have posted here a simple interpretation of the Gini impurity that may be helpful.
$endgroup$
– Picaud Vincent
Nov 6 '17 at 11:33












8 Answers
8






active

oldest

votes


















39












$begingroup$

Gini impurity and Information Gain Entropy are pretty much the same. And people do use the values interchangeably. Below are the formulae of both:




  1. $textit{Gini}: mathit{Gini}(E) = 1 - sum_{j=1}^{c}p_j^2$

  2. $textit{Entropy}: H(E) = -sum_{j=1}^{c}p_jlog p_j$


Given a choice, I would use the Gini impurity, as it doesn't require me to compute logarithmic functions, which are computationally intensive. The closed form of it's solution can also be found.




Which metric is better to use in different scenarios while using
decision trees ?




The Gini impurity, for reasons stated above.



So, they are pretty much same when it comes to CART analytics.



Helpful reference for computational comparison of the two methods






share|improve this answer











$endgroup$









  • 1




    $begingroup$
    It is so common to see formula of entropy, while what is really used in decision tree looks like conditional entropy. I think it is important distinction or am missing something?
    $endgroup$
    – user1700890
    Aug 12 '17 at 13:01










  • $begingroup$
    @user1700890 The ID3 algorithm uses Info. gain entropy. I need to read up on conditional entropy. Probably an improvement over ID3 :)
    $endgroup$
    – Dawny33
    Aug 12 '17 at 13:55






  • 1




    $begingroup$
    I think your definition of the gini impurtiy might be wrong: en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
    $endgroup$
    – Martin Thoma
    Oct 19 '17 at 11:30



















14












$begingroup$

Generally, your performance will not change whether you use Gini impurity or Entropy.



Laura Elena Raileanu and Kilian Stoffel compared both in "Theoretical comparison between the gini index and information gain criteria". The most important remarks were:




  • It only matters in 2% of the cases whether you use gini impurity or entropy.

  • Entropy might be a little slower to compute (because it makes use of the logarithm).


I was once told that both metrics exist because they emerged in different disciplines of science.






share|improve this answer











$endgroup$





















    14












    $begingroup$


    Gini is intended for continuous attributes and Entropy is for attributes that occur in classes




    Gini is to minimize misclassification
    Entropy is for exploratory analysis




    Entropy is a little slower to compute







    share|improve this answer











    $endgroup$





















      12












      $begingroup$

      For the case of a variable with two values, appearing with fractions f and (1-f),

      the gini and entropy are given by:

      gini = 2*f(1-f)

      entropy = f*ln(1/f) + (1-f)*ln(1/(1-f))

      These measures are very similar if scaled to 1.0
      (plotting 2*gini and entropy/ln(2) ):



      Gini (y4,purple) and Entropy (y3,green) values scaled for comparison






      share|improve this answer









      $endgroup$





















        5












        $begingroup$

        To add upon the fact that there are more or less the same, consider also the fact that:
        $$
        begin{split}
        forall ; 0 < u < 1,; log (1-u) &= -u - u^2/2 - u^3/3 , + , cdots\
        forall ; 0 < p < 1,; log (p) &= p-1 - (1-p)^2/2 - (1-p)^3/3 , + , cdots\
        end{split}
        $$
        so that:
        $$
        forall ; 0 < p < 1,; -p log (p) = p(1-p) + p(1-p)^2/2 + p(1-p)^3/3 , + , cdots
        $$
        See the following plot of the two functions normalised to get 1 as maximum value: red curve is for Gini while black one is for entropy.
        Normalised Gini and Entropy criteria



        In the end as explained by @NIMISHAN Gini is more suitable to minimise misclassfication as it is symetric to 0.5, while entropy will more penalised small probabilities.






        share|improve this answer









        $endgroup$





















          2












          $begingroup$

          Entropy takes slightly more computation time than Gini Index because of the log calculation, maybe that's why Gini Index has become the default option for many ML algorithms.
          But, from Tan et. al book Introduction to Data Mining



          "Impurity measure are quite consistent with each other...
          Indeed, the strategy used to prune the tree has a greater impact on the final tree than the choice of impurity measure."



          So, it looks like the selection of impurity measure has little effect on the performance of single decision tree algorithms.



          Also.
          "Gini method works only when the target variable is a binary variable." - Learning Predictive Analytics with Python.






          share|improve this answer









          $endgroup$





















            1












            $begingroup$

            I've been doing optimizations on binary classification for the past week+, and in every case, entropy significantly outperforms gini. This may be data set specific, but it would seem like trying both while tuning hyperparameters is a rational choice, rather than making assumptions about the model ahead of time.



            You never know how data will react until you've run the statistics.






            share|improve this answer











            $endgroup$





















              0












              $begingroup$

              As per parsimony principal Gini outperform entropy as of computation ease (log is obvious has more computations involved rather that plain multiplication at processor/Machine level).



              But entropy definitely has an edge in some data cases involving high imbalance.



              Since entropy uses log of probabilities and multiplying with probabilities of event, what is happening at background is value of lower probabilities are getting scaled up.



              If your data probability distribution is exponential or Laplace (like in case of deep learning where we need probability distribution at sharp point) entropy outperform Gini.



              To give an example if you have 2 events one .01 probability and other .99 probability.



              In Gini Prob sq will be .01^2+.99^2, .0001 + .9801 means lower probability do not play any role as everything is govern by majority probability.



              Now in case of entropy .01*log(.01)+.99*log(.99)= .01*(-2)+ .99*(-.00436)
              = -.02-.00432
              now in this case clearly seen lower probabilities are given better weight-age.






              share|improve this answer










              New contributor




              Gaurav Dogra is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.






              $endgroup$













                Your Answer





                StackExchange.ifUsing("editor", function () {
                return StackExchange.using("mathjaxEditing", function () {
                StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
                StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
                });
                });
                }, "mathjax-editing");

                StackExchange.ready(function() {
                var channelOptions = {
                tags: "".split(" "),
                id: "557"
                };
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function() {
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled) {
                StackExchange.using("snippets", function() {
                createEditor();
                });
                }
                else {
                createEditor();
                }
                });

                function createEditor() {
                StackExchange.prepareEditor({
                heartbeatType: 'answer',
                autoActivateHeartbeat: false,
                convertImagesToLinks: false,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: null,
                bindNavPrevention: true,
                postfix: "",
                imageUploader: {
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                },
                onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                });


                }
                });














                draft saved

                draft discarded


















                StackExchange.ready(
                function () {
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f10228%2fgini-impurity-vs-entropy%23new-answer', 'question_page');
                }
                );

                Post as a guest















                Required, but never shown

























                8 Answers
                8






                active

                oldest

                votes








                8 Answers
                8






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                39












                $begingroup$

                Gini impurity and Information Gain Entropy are pretty much the same. And people do use the values interchangeably. Below are the formulae of both:




                1. $textit{Gini}: mathit{Gini}(E) = 1 - sum_{j=1}^{c}p_j^2$

                2. $textit{Entropy}: H(E) = -sum_{j=1}^{c}p_jlog p_j$


                Given a choice, I would use the Gini impurity, as it doesn't require me to compute logarithmic functions, which are computationally intensive. The closed form of it's solution can also be found.




                Which metric is better to use in different scenarios while using
                decision trees ?




                The Gini impurity, for reasons stated above.



                So, they are pretty much same when it comes to CART analytics.



                Helpful reference for computational comparison of the two methods






                share|improve this answer











                $endgroup$









                • 1




                  $begingroup$
                  It is so common to see formula of entropy, while what is really used in decision tree looks like conditional entropy. I think it is important distinction or am missing something?
                  $endgroup$
                  – user1700890
                  Aug 12 '17 at 13:01










                • $begingroup$
                  @user1700890 The ID3 algorithm uses Info. gain entropy. I need to read up on conditional entropy. Probably an improvement over ID3 :)
                  $endgroup$
                  – Dawny33
                  Aug 12 '17 at 13:55






                • 1




                  $begingroup$
                  I think your definition of the gini impurtiy might be wrong: en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
                  $endgroup$
                  – Martin Thoma
                  Oct 19 '17 at 11:30
















                39












                $begingroup$

                Gini impurity and Information Gain Entropy are pretty much the same. And people do use the values interchangeably. Below are the formulae of both:




                1. $textit{Gini}: mathit{Gini}(E) = 1 - sum_{j=1}^{c}p_j^2$

                2. $textit{Entropy}: H(E) = -sum_{j=1}^{c}p_jlog p_j$


                Given a choice, I would use the Gini impurity, as it doesn't require me to compute logarithmic functions, which are computationally intensive. The closed form of it's solution can also be found.




                Which metric is better to use in different scenarios while using
                decision trees ?




                The Gini impurity, for reasons stated above.



                So, they are pretty much same when it comes to CART analytics.



                Helpful reference for computational comparison of the two methods






                share|improve this answer











                $endgroup$









                • 1




                  $begingroup$
                  It is so common to see formula of entropy, while what is really used in decision tree looks like conditional entropy. I think it is important distinction or am missing something?
                  $endgroup$
                  – user1700890
                  Aug 12 '17 at 13:01










                • $begingroup$
                  @user1700890 The ID3 algorithm uses Info. gain entropy. I need to read up on conditional entropy. Probably an improvement over ID3 :)
                  $endgroup$
                  – Dawny33
                  Aug 12 '17 at 13:55






                • 1




                  $begingroup$
                  I think your definition of the gini impurtiy might be wrong: en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
                  $endgroup$
                  – Martin Thoma
                  Oct 19 '17 at 11:30














                39












                39








                39





                $begingroup$

                Gini impurity and Information Gain Entropy are pretty much the same. And people do use the values interchangeably. Below are the formulae of both:




                1. $textit{Gini}: mathit{Gini}(E) = 1 - sum_{j=1}^{c}p_j^2$

                2. $textit{Entropy}: H(E) = -sum_{j=1}^{c}p_jlog p_j$


                Given a choice, I would use the Gini impurity, as it doesn't require me to compute logarithmic functions, which are computationally intensive. The closed form of it's solution can also be found.




                Which metric is better to use in different scenarios while using
                decision trees ?




                The Gini impurity, for reasons stated above.



                So, they are pretty much same when it comes to CART analytics.



                Helpful reference for computational comparison of the two methods






                share|improve this answer











                $endgroup$



                Gini impurity and Information Gain Entropy are pretty much the same. And people do use the values interchangeably. Below are the formulae of both:




                1. $textit{Gini}: mathit{Gini}(E) = 1 - sum_{j=1}^{c}p_j^2$

                2. $textit{Entropy}: H(E) = -sum_{j=1}^{c}p_jlog p_j$


                Given a choice, I would use the Gini impurity, as it doesn't require me to compute logarithmic functions, which are computationally intensive. The closed form of it's solution can also be found.




                Which metric is better to use in different scenarios while using
                decision trees ?




                The Gini impurity, for reasons stated above.



                So, they are pretty much same when it comes to CART analytics.



                Helpful reference for computational comparison of the two methods







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Feb 17 '16 at 8:44

























                answered Feb 17 '16 at 6:13









                Dawny33Dawny33

                5,47683188




                5,47683188








                • 1




                  $begingroup$
                  It is so common to see formula of entropy, while what is really used in decision tree looks like conditional entropy. I think it is important distinction or am missing something?
                  $endgroup$
                  – user1700890
                  Aug 12 '17 at 13:01










                • $begingroup$
                  @user1700890 The ID3 algorithm uses Info. gain entropy. I need to read up on conditional entropy. Probably an improvement over ID3 :)
                  $endgroup$
                  – Dawny33
                  Aug 12 '17 at 13:55






                • 1




                  $begingroup$
                  I think your definition of the gini impurtiy might be wrong: en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
                  $endgroup$
                  – Martin Thoma
                  Oct 19 '17 at 11:30














                • 1




                  $begingroup$
                  It is so common to see formula of entropy, while what is really used in decision tree looks like conditional entropy. I think it is important distinction or am missing something?
                  $endgroup$
                  – user1700890
                  Aug 12 '17 at 13:01










                • $begingroup$
                  @user1700890 The ID3 algorithm uses Info. gain entropy. I need to read up on conditional entropy. Probably an improvement over ID3 :)
                  $endgroup$
                  – Dawny33
                  Aug 12 '17 at 13:55






                • 1




                  $begingroup$
                  I think your definition of the gini impurtiy might be wrong: en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
                  $endgroup$
                  – Martin Thoma
                  Oct 19 '17 at 11:30








                1




                1




                $begingroup$
                It is so common to see formula of entropy, while what is really used in decision tree looks like conditional entropy. I think it is important distinction or am missing something?
                $endgroup$
                – user1700890
                Aug 12 '17 at 13:01




                $begingroup$
                It is so common to see formula of entropy, while what is really used in decision tree looks like conditional entropy. I think it is important distinction or am missing something?
                $endgroup$
                – user1700890
                Aug 12 '17 at 13:01












                $begingroup$
                @user1700890 The ID3 algorithm uses Info. gain entropy. I need to read up on conditional entropy. Probably an improvement over ID3 :)
                $endgroup$
                – Dawny33
                Aug 12 '17 at 13:55




                $begingroup$
                @user1700890 The ID3 algorithm uses Info. gain entropy. I need to read up on conditional entropy. Probably an improvement over ID3 :)
                $endgroup$
                – Dawny33
                Aug 12 '17 at 13:55




                1




                1




                $begingroup$
                I think your definition of the gini impurtiy might be wrong: en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
                $endgroup$
                – Martin Thoma
                Oct 19 '17 at 11:30




                $begingroup$
                I think your definition of the gini impurtiy might be wrong: en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
                $endgroup$
                – Martin Thoma
                Oct 19 '17 at 11:30











                14












                $begingroup$

                Generally, your performance will not change whether you use Gini impurity or Entropy.



                Laura Elena Raileanu and Kilian Stoffel compared both in "Theoretical comparison between the gini index and information gain criteria". The most important remarks were:




                • It only matters in 2% of the cases whether you use gini impurity or entropy.

                • Entropy might be a little slower to compute (because it makes use of the logarithm).


                I was once told that both metrics exist because they emerged in different disciplines of science.






                share|improve this answer











                $endgroup$


















                  14












                  $begingroup$

                  Generally, your performance will not change whether you use Gini impurity or Entropy.



                  Laura Elena Raileanu and Kilian Stoffel compared both in "Theoretical comparison between the gini index and information gain criteria". The most important remarks were:




                  • It only matters in 2% of the cases whether you use gini impurity or entropy.

                  • Entropy might be a little slower to compute (because it makes use of the logarithm).


                  I was once told that both metrics exist because they emerged in different disciplines of science.






                  share|improve this answer











                  $endgroup$
















                    14












                    14








                    14





                    $begingroup$

                    Generally, your performance will not change whether you use Gini impurity or Entropy.



                    Laura Elena Raileanu and Kilian Stoffel compared both in "Theoretical comparison between the gini index and information gain criteria". The most important remarks were:




                    • It only matters in 2% of the cases whether you use gini impurity or entropy.

                    • Entropy might be a little slower to compute (because it makes use of the logarithm).


                    I was once told that both metrics exist because they emerged in different disciplines of science.






                    share|improve this answer











                    $endgroup$



                    Generally, your performance will not change whether you use Gini impurity or Entropy.



                    Laura Elena Raileanu and Kilian Stoffel compared both in "Theoretical comparison between the gini index and information gain criteria". The most important remarks were:




                    • It only matters in 2% of the cases whether you use gini impurity or entropy.

                    • Entropy might be a little slower to compute (because it makes use of the logarithm).


                    I was once told that both metrics exist because they emerged in different disciplines of science.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Sep 27 '17 at 2:35









                    MarkusK96

                    32




                    32










                    answered Feb 15 '16 at 12:17









                    ArchieArchie

                    498318




                    498318























                        14












                        $begingroup$


                        Gini is intended for continuous attributes and Entropy is for attributes that occur in classes




                        Gini is to minimize misclassification
                        Entropy is for exploratory analysis




                        Entropy is a little slower to compute







                        share|improve this answer











                        $endgroup$


















                          14












                          $begingroup$


                          Gini is intended for continuous attributes and Entropy is for attributes that occur in classes




                          Gini is to minimize misclassification
                          Entropy is for exploratory analysis




                          Entropy is a little slower to compute







                          share|improve this answer











                          $endgroup$
















                            14












                            14








                            14





                            $begingroup$


                            Gini is intended for continuous attributes and Entropy is for attributes that occur in classes




                            Gini is to minimize misclassification
                            Entropy is for exploratory analysis




                            Entropy is a little slower to compute







                            share|improve this answer











                            $endgroup$




                            Gini is intended for continuous attributes and Entropy is for attributes that occur in classes




                            Gini is to minimize misclassification
                            Entropy is for exploratory analysis




                            Entropy is a little slower to compute








                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Jun 29 '18 at 21:44









                            Joe S

                            1055




                            1055










                            answered Dec 14 '16 at 5:35









                            NIMISHANNIMISHAN

                            24327




                            24327























                                12












                                $begingroup$

                                For the case of a variable with two values, appearing with fractions f and (1-f),

                                the gini and entropy are given by:

                                gini = 2*f(1-f)

                                entropy = f*ln(1/f) + (1-f)*ln(1/(1-f))

                                These measures are very similar if scaled to 1.0
                                (plotting 2*gini and entropy/ln(2) ):



                                Gini (y4,purple) and Entropy (y3,green) values scaled for comparison






                                share|improve this answer









                                $endgroup$


















                                  12












                                  $begingroup$

                                  For the case of a variable with two values, appearing with fractions f and (1-f),

                                  the gini and entropy are given by:

                                  gini = 2*f(1-f)

                                  entropy = f*ln(1/f) + (1-f)*ln(1/(1-f))

                                  These measures are very similar if scaled to 1.0
                                  (plotting 2*gini and entropy/ln(2) ):



                                  Gini (y4,purple) and Entropy (y3,green) values scaled for comparison






                                  share|improve this answer









                                  $endgroup$
















                                    12












                                    12








                                    12





                                    $begingroup$

                                    For the case of a variable with two values, appearing with fractions f and (1-f),

                                    the gini and entropy are given by:

                                    gini = 2*f(1-f)

                                    entropy = f*ln(1/f) + (1-f)*ln(1/(1-f))

                                    These measures are very similar if scaled to 1.0
                                    (plotting 2*gini and entropy/ln(2) ):



                                    Gini (y4,purple) and Entropy (y3,green) values scaled for comparison






                                    share|improve this answer









                                    $endgroup$



                                    For the case of a variable with two values, appearing with fractions f and (1-f),

                                    the gini and entropy are given by:

                                    gini = 2*f(1-f)

                                    entropy = f*ln(1/f) + (1-f)*ln(1/(1-f))

                                    These measures are very similar if scaled to 1.0
                                    (plotting 2*gini and entropy/ln(2) ):



                                    Gini (y4,purple) and Entropy (y3,green) values scaled for comparison







                                    share|improve this answer












                                    share|improve this answer



                                    share|improve this answer










                                    answered Apr 20 '17 at 15:37









                                    DanLvii DeweyDanLvii Dewey

                                    12112




                                    12112























                                        5












                                        $begingroup$

                                        To add upon the fact that there are more or less the same, consider also the fact that:
                                        $$
                                        begin{split}
                                        forall ; 0 < u < 1,; log (1-u) &= -u - u^2/2 - u^3/3 , + , cdots\
                                        forall ; 0 < p < 1,; log (p) &= p-1 - (1-p)^2/2 - (1-p)^3/3 , + , cdots\
                                        end{split}
                                        $$
                                        so that:
                                        $$
                                        forall ; 0 < p < 1,; -p log (p) = p(1-p) + p(1-p)^2/2 + p(1-p)^3/3 , + , cdots
                                        $$
                                        See the following plot of the two functions normalised to get 1 as maximum value: red curve is for Gini while black one is for entropy.
                                        Normalised Gini and Entropy criteria



                                        In the end as explained by @NIMISHAN Gini is more suitable to minimise misclassfication as it is symetric to 0.5, while entropy will more penalised small probabilities.






                                        share|improve this answer









                                        $endgroup$


















                                          5












                                          $begingroup$

                                          To add upon the fact that there are more or less the same, consider also the fact that:
                                          $$
                                          begin{split}
                                          forall ; 0 < u < 1,; log (1-u) &= -u - u^2/2 - u^3/3 , + , cdots\
                                          forall ; 0 < p < 1,; log (p) &= p-1 - (1-p)^2/2 - (1-p)^3/3 , + , cdots\
                                          end{split}
                                          $$
                                          so that:
                                          $$
                                          forall ; 0 < p < 1,; -p log (p) = p(1-p) + p(1-p)^2/2 + p(1-p)^3/3 , + , cdots
                                          $$
                                          See the following plot of the two functions normalised to get 1 as maximum value: red curve is for Gini while black one is for entropy.
                                          Normalised Gini and Entropy criteria



                                          In the end as explained by @NIMISHAN Gini is more suitable to minimise misclassfication as it is symetric to 0.5, while entropy will more penalised small probabilities.






                                          share|improve this answer









                                          $endgroup$
















                                            5












                                            5








                                            5





                                            $begingroup$

                                            To add upon the fact that there are more or less the same, consider also the fact that:
                                            $$
                                            begin{split}
                                            forall ; 0 < u < 1,; log (1-u) &= -u - u^2/2 - u^3/3 , + , cdots\
                                            forall ; 0 < p < 1,; log (p) &= p-1 - (1-p)^2/2 - (1-p)^3/3 , + , cdots\
                                            end{split}
                                            $$
                                            so that:
                                            $$
                                            forall ; 0 < p < 1,; -p log (p) = p(1-p) + p(1-p)^2/2 + p(1-p)^3/3 , + , cdots
                                            $$
                                            See the following plot of the two functions normalised to get 1 as maximum value: red curve is for Gini while black one is for entropy.
                                            Normalised Gini and Entropy criteria



                                            In the end as explained by @NIMISHAN Gini is more suitable to minimise misclassfication as it is symetric to 0.5, while entropy will more penalised small probabilities.






                                            share|improve this answer









                                            $endgroup$



                                            To add upon the fact that there are more or less the same, consider also the fact that:
                                            $$
                                            begin{split}
                                            forall ; 0 < u < 1,; log (1-u) &= -u - u^2/2 - u^3/3 , + , cdots\
                                            forall ; 0 < p < 1,; log (p) &= p-1 - (1-p)^2/2 - (1-p)^3/3 , + , cdots\
                                            end{split}
                                            $$
                                            so that:
                                            $$
                                            forall ; 0 < p < 1,; -p log (p) = p(1-p) + p(1-p)^2/2 + p(1-p)^3/3 , + , cdots
                                            $$
                                            See the following plot of the two functions normalised to get 1 as maximum value: red curve is for Gini while black one is for entropy.
                                            Normalised Gini and Entropy criteria



                                            In the end as explained by @NIMISHAN Gini is more suitable to minimise misclassfication as it is symetric to 0.5, while entropy will more penalised small probabilities.







                                            share|improve this answer












                                            share|improve this answer



                                            share|improve this answer










                                            answered Mar 29 '17 at 10:39









                                            clemlaflemmeclemlaflemme

                                            15111




                                            15111























                                                2












                                                $begingroup$

                                                Entropy takes slightly more computation time than Gini Index because of the log calculation, maybe that's why Gini Index has become the default option for many ML algorithms.
                                                But, from Tan et. al book Introduction to Data Mining



                                                "Impurity measure are quite consistent with each other...
                                                Indeed, the strategy used to prune the tree has a greater impact on the final tree than the choice of impurity measure."



                                                So, it looks like the selection of impurity measure has little effect on the performance of single decision tree algorithms.



                                                Also.
                                                "Gini method works only when the target variable is a binary variable." - Learning Predictive Analytics with Python.






                                                share|improve this answer









                                                $endgroup$


















                                                  2












                                                  $begingroup$

                                                  Entropy takes slightly more computation time than Gini Index because of the log calculation, maybe that's why Gini Index has become the default option for many ML algorithms.
                                                  But, from Tan et. al book Introduction to Data Mining



                                                  "Impurity measure are quite consistent with each other...
                                                  Indeed, the strategy used to prune the tree has a greater impact on the final tree than the choice of impurity measure."



                                                  So, it looks like the selection of impurity measure has little effect on the performance of single decision tree algorithms.



                                                  Also.
                                                  "Gini method works only when the target variable is a binary variable." - Learning Predictive Analytics with Python.






                                                  share|improve this answer









                                                  $endgroup$
















                                                    2












                                                    2








                                                    2





                                                    $begingroup$

                                                    Entropy takes slightly more computation time than Gini Index because of the log calculation, maybe that's why Gini Index has become the default option for many ML algorithms.
                                                    But, from Tan et. al book Introduction to Data Mining



                                                    "Impurity measure are quite consistent with each other...
                                                    Indeed, the strategy used to prune the tree has a greater impact on the final tree than the choice of impurity measure."



                                                    So, it looks like the selection of impurity measure has little effect on the performance of single decision tree algorithms.



                                                    Also.
                                                    "Gini method works only when the target variable is a binary variable." - Learning Predictive Analytics with Python.






                                                    share|improve this answer









                                                    $endgroup$



                                                    Entropy takes slightly more computation time than Gini Index because of the log calculation, maybe that's why Gini Index has become the default option for many ML algorithms.
                                                    But, from Tan et. al book Introduction to Data Mining



                                                    "Impurity measure are quite consistent with each other...
                                                    Indeed, the strategy used to prune the tree has a greater impact on the final tree than the choice of impurity measure."



                                                    So, it looks like the selection of impurity measure has little effect on the performance of single decision tree algorithms.



                                                    Also.
                                                    "Gini method works only when the target variable is a binary variable." - Learning Predictive Analytics with Python.







                                                    share|improve this answer












                                                    share|improve this answer



                                                    share|improve this answer










                                                    answered Jun 8 '18 at 4:00









                                                    Rakend DubbaRakend Dubba

                                                    214




                                                    214























                                                        1












                                                        $begingroup$

                                                        I've been doing optimizations on binary classification for the past week+, and in every case, entropy significantly outperforms gini. This may be data set specific, but it would seem like trying both while tuning hyperparameters is a rational choice, rather than making assumptions about the model ahead of time.



                                                        You never know how data will react until you've run the statistics.






                                                        share|improve this answer











                                                        $endgroup$


















                                                          1












                                                          $begingroup$

                                                          I've been doing optimizations on binary classification for the past week+, and in every case, entropy significantly outperforms gini. This may be data set specific, but it would seem like trying both while tuning hyperparameters is a rational choice, rather than making assumptions about the model ahead of time.



                                                          You never know how data will react until you've run the statistics.






                                                          share|improve this answer











                                                          $endgroup$
















                                                            1












                                                            1








                                                            1





                                                            $begingroup$

                                                            I've been doing optimizations on binary classification for the past week+, and in every case, entropy significantly outperforms gini. This may be data set specific, but it would seem like trying both while tuning hyperparameters is a rational choice, rather than making assumptions about the model ahead of time.



                                                            You never know how data will react until you've run the statistics.






                                                            share|improve this answer











                                                            $endgroup$



                                                            I've been doing optimizations on binary classification for the past week+, and in every case, entropy significantly outperforms gini. This may be data set specific, but it would seem like trying both while tuning hyperparameters is a rational choice, rather than making assumptions about the model ahead of time.



                                                            You never know how data will react until you've run the statistics.







                                                            share|improve this answer














                                                            share|improve this answer



                                                            share|improve this answer








                                                            edited Jun 29 '18 at 17:55









                                                            Stephen Rauch

                                                            1,52551229




                                                            1,52551229










                                                            answered Jun 29 '18 at 17:20









                                                            H FroedgeH Froedge

                                                            304




                                                            304























                                                                0












                                                                $begingroup$

                                                                As per parsimony principal Gini outperform entropy as of computation ease (log is obvious has more computations involved rather that plain multiplication at processor/Machine level).



                                                                But entropy definitely has an edge in some data cases involving high imbalance.



                                                                Since entropy uses log of probabilities and multiplying with probabilities of event, what is happening at background is value of lower probabilities are getting scaled up.



                                                                If your data probability distribution is exponential or Laplace (like in case of deep learning where we need probability distribution at sharp point) entropy outperform Gini.



                                                                To give an example if you have 2 events one .01 probability and other .99 probability.



                                                                In Gini Prob sq will be .01^2+.99^2, .0001 + .9801 means lower probability do not play any role as everything is govern by majority probability.



                                                                Now in case of entropy .01*log(.01)+.99*log(.99)= .01*(-2)+ .99*(-.00436)
                                                                = -.02-.00432
                                                                now in this case clearly seen lower probabilities are given better weight-age.






                                                                share|improve this answer










                                                                New contributor




                                                                Gaurav Dogra is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                                Check out our Code of Conduct.






                                                                $endgroup$


















                                                                  0












                                                                  $begingroup$

                                                                  As per parsimony principal Gini outperform entropy as of computation ease (log is obvious has more computations involved rather that plain multiplication at processor/Machine level).



                                                                  But entropy definitely has an edge in some data cases involving high imbalance.



                                                                  Since entropy uses log of probabilities and multiplying with probabilities of event, what is happening at background is value of lower probabilities are getting scaled up.



                                                                  If your data probability distribution is exponential or Laplace (like in case of deep learning where we need probability distribution at sharp point) entropy outperform Gini.



                                                                  To give an example if you have 2 events one .01 probability and other .99 probability.



                                                                  In Gini Prob sq will be .01^2+.99^2, .0001 + .9801 means lower probability do not play any role as everything is govern by majority probability.



                                                                  Now in case of entropy .01*log(.01)+.99*log(.99)= .01*(-2)+ .99*(-.00436)
                                                                  = -.02-.00432
                                                                  now in this case clearly seen lower probabilities are given better weight-age.






                                                                  share|improve this answer










                                                                  New contributor




                                                                  Gaurav Dogra is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                                  Check out our Code of Conduct.






                                                                  $endgroup$
















                                                                    0












                                                                    0








                                                                    0





                                                                    $begingroup$

                                                                    As per parsimony principal Gini outperform entropy as of computation ease (log is obvious has more computations involved rather that plain multiplication at processor/Machine level).



                                                                    But entropy definitely has an edge in some data cases involving high imbalance.



                                                                    Since entropy uses log of probabilities and multiplying with probabilities of event, what is happening at background is value of lower probabilities are getting scaled up.



                                                                    If your data probability distribution is exponential or Laplace (like in case of deep learning where we need probability distribution at sharp point) entropy outperform Gini.



                                                                    To give an example if you have 2 events one .01 probability and other .99 probability.



                                                                    In Gini Prob sq will be .01^2+.99^2, .0001 + .9801 means lower probability do not play any role as everything is govern by majority probability.



                                                                    Now in case of entropy .01*log(.01)+.99*log(.99)= .01*(-2)+ .99*(-.00436)
                                                                    = -.02-.00432
                                                                    now in this case clearly seen lower probabilities are given better weight-age.






                                                                    share|improve this answer










                                                                    New contributor




                                                                    Gaurav Dogra is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                                    Check out our Code of Conduct.






                                                                    $endgroup$



                                                                    As per parsimony principal Gini outperform entropy as of computation ease (log is obvious has more computations involved rather that plain multiplication at processor/Machine level).



                                                                    But entropy definitely has an edge in some data cases involving high imbalance.



                                                                    Since entropy uses log of probabilities and multiplying with probabilities of event, what is happening at background is value of lower probabilities are getting scaled up.



                                                                    If your data probability distribution is exponential or Laplace (like in case of deep learning where we need probability distribution at sharp point) entropy outperform Gini.



                                                                    To give an example if you have 2 events one .01 probability and other .99 probability.



                                                                    In Gini Prob sq will be .01^2+.99^2, .0001 + .9801 means lower probability do not play any role as everything is govern by majority probability.



                                                                    Now in case of entropy .01*log(.01)+.99*log(.99)= .01*(-2)+ .99*(-.00436)
                                                                    = -.02-.00432
                                                                    now in this case clearly seen lower probabilities are given better weight-age.







                                                                    share|improve this answer










                                                                    New contributor




                                                                    Gaurav Dogra is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                                    Check out our Code of Conduct.









                                                                    share|improve this answer



                                                                    share|improve this answer








                                                                    edited yesterday





















                                                                    New contributor




                                                                    Gaurav Dogra is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                                    Check out our Code of Conduct.









                                                                    answered 2 days ago









                                                                    Gaurav DograGaurav Dogra

                                                                    312




                                                                    312




                                                                    New contributor




                                                                    Gaurav Dogra is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                                    Check out our Code of Conduct.





                                                                    New contributor





                                                                    Gaurav Dogra is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                                    Check out our Code of Conduct.






                                                                    Gaurav Dogra is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                                    Check out our Code of Conduct.






























                                                                        draft saved

                                                                        draft discarded




















































                                                                        Thanks for contributing an answer to Data Science Stack Exchange!


                                                                        • Please be sure to answer the question. Provide details and share your research!

                                                                        But avoid



                                                                        • Asking for help, clarification, or responding to other answers.

                                                                        • Making statements based on opinion; back them up with references or personal experience.


                                                                        Use MathJax to format equations. MathJax reference.


                                                                        To learn more, see our tips on writing great answers.




                                                                        draft saved


                                                                        draft discarded














                                                                        StackExchange.ready(
                                                                        function () {
                                                                        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f10228%2fgini-impurity-vs-entropy%23new-answer', 'question_page');
                                                                        }
                                                                        );

                                                                        Post as a guest















                                                                        Required, but never shown





















































                                                                        Required, but never shown














                                                                        Required, but never shown












                                                                        Required, but never shown







                                                                        Required, but never shown

































                                                                        Required, but never shown














                                                                        Required, but never shown












                                                                        Required, but never shown







                                                                        Required, but never shown







                                                                        Popular posts from this blog

                                                                        Callistus I

                                                                        Tabula Rosettana

                                                                        How to label and detect the document text images