Cross Entropy vs Entropy (Decision Tree)












Several papers and books I have read, e.g. The Elements of Statistical Learning (Hastie, Tibshirani, Friedman), say that cross-entropy is used when looking for the best split in a classification tree, without even mentioning entropy in the context of classification trees.

Yet other sources mention entropy, not cross-entropy, as the measure for finding the best splits. Are both measures usable? Is only cross-entropy used? As far as I understand, the two concepts differ significantly from each other.










machine-learning classification decision-trees

asked 2 days ago by shenflow
          1 Answer
"Are both measures usable? Is only cross-entropy used?"

They are both used, but for different reasons, and only one of them is used in decision trees if we agree on the definitions.



The most agreed-upon and consistent use of entropy and cross-entropy is that entropy is a function of one distribution, i.e. $-\sum_x P(x)\log P(x)$, and cross-entropy is a function of two distributions, i.e. $-\sum_x P(x)\log Q(x)$ (with an integral instead of a sum for continuous $x$).
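For concreteness, here is a minimal NumPy sketch of the two definitions (the distributions p and q below are made-up examples, not taken from any source): cross-entropy takes two distributions and reduces to entropy when they coincide.

    import numpy as np

    def entropy(p):
        # H(P) = -sum_x P(x) log P(x); terms with P(x) = 0 contribute 0
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def cross_entropy(p, q):
        # H(P, Q) = -sum_x P(x) log Q(x)
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        mask = p > 0
        return -np.sum(p[mask] * np.log(q[mask]))

    p = np.array([0.7, 0.2, 0.1])  # "true" distribution (illustrative)
    q = np.array([0.5, 0.3, 0.2])  # "estimated" distribution (illustrative)

    print(entropy(p))           # H(P)
    print(cross_entropy(p, q))  # H(P, Q), always >= H(P)
    print(cross_entropy(p, p))  # equals H(P)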



Based on these definitions, the "cross-entropy" used in The Elements of Statistical Learning [page 308, section 9.2.3, Classification Trees] should really be called entropy, since it is a function of only one distribution, $P_{m}(k)$, the proportion of class $k$ in node $m$. In my opinion, the naming is probably due to historical reasons (the book also uses the term deviance, I think to acknowledge that historical background). We can confidently use "entropy" for decision trees. For example, a split is made when the entropy of the class distribution in the parent node is higher than the weighted average of the entropies in the left and right children (i.e. the information gain is positive).
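As an illustration of that split rule, here is a small sketch (the toy label arrays and the helper names are mine, not from any particular library) that compares the parent node's entropy with the weighted average of the children's entropies:

    import numpy as np

    def class_entropy(labels):
        # entropy of the empirical class distribution P_m(k) in a node
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))

    def information_gain(parent, left, right):
        # parent entropy minus the weighted average of the child entropies
        n = len(parent)
        children = (len(left) / n) * class_entropy(left) + (len(right) / n) * class_entropy(right)
        return class_entropy(parent) - children

    parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # toy class labels in the parent node
    left   = np.array([0, 0, 0, 0])              # candidate left child
    right  = np.array([1, 1, 1, 1])              # candidate right child

    print(information_gain(parent, left, right))  # positive, so this split is accepted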



In addition, cross-entropy is mostly used as a loss function that brings one distribution (e.g. a model's estimate) closer to another (e.g. the true distribution). A well-known example is the cross-entropy loss in classification (see my answer). KL-divergence, which is cross-entropy minus entropy, is used for essentially the same purpose.
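For completeness, a minimal sketch of cross-entropy used as a classification loss (the one-hot labels and predicted probabilities below are made-up numbers): the loss is the cross-entropy between the true label distribution and the model's predicted distribution, averaged over examples.

    import numpy as np

    def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
        # mean of -sum_k y_true[i, k] * log(y_pred[i, k]) over examples i
        y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
        return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

    y_true = np.array([[1, 0, 0],
                       [0, 1, 0]])           # true classes, one-hot encoded
    y_pred = np.array([[0.8, 0.1, 0.1],
                       [0.2, 0.7, 0.1]])     # model's predicted probabilities

    print(categorical_cross_entropy(y_true, y_pred))  # lower is better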






answered 2 days ago, edited 2 days ago, by Esmailian
Would you mind providing a citation for cross-entropy? – Media, 2 days ago








Thank you @Esmailian. That was what I was thinking as well. It is kind of confusing when definitions overlap and different sources state different things. In which context is cross-entropy used, though? – shenflow, 2 days ago










