How to choose metrics for evaluating classification results?
We recently developed a Python library named PyCM for analyzing multi-class confusion matrices. Version 1.9 of this module adds a parameter recommender system that suggests the most relevant evaluation parameters based on the characteristics of the input dataset and its classification problem.
This new feature raises many questions. First, I explain the assumptions behind it and describe how it works; then I ask some questions to help evaluate this recommender system.



Considered characteristics:



The parameters are suggested according to the following characteristics:




  1. Classification problem type (binary or multi-class)

  2. Dataset type (balanced or imbalanced)


Note that when the dataset is imbalanced, only the imbalance is considered for the recommendation, regardless of whether the problem is binary or multi-class. Therefore, the inspected states fall into three main groups (a sketch of this grouping logic follows the list):




  1. Balanced dataset – Binary classification

  2. Balanced dataset – Multi-class classification

  3. Imbalanced dataset
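As a minimal sketch (the helper below is hypothetical, not PyCM's internal code), this grouping logic reduces to:

```python
def recommendation_group(is_binary: bool, is_imbalanced: bool) -> str:
    """Map the two dataset characteristics to one of the three groups.

    Imbalance overrides the binary/multi-class distinction,
    exactly as described above.
    """
    if is_imbalanced:
        return "Imbalanced dataset"
    return ("Balanced dataset - Binary classification" if is_binary
            else "Balanced dataset - Multi-class classification")
```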


The definition of imbalance:



Recognizing whether a classification problem is binary or multi-class is trivial, but the boundary between a balanced and an imbalanced dataset is not clear-cut. PyCM therefore introduces a definition for checking whether the input dataset is balanced: if the ratio of the population of the most populous class to the population of the least populous class is greater than 3, the dataset is considered imbalanced.
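A minimal sketch of this check, assuming only the rule stated above (the function is illustrative, not PyCM's API):

```python
from collections import Counter

def is_imbalanced(labels, threshold=3):
    """Return True if the most populous class is more than `threshold`
    times the size of the least populous class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values()) > threshold

# 80 samples of class "a" vs. 20 of class "b": ratio 4 > 3, so imbalanced.
print(is_imbalanced(["a"] * 80 + ["b"] * 20))  # True
```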



Recommended parameters:



The recommendation lists were compiled from the paper introducing each parameter and the capabilities claimed there. For further information, read the PyCM documentation or visit the project page.




  • Binary – Balanced recommended parameters: ACC, TPR, PPV, AUC, AUCI, TNR, F1

  • Multi-class – Balanced recommended parameters: ERR, TPR Micro, TPR Macro, PPV Micro, PPV Macro, ACC, Overall ACC, MCC, Overall MCC, BCD, Hamming Loss, Zero-one Loss

  • Imbalanced recommended parameters: Kappa, SOA1 (Landis & Koch), SOA2 (Fleiss), SOA3 (Altman), SOA4 (Cicchetti), CEN, MCEN, MCC, J, Overall J, Overall MCC, Overall CEN, Overall MCEN, AUC, AUCI, G, DP, DPI, GI
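To make the recommender concrete, here is a minimal end-to-end sketch. ConfusionMatrix is PyCM's documented entry point; recommended_list is assumed to be the attribute exposing the recommendations in version 1.9+ (an assumption; check the PyCM docs if the name differs):

```python
from pycm import ConfusionMatrix

# An imbalanced multi-class example: class 0 dominates (50/5 = 10 > 3).
actual  = [0] * 50 + [1] * 10 + [2] * 5
predict = [0] * 45 + [1] * 5 + [0] * 5 + [1] * 5 + [2] * 5

cm = ConfusionMatrix(actual_vector=actual, predict_vector=predict)

# Assumed attribute: the parameters suggested for this dataset
# (here, the "Imbalanced" list above should apply).
print(cm.recommended_list)
```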



Questions:
1. Is the proposed definition of imbalance correct? Is there a more comprehensive definition of this characteristic?
2. Is it correct to recommend the same parameters for both binary and multi-class classification problems on an imbalanced dataset?
3. Are the recommendation lists correct and complete? Are there other parameters worth recommending?
4. Are there other characteristics (like binary/multi-class and balanced/imbalanced) that can affect the evaluation of a classification method's results?



Website: http://www.pycm.ir/



GitHub: https://github.com/sepandhaghighi/pycm



Paper: https://www.theoj.org/joss-papers/joss.00729/10.21105.joss.00729.pdf










Tags: machine-learning, python, classification, multiclass-classification, confusion-matrix