How to compare the performance of two unsupervised algorithms on the same data-set?


























I want to solve an anomaly detection problem on an unlabeled data-set. The only information I have about this problem is that the proportion of anomalies is below 0.1%. Note that each sample has a feature vector of size 40. Is there any clear way to compare the performance of unsupervised algorithms?



























      Tags: unsupervised-learning, anomaly-detection, unbalanced-classes, evaluation






      asked 19 hours ago by Alireza Zolanvari (edited 18 hours ago)






















          1 Answer













          For unlabeled data-sets, unsupervised anomaly detectors can be compared either subjectively or objectively.





          1. Subjective comparison: based on domain knowledge, and with the help of visualizations and summary statistics, we can compare two detectors and subjectively select the one that surfaces the more plausible anomalies.




            1. A well-cited survey of unsupervised anomaly detectors compares the algorithms on labeled data-sets (with known, domain-specific outliers) using AUC, and concludes that local detectors (such as LOF,
              COF, INFLO, and LoOP) are not good candidates for global anomaly detection:
              Goldstein & Uchida (2016), A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data



          2. Objective comparison: possible in theory, impossible in practice.



          Requirements for objective comparison:




          1. Anomaly definition: $x$ is an anomaly if $P(x)< t$ for some threshold $t$,


          2. Anomaly detector requirement: $D$ is an anomaly detector if for every detected $x$, $P(x)< t$,


          3. Comparing anomalies: $x_1$ is more anomalous than $x_2$ if $P(x_1)<P(x_2)$ or equivalently $r(x_1, x_2) = P(x_1) / P(x_2) < 1$,


          4. Comparing anomaly detectors: a proposal $x_1$ from detector $D_1$ is better than a proposal $x_2$ from $D_2$ if $r(x_1, x_2) < 1$ (a toy numeric sketch of such a judge follows this list).
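
          To make requirements 1–4 concrete, here is a toy numeric sketch (my own illustration, not part of the original answer) in which the true density $P$ is assumed to be a known 40-dimensional Gaussian and plays the role of the judge $J$; the density, the threshold, and the candidate points are all placeholder assumptions.

          # Toy "judge": an assumed known density P acting as J (hypothetical setup).
          import numpy as np
          from scipy.stats import multivariate_normal

          dim = 40
          P = multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim))  # assumed known P(x)

          rng = np.random.default_rng(0)
          x1 = rng.normal(size=dim)          # proposal from detector D1 (near the bulk)
          x2 = 3.0 * rng.normal(size=dim)    # proposal from detector D2 (farther out)

          # Requirements 3/4: x2 is more anomalous than x1 iff r(x1, x2) = P(x1)/P(x2) > 1.
          r = P.pdf(x1) / P.pdf(x2)
          print("r(x1, x2) =", r)
          print("D1's proposal is better" if r < 1 else "D2's proposal is better")

          # Requirements 1/2: with a threshold t, J flags anomalies directly,
          # so any detector D becomes redundant once J is available.
          t = P.pdf(np.full(dim, 2.0))       # an arbitrary illustrative threshold
          print("x2 is an anomaly under J:", P.pdf(x2) < t)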



          In other words, to qualify and compare two detectors we need to know $P(x)$, or at least $r(x_1, x_2)$. But if we knew these quantities (which act as a judge $J$), or even a close enough estimate of them, we would already have a better anomaly detector, namely $J$ itself, and could discard $D_1$ and $D_2$: we would simply plug any observation $x$, or any pair $x_1$ and $x_2$, into $J$ and check which is an anomaly or which is more anomalous. So it is impossible to compare two anomaly detectors objectively unless we already have a better detector to act as the judge, which is why a subjective comparison is the practical choice; a rough sketch of such a workflow is given below.
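
          In practice, the subjective route can still be made fairly systematic. The following is a minimal sketch (my own illustration under assumed libraries and placeholder data, not the method from this answer) of comparing two off-the-shelf unsupervised detectors on the same unlabeled 40-feature data: keep each detector's top 0.1% most anomalous points, measure how much the two rankings overlap, and hand the union of flagged points to a domain expert together with a 2-D projection for visual inspection.

          # Sketch: compare two unsupervised detectors subjectively (scikit-learn assumed).
          import numpy as np
          from sklearn.ensemble import IsolationForest
          from sklearn.neighbors import LocalOutlierFactor
          from sklearn.decomposition import PCA

          rng = np.random.default_rng(0)
          X = rng.normal(size=(10_000, 40))    # placeholder for the real 40-feature data

          # Detector 1: Isolation Forest (a global detector); higher score = more anomalous.
          iso = IsolationForest(contamination=0.001, random_state=0).fit(X)
          iso_score = -iso.score_samples(X)

          # Detector 2: Local Outlier Factor (a local detector); higher score = more anomalous.
          lof = LocalOutlierFactor(n_neighbors=20, contamination=0.001)
          lof.fit_predict(X)
          lof_score = -lof.negative_outlier_factor_

          # How much do the two detectors agree on the top 0.1% of points?
          k = max(1, int(0.001 * len(X)))
          top_iso = set(np.argsort(iso_score)[-k:])
          top_lof = set(np.argsort(lof_score)[-k:])
          print(f"overlap of top-{k} anomalies: {len(top_iso & top_lof)} / {k}")

          # Project the union of flagged points to 2-D for visual, domain-driven review.
          flagged = np.array(sorted(top_iso | top_lof))
          Z = PCA(n_components=2).fit_transform(X)
          print(Z[flagged])                    # plot or inspect these rows by hand

          Note that the survey cited above found local detectors such as LOF to be weak for global anomaly detection, so which ranking to trust ultimately remains a domain judgment rather than an objective score.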






          answered 18 hours ago by Esmailian (edited 15 hours ago)













          • Please check the question update. Each sample has about 40 features and subjective comparison is not very practical. – Alireza Zolanvari, 18 hours ago










