Skewed two class data set

Is there any theory on the influence of skew in the data set on the performance of binary classifiers? At work we are doing abuse detection: the negative population is regular logins and the positive population is attack logins (account takeover, or ATO).



However, the frequency of ATO logins is 1/50,000 or less. So, we have a very skewed natural data set. Should I "unskew" my training data set by downsampling the legit logins? How much can I do that and still keep a model that will work well on the actual data? Any theory behind that?










Tags: classification, dataset, anomaly-detection, unbalanced-classes






asked Oct 24 '18 at 16:09 by Frank
edited Oct 25 '18 at 1:10 by Stephen Rauch








  • The Lyrist (Oct 24 '18 at 16:23): It is a rather common problem in anomaly detection, and downsampling the legit logins is a reasonable first try. As long as the data is representative it could still work well (as long as the pattern is very different from that of ATOs, etc.; sorry, not a domain expert).
  • Frank (Oct 24 '18 at 21:03): Sure - but I want to understand why "it is reasonable", whether there is a solid theory behind it, and/or what happens at various levels of downsampling.














  • 1




    $begingroup$
    It is a rather common problem for anomaly detection problems, and downsampling the legit login is a reasonable first try. As long as the data is representative it could still work well (as long as the pattern is very different from that of ATOs, etc. sorry not a domain expert)
    $endgroup$
    – The Lyrist
    Oct 24 '18 at 16:23










  • $begingroup$
    Sure - but I want to understand why "it is reasonable", if there is a solid theory behind it and/or what happens at various levels of downsampling.
    $endgroup$
    – Frank
    Oct 24 '18 at 21:03








2 Answers


















Answer 1 (score 1)












It is typically called a class imbalance issue, where one label occurs so infrequently that it makes predictions unreliable.



For instance, if I know that it rains in Vancouver, Canada 85% of the time in winter, I would simply predict rain whenever it is winter and the location is Vancouver. You don't want your algorithm to favour one label over another just because that label predominates.



One common strategy is resampling. If you have enough data, downsampling can make more sense, since oversampling requires creating synthetic data (e.g., SMOTE); either way, the goal is for the algorithm to properly learn the difference between the two classes and not favour one over the other. A 50-50 split between the two classes is probably a good starting point, but it also depends on what is available. You still want your negative labels to be representative, and even an 80-20 split would already be a vast improvement.
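
For what it's worth, here is a minimal sketch of what random downsampling of the majority class could look like in Python; the array names, the toy 1/1,000 positive rate, and the 1:1 target ratio are illustrative assumptions, not something from the answer itself:

    import numpy as np

    rng = np.random.RandomState(0)

    # Toy data: ~1 positive (ATO) per 1,000 logins; the features are placeholders.
    n = 200_000
    X = rng.normal(size=(n, 5))
    y = (rng.rand(n) < 1 / 1_000).astype(int)

    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)

    # Keep every positive and sample an equal number of negatives (a 50-50 split).
    neg_down = rng.choice(neg_idx, size=len(pos_idx), replace=False)
    keep = np.concatenate([pos_idx, neg_down])
    rng.shuffle(keep)

    X_train, y_train = X[keep], y[keep]
    print(np.bincount(y_train))  # roughly equal class counts

Only the training split should be resampled this way; the validation/test data should keep the natural skew so the evaluation still reflects the traffic the model will actually see.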



Another common solution is to increase the penalty for incorrect predictions. One way to think about it: a false positive means an admin spends time investigating a false alarm; a false negative means rogue activity goes undetected. Depending on the business, one cost can be far more severe than the other, so you could, for example, make a false negative 1000x more costly to the business than a false positive.
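
Most libraries expose this as class or sample weights rather than a literal cost matrix; as a sketch, scikit-learn's class_weight parameter could be used like this (the 1000x figure is just the illustrative number from the paragraph above, not a recommendation):

    from sklearn.linear_model import LogisticRegression

    # Misclassifying an ATO (class 1) counts ~1000x more than a false alarm.
    clf = LogisticRegression(class_weight={0: 1, 1: 1000}, max_iter=1000)

    # class_weight="balanced" instead reweights by inverse class frequency,
    # and the model can then be fit on the original, skewed training data:
    # clf.fit(X, y)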



Not knowing your data, those are probably the first two things to try (separately or together). Most ML packages handle either strategy easily, which is why I think they are reasonable things to try.



There are many thorough articles and tutorials with additional strategies; searching for the keywords anomaly detection and class imbalance gives pretty good results.






answered Oct 24 '18 at 21:39 by The Lyrist (edited Oct 24 '18 at 22:51)













  • Frank (Oct 24 '18 at 22:07): Note that it is not just the actual data you would need, but also the business context, as you point out in your third paragraph: depending on the business context, false positives/negatives can be more or less costly. The lesson here seems to be that if some event is rare, we can't really include that fact in the learning, as the model will just be swamped by the abundance of the other class. I was somehow hoping to include "attacks are rare" in the learning.
  • Frank (Oct 24 '18 at 22:11): Also, your answer is very interesting, but it doesn't quite get at what I was after: why is it OK to change the data set statistics and still expect good performance on the actual problem data set? Does it depend on the type of model? For example, are GBDTs tolerant to class imbalance, whereas logistic regression would perform poorly on real data if you intentionally weighted the training data set? I'll check out "class imbalance".
  • The Lyrist (Oct 24 '18 at 22:17): @Frank What matters is whether the training data is representative of the actual data. Downsampling in this sense essentially means that, instead of giving the model 10,000,000 negative samples vs 200 positive samples, you give it 200 of each. If your 200 are representative enough of your actual data, and your model generalizes well, it is OK not to include all the available data in your training set.



















Answer 2 (score 0)












In your case you should use precision and recall as your error metrics to get more insight into the errors: on a skewed data set, looking only at the usual accuracy metric is misleading.
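
As a rough illustration (hypothetical numbers: with 1 ATO per 50,000 logins, always predicting "legit" is already about 99.998% accurate while catching nothing):

    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # Toy skewed ground truth: 9,995 legit logins and 5 ATOs.
    y_true = [0] * 9995 + [1] * 5
    # A classifier that raises 5 alerts but catches only 2 of the 5 ATOs.
    y_pred = [0] * 9992 + [1] * 3 + [1, 1, 0, 0, 0]

    print(accuracy_score(y_true, y_pred))   # ~0.9994, looks great
    print(precision_score(y_true, y_pred))  # 0.4 -> 3 of 5 alerts are false alarms
    print(recall_score(y_true, y_pred))     # 0.4 -> 3 of 5 ATOs slip through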



You can refer to this link for the details; I personally found it helpful.



Happy to answer!






answered 14 hours ago by Ankit Agrawal (edited 12 hours ago)












