Intuitive explanation of Noise Contrastive Estimation (NCE) loss?
























I read about NCE (a form of candidate sampling) in these two sources:

Tensorflow writeup

Original Paper

Can someone help me with the following:

  1. A simple explanation of how NCE works. I found the above difficult to parse and understand, so something intuitive that leads to the math presented there would be great.

  2. After point 1 above, an intuitive description of how this differs from Negative Sampling. I can see that there's a slight change in the formula, but I could not understand the math. I do have an intuitive understanding of negative sampling in the context of word2vec: we randomly choose some samples from the vocabulary V and update only those, because |V| is large and this offers a speedup. Please correct me if I'm wrong.

  3. When to use which one, and how is that decided? It would be great if you could include examples (possibly easy-to-understand applications).

  4. Is NCE better than Negative Sampling? Better in what manner?

Thank you.












  • Maybe my post helps: nanjiang.quora.com/Noise-contrastive-Estimation. A later experiment with Theano can be found at github.com/jiangnanHugo/language_modeling. I hope my understanding is right.
    – jiangnan hugo
    Oct 6 '16 at 12:03























deep-learning tensorflow word-embeddings sampling loss-function
















asked Aug 5 '16 at 3:36









tejaskhot






















2 Answers












$begingroup$

Taken from this post: https://stats.stackexchange.com/a/245452/154812



The issue



There are some issues with learning word vectors using a "standard" neural network. In such a network, the word vectors are learned while the network learns to predict the next word given a window of words (the input of the network).



Predicting the next word is like predicting a class. That is, such a network is just a "standard" multinomial (multi-class) classifier, and it must have as many output neurons as there are classes. When the classes are actual words, the number of neurons is, well, huge.



A "standard" neural network is usually trained with a cross-entropy cost function, which requires the values of the output neurons to represent probabilities. This means that the output "scores" computed by the network for each class have to be normalized, i.e. converted into actual probabilities for each class. This normalization step is achieved by means of the softmax function. Softmax is very costly when applied to a huge output layer, because the normalizer sums over every word in the vocabulary for every training example.
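To make that cost concrete, here is a minimal NumPy sketch of a full-softmax cross-entropy for a single training example. The sizes `vocab_size` and `dim`, the random weights, and the `target` index are illustrative assumptions, not values from the referenced papers.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 50_000, 64              # assumed, illustrative sizes

hidden = rng.normal(size=dim)             # hidden-layer output for one example
W_out = rng.normal(size=(vocab_size, dim)) * 0.01   # one row per output word
target = 123                              # hypothetical index of the true next word

def full_softmax_loss(hidden, W_out, target):
    """Cross-entropy with a full softmax: touches every row of W_out."""
    scores = W_out @ hidden               # vocab_size dot products
    scores = scores - scores.max()        # shift for numerical stability
    log_z = np.log(np.exp(scores).sum())  # normalizer sums over all words
    return -(scores[target] - log_z)

loss = full_softmax_loss(hidden, W_out, target)
print(f"loss = {loss:.4f}, output rows touched = {vocab_size}")
```

Every forward pass (and every gradient step) involves all `vocab_size` output rows, which is the part that sampling-based methods try to avoid.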



The (a) solution



To deal with this issue, that is, the expensive computation of the softmax, Word2Vec uses a technique derived from noise-contrastive estimation (NCE). A related idea, contrastive estimation, was introduced in [A]; NCE itself was formulated in [B], and variants of the idea were used in [C], [D], [E] to learn word embeddings from unlabelled natural language text.



The basic idea is to convert a multinomial classification problem (such as predicting the next word) into a binary classification problem. That is, instead of using softmax to estimate a true probability distribution over the output word, a binary logistic regression (binary classification) is used.



For each training sample, the enhanced (optimized) classifier is fed a true pair (a center word and another word that appears in its context) and $k$ randomly corrupted pairs (each consisting of the center word and a randomly chosen word from the vocabulary). By learning to distinguish the true pairs from the corrupted ones, the classifier will ultimately learn the word vectors.



This is important: instead of predicting the next word (the "standard" training technique), the optimized classifier simply predicts whether a pair of words is good or bad.
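The "good pair vs. bad pair" classifier can be sketched as a binary logistic loss over one true pair and a handful of corrupted pairs. This is a simplified, negative-sampling-style form; full NCE additionally corrects each score using the noise probability of the word, which this sketch omits. All sizes, word indices, and the uniform noise sampling are assumptions made for illustration.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)

def pair_classification_loss(center_vec, context_vec, noise_vecs):
    """Binary logistic loss: label 1 for the true pair, 0 for each corrupted pair."""
    true_score = center_vec @ context_vec      # scalar score of the true pair
    noise_scores = noise_vecs @ center_vec     # shape (k,), scores of corrupted pairs
    # -log P(true pair is real) - sum over corrupted pairs of log P(pair is fake)
    return -(log_sigmoid(true_score) + log_sigmoid(-noise_scores).sum())

rng = np.random.default_rng(0)
vocab_size, dim, k = 10_000, 50, 5             # assumed sizes
in_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))
out_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))

center, context = 42, 7                        # hypothetical word indices
noise_ids = rng.integers(0, vocab_size, size=k)   # uniform noise, for simplicity

loss = pair_classification_loss(
    in_vecs[center], out_vecs[context], out_vecs[noise_ids]
)
print(loss)   # computed from k + 1 output vectors, not vocab_size
```

Only the `k + 1` output vectors involved in the pairs receive gradients, so the per-example cost no longer depends on the vocabulary size.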



Word2Vec slightly customizes the process and calls it negative sampling. In Word2Vec, the words for the negative samples (used for the corrupted pairs) are drawn from a specially designed distribution, the unigram distribution raised to the 3/4 power, which makes less frequent words more likely to be drawn than their raw frequency would suggest.
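As a small illustration of that sampling distribution, the sketch below compares a plain unigram distribution with the same counts raised to the 0.75 power and renormalized. The toy `counts` array is made up for the example.

```python
import numpy as np

counts = np.array([1000, 100, 10, 1], dtype=float)   # toy word counts (assumed)

unigram = counts / counts.sum()        # raw relative frequencies
smoothed = counts ** 0.75              # flatten the distribution
smoothed /= smoothed.sum()             # renormalize to sum to 1

for u, s in zip(unigram, smoothed):
    print(f"unigram {u:.4f} -> noise distribution {s:.4f}")

rng = np.random.default_rng(0)
negatives = rng.choice(len(counts), size=5, p=smoothed)   # indices of 5 negative words
```

Frequent words are still drawn most often, but their share shrinks while the share of rare words grows.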



References



[A] (2005) - Contrastive estimation: Training log-linear models on unlabeled data

[B] (2010) - Noise-contrastive estimation: A new estimation principle for unnormalized statistical models

[C] (2008) - A unified architecture for natural language processing: Deep neural networks with multitask learning

[D] (2012) - A fast and simple algorithm for training neural probabilistic language models

[E] (2013) - Learning word embeddings efficiently with noise-contrastive estimation

















$endgroup$

































    Basically, this amounts to selecting a sample that consists of the true class and some other noisy class labels, and then taking the softmax over it.

    It is based on sampling words from the true distribution and from a noise distribution.

    The basic idea is to train a logistic regression classifier that can separate samples obtained from the true distribution from samples obtained from the noise distribution. Remember that when we talk about samples obtained from the true distribution, we mean only one sample: the true class for that training example.

    I have explained the NCE loss in more detail here:

    Noise Contrastive Estimation: Solution for expensive Softmax.
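    The logistic-regression view in this answer can be sketched as follows. In the usual NCE formulation, the probability that a word came from the data rather than the noise is `sigmoid(s(w) - log(k * q(w)))`, where `s(w)` is the model's unnormalized log-score and `q(w)` is the noise probability of the word. Negative sampling uses the same classifier without the `log(k * q(w))` term. The function names and toy numbers below are assumptions chosen for illustration.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)

def nce_loss(true_score, true_q, noise_scores, noise_q, k):
    """NCE loss for one example.

    Scores are unnormalized model log-scores; q values are the
    noise-distribution probabilities of the same words.
    """
    true_logit = true_score - np.log(k * true_q)
    noise_logits = noise_scores - np.log(k * noise_q)
    return -(log_sigmoid(true_logit) + log_sigmoid(-noise_logits).sum())

def negative_sampling_loss(true_score, noise_scores):
    """Same classifier without the log(k * q) correction."""
    return -(log_sigmoid(true_score) + log_sigmoid(-noise_scores).sum())

k = 4
true_score = 2.0
noise_scores = np.array([-1.0, 0.5, -0.3, 0.1])

# Toy case where k * q(w) == 1 for every word: the correction vanishes.
flat_q = np.full(k, 1.0 / k)
nce_flat = nce_loss(true_score, 1.0 / k, noise_scores, flat_q, k)
neg = negative_sampling_loss(true_score, noise_scores)

# With a different noise distribution the two losses no longer agree.
rare_q = np.full(k, 0.01)
nce_rare = nce_loss(true_score, 0.01, noise_scores, rare_q, k)

print(nce_flat, neg, nce_rare)
```

    The correction term is what lets NCE approximate the normalized model probabilities, whereas negative sampling only aims at learning useful embeddings.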









    • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes.
      – tuomastik
      Jul 19 '17 at 6:35



















        edited Apr 13 '17 at 12:44









        Community











        answered Mar 27 '17 at 12:57









        user154812































            edited yesterday









            Rohola Zandie











            answered Jul 19 '17 at 4:01









            Shamane Siriwardhana







