How to handle negative words in word2vec?












7












$begingroup$


I am training word2vec on a large corpus and averaging the word vectors to get sentence vectors. What is the best way to handle negation words so that negative and positive sentences end up far apart? For example, "After the fix code worked" and "After the fix code did not work" should ideally yield sentence vectors that are far from each other. I have heard that one approach is to look for negation words like "not" and negate the vector of the next word. Can someone clarify whether that is a good approach, or suggest a better one?



















$endgroup$












  • $begingroup$
    Don't average them; use a document-vector model like paragraph2vec (also known as Paragraph Vector or doc2vec). See the sentiment analysis in the Experiments section of the paper for a performance evaluation.
    $endgroup$
    – Emre
    Dec 17 '16 at 17:32























machine-learning neural-network nlp word2vec
















asked Dec 17 '16 at 11:36









Shamy






















4 Answers


















3












$begingroup$

When you look at the vectors that word2vec generates, negative words may have some distinctive features, but they are treated just like positive words. That is, as far as the neural network is concerned, these are just similar words. You may have to construct "concept vectors" on top of the word vectors to do what you want.
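To illustrate the point with made-up numbers: the sketch below averages invented 3-dimensional vectors for the two sentences from the question. The `toy_vectors` table and the helper names are assumptions for illustration only, not real word2vec output, but they show how a single "not" gets diluted by the words the two sentences share.

```python
import math

# Hypothetical 3-d "word vectors", invented for illustration only.
toy_vectors = {
    "after": [0.2, 0.1, 0.0],
    "the": [0.1, 0.1, 0.1],
    "fix": [0.5, 0.3, 0.1],
    "code": [0.6, 0.2, 0.2],
    "worked": [0.4, 0.7, 0.1],
    "did": [0.1, 0.2, 0.1],
    "not": [0.0, 0.1, 0.6],
    "work": [0.4, 0.6, 0.1],
}


def sentence_vector(tokens):
    """Average the vectors of all tokens (the approach from the question)."""
    dims = len(next(iter(toy_vectors.values())))
    total = [0.0] * dims
    for tok in tokens:
        for i, x in enumerate(toy_vectors[tok]):
            total[i] += x
    return [x / len(tokens) for x in total]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


positive = sentence_vector("after the fix code worked".split())
negative = sentence_vector("after the fix code did not work".split())
similarity = cosine(positive, negative)
print(round(similarity, 3))  # stays close to 1 for these toy numbers
```

Even though "not" points in a different direction from every other word here, the two averaged sentence vectors remain nearly parallel.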



Your part-of-speech tagger should automatically mark negating words such as "not", typically as ADV (some tag sets, such as Universal Dependencies, use PART instead). You can then train on these negators in conjunction with your verbs to produce a positive or negative output. Here's an example using spaCy:



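import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
# (the 'en' shortcut from older spaCy releases no longer works in spaCy 3)
nlp = spacy.load('en_core_web_sm')  # loading the model can take a while
sample_text = 'Do not go.'
parsed_text = nlp(sample_text)
token_text = [token.text for token in parsed_text]  # surface form of each token
token_pos = [token.pos_ for token in parsed_text]   # coarse-grained POS tag of each token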


At this point token_text will be a list of your words and token_pos will hold the corresponding POS tags:



Do - VERB
not - ADV
go - VERB
. - PUNCT


As you can see, "not" is tagged as ADV here (the exact tags depend on the spaCy version and model). You can now feed this tagged output (or, better, a parse tree) into a second network trained to produce a negative or positive output.
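One possible way to turn such tagged output into features for a second model is sketched below. This is a simple heuristic swapped in for illustration, not code from this answer: the `NEGATORS` word list, the `mark_negated_verbs` helper and the `NOT_` prefix are all assumptions, and the tags are hard-coded from the listing above rather than produced by spaCy.

```python
NEGATORS = {"not", "n't", "never", "no"}  # assumed, non-exhaustive list


def mark_negated_verbs(tagged_tokens):
    """Prefix the first verb after a negator with 'NOT_' and drop the negator.

    tagged_tokens: list of (text, pos) pairs, e.g. produced by a POS tagger.
    """
    features = []
    pending_negation = False
    for text, pos in tagged_tokens:
        lowered = text.lower()
        if lowered in NEGATORS:
            pending_negation = True
            continue
        if pos == "PUNCT":
            pending_negation = False  # treat punctuation as the end of the negation scope
            continue
        if pending_negation and pos == "VERB":
            features.append("NOT_" + lowered)
            pending_negation = False
        else:
            features.append(lowered)
    return features


# Tags copied from the listing above (hard-coded, not computed here).
tagged = [("Do", "VERB"), ("not", "ADV"), ("go", "VERB"), (".", "PUNCT")]
features = mark_negated_verbs(tagged)
print(features)  # ['do', 'NOT_go']
```

Because the negator is found by word list rather than by its tag, the sketch behaves the same whether the tagger labels "not" as ADV or PART. "NOT_go" then becomes its own vocabulary item, distinct from "go".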



Hope this helps.















$endgroup$





















    0












    $begingroup$

    word2vec vectors, which research shows capture both semantic relatedness and semantic similarity, can be refined so that they also capture relations between words such as antonymy or negation. Take a look at the Counter-Fitting method (or the methods in its related work). An implementation should be available online.
    This may improve the results of your sentiment analysis method.
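The basic idea of pushing an antonym pair apart can be sketched on toy data. This is a deliberately simplified illustration and not the Counter-Fitting algorithm itself, which combines antonym-repel and synonym-attract constraints with a term that preserves the original vector space. The vectors, the `repel` helper, the step size and the similarity threshold are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def repel(u, v, step=0.1, max_similarity=0.0, max_iters=1000):
    """Move u and v apart in small steps until cosine(u, v) <= max_similarity."""
    u, v = list(u), list(v)
    for _ in range(max_iters):
        if cosine(u, v) <= max_similarity:
            break
        # Each vector takes a small step away from the other one.
        u, v = ([a - step * b for a, b in zip(u, v)],
                [b - step * a for a, b in zip(u, v)])
    return u, v


# Made-up vectors for an antonym pair that starts out very similar.
good = [0.9, 0.4, 0.1]
bad = [0.8, 0.5, 0.2]
before = cosine(good, bad)
new_good, new_bad = repel(good, bad)
after = cosine(new_good, new_bad)
print(before > after)  # True: the pair is less similar after repelling
```

A real refinement method also has to keep the rest of the vocabulary in place, which is why the published approach adds a term that preserves the original space.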















    $endgroup$





















      0












      $begingroup$

      You can check this link [1], which suggests a way of handling negation.












      $endgroup$













      • $begingroup$
        Please don't give link-only answers. The link might become dead after a while. Add a summary of the link and add it as a source link. But always add the answer here as text.
        $endgroup$
        – Tasos
        2 mins ago



















      -1












      $begingroup$

      See the paper "Querying Word Embeddings for Similarity and Relatedness".

















      $endgroup$

















































answered Dec 18 '16 at 15:21 by Daniel Wee (spaCy POS-tagging answer)

































answered Apr 2 '18 at 14:45 by Smarty77 (Counter-Fitting answer)
































answered 20 mins ago by Behzad Mirzababaei (link-only answer)












answered Nov 3 '18 at 14:21 by Fatma.S.Gadelrab; edited Nov 3 '18 at 14:46 by Stephen Rauch (paper-reference answer)





























