How to get all distinct words within a set of lines?



























I would like to extract a list of distinct words from a set of lines. Is there a way of doing this?



For example, say I have lines that look like this:



[
[(isPhysicallySettledFxFwd, NO,"Y"),(isPhysicallySettledFxFwd,isPhysicallySettledFxSwap,"N")],
[(isPhysicallySettledFxSwap,NO,"Y"),(isPhysicallySettledFxSwap, isPhysicallySettledCommodity,"Y")],
[(isPhysicallySettledCommodity,NO,"Y"),(isPhysicallySettledCommodity,YES,"Y")]
]


Then I would get a list of distinct words, looking like this:



isPhysicallySettledFxFwd
isPhysicallySettledFxSwap
isPhysicallySettledCommodity
NO
YES
Y
N
(
)
"
[
]
,


I am not sure how to even start, apart from copying the lines to Excel and doing lots of manipulations...

















      Tags: regular-expression, functions, vi-words, list















      asked 1 hour ago by user3203476


























          2 Answers






























































            You can do something like this:



            :let a=[]
            :%s/\w\+/\=add(a, submatch(0))/gn
            :new
            :put =uniq(sort(a))


            This first declares an empty list a to work with. Then the :%s command matches every run of word characters (\w\+) on every line, acting on all matches in a line (g flag) without actually replacing anything (n flag). The replacement part is a sub-replace-expression (\=), which appends each match (submatch(0)) to list a.



            Finally, we open a new window and put the sorted, de-duplicated content of list a (uniq(sort(a))) into it.



            You can get a lot more sophisticated, like only capturing certain words or counting occurrences, but this shows how flexible the :s command is.
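
            The expected output in the question also lists punctuation such as ( and ), which \w\+ alone skips. Here is a minimal sketch of one way to collect those too. It assumes every non-blank, non-word character should count as its own "word"; the alternation \w\+\|\S is that assumption and is not part of the answer above.

            ```vim
            " Start with an empty list to collect the matches.
            :let a = []
            " Match either a run of word characters or any single non-blank
            " character. The n flag leaves the buffer unchanged; the \= expression
            " is still evaluated for every match.
            :%s/\w\+\|\S/\=add(a, submatch(0))/gn
            " Open a new window and put the sorted, de-duplicated list into it.
            :new
            :put =uniq(sort(a))
            ```

            To restrict this to a set of lines, replace % with a range, for example :10,20s/... or :'<,'>s/... for the last visual selection. :put inserts below the empty first line of the new buffer, so that blank line can be removed afterwards with :1delete.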

















            answered 48 mins ago by Christian Brabandt













            • How wonderful! Thank you!

              – user3203476
              29 mins ago











































                Maybe this:



                :%s/\W/\r&\r/g
                :sort u
                :g/^\s*$/d


                The first command puts a line break before and after each non-word character, so every word and every punctuation character ends up on its own line.



                The second command sorts the entire buffer with the u ("unique") flag, so all duplicate lines are removed.



                The third command deletes all lines that are empty or contain only whitespace.
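
                These commands rewrite the buffer they run in. If the original lines should stay untouched, the same steps can be run on a copy instead. This is a minimal sketch, assuming a throwaway window opened with :new is acceptable; the yank and put steps are an addition, not part of the answer above.

                ```vim
                " Copy every line of the current buffer into the unnamed register.
                :%yank
                " Open a new, empty window and paste the copy there.
                :new
                :put
                " Put each non-word character on its own line (\r inserts a line
                " break, & stands for the matched character).
                :%s/\W/\r&\r/g
                " Sort all lines and drop duplicates.
                :sort u
                " Delete lines that are empty or contain only whitespace.
                :g/^\s*$/d
                ```

                The empty first line left behind by :put is removed by the final :g command, so no extra cleanup step is needed.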

















                answered 27 mins ago by Ralf





















