Delete all lines which don't have n characters before delimiter





I have a very long text file (from here) which should contain 6 hexadecimal characters then a 'break' (which appears as one character and doesn't seem to show up properly in the code markdown below) followed by a few words:



00107B  Cisco Systems, Inc
00906D Cisco Systems, Inc
0090BF Cisco Systems, Inc
5080 Cisco Systems, Inc
0E+00 ASUSTek COMPUTER INC.
000C6E ASUSTek COMPUTER INC.
001BFC ASUSTek COMPUTER INC.
001E8C ASUSTek COMPUTER INC.
0015F2 ASUSTek COMPUTER INC.
2354 ASUSTek COMPUTER INC.
001FC6 ASUSTek COMPUTER INC.
60182E ShenZhen Protruly Electronic Ltd co.
F4CFE2 Cisco Systems, Inc
501CBF Cisco Systems, Inc


I've looked around and can't find anything that would work in this situation. My question is: how can I use grep/sed/awk/perl to delete all lines of this text file that do not start with exactly 6 hexadecimal characters followed by a 'break'?



P.S. For bonus points, what's the best way of sorting the file alphabetically and numerically according to the hex characters (i.e. 000000 -> FFFFFF)? Should I just use sort?










Tags: text-processing, sed, grep, text-formatting

asked 15 hours ago by Rocco (edited 13 hours ago by codeforester)






















          2 Answers
    $ awk '$1 ~ /^[[:xdigit:]]{6}$/' file
    00107B Cisco Systems, Inc
    00906D Cisco Systems, Inc
    0090BF Cisco Systems, Inc
    000C6E ASUSTek COMPUTER INC.
    001BFC ASUSTek COMPUTER INC.
    001E8C ASUSTek COMPUTER INC.
    0015F2 ASUSTek COMPUTER INC.
    001FC6 ASUSTek COMPUTER INC.
    60182E ShenZhen Protruly Electronic Ltd co.
    F4CFE2 Cisco Systems, Inc
    501CBF Cisco Systems, Inc

This uses awk to extract the lines that contain exactly six hexadecimal digits in the first field. The [[:xdigit:]] pattern matches a hexadecimal digit, and {6} requires six of them. Together with the anchoring to the start and end of the field with ^ and $ respectively, this matches only the wanted lines.

To save the result under a new name, redirect the output to a file.

Note that this seems to work with GNU awk (commonly found on Linux), but not with awk on e.g. OpenBSD, or with mawk.
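If your awk rejects the {6} interval or the [[:xdigit:]] class, the same test can be written without either of them. This is only a sketch: filter_oui is a made-up helper name, and the sample input is modelled on the data in the question, with a tab assumed as the 'break'.

```shell
# Hypothetical helper (not part of the original answer): keep lines whose
# first whitespace-separated field is exactly six hexadecimal digits.
# length() plus a negated bracket expression avoids {6} and [[:xdigit:]].
filter_oui() {
    awk 'length($1) == 6 && $1 !~ /[^0-9A-Fa-f]/' "$@"
}

# Sample input modelled on the question; the tab is an assumption.
printf '00107B\tCisco Systems, Inc\n5080\tCisco Systems, Inc\n0E+00\tASUSTek COMPUTER INC.\n000C6E\tASUSTek COMPUTER INC.\n' |
    filter_oui
```

With this input, only the 00107B and 000C6E lines should come out. Called with a file name (filter_oui file), it reads that file instead of standard input.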





A similar approach with sed:

    $ sed -n '/^[[:xdigit:]]\{6\}\>/p' file
    00107B Cisco Systems, Inc
    00906D Cisco Systems, Inc
    0090BF Cisco Systems, Inc
    000C6E ASUSTek COMPUTER INC.
    001BFC ASUSTek COMPUTER INC.
    001E8C ASUSTek COMPUTER INC.
    0015F2 ASUSTek COMPUTER INC.
    001FC6 ASUSTek COMPUTER INC.
    60182E ShenZhen Protruly Electronic Ltd co.
    F4CFE2 Cisco Systems, Inc
    501CBF Cisco Systems, Inc

In this expression, \> is used to match the end of the hexadecimal number. This ensures that longer numbers are not matched. The \> pattern matches a word boundary (the end of a word), i.e. the zero-width position between a word character and a non-word character.
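One caveat: \> is not part of POSIX basic regular expressions, and it treats any non-word character as the end of the word, so a line starting with, say, 00107B-X would also pass. If the 'break' is known to be whitespace, a stricter sketch is to require a whitespace character right after the six digits. The name filter_oui_sed and the sample lines are invented for illustration.

```shell
# Hypothetical helper: print only lines that start with exactly six hex
# digits followed by a whitespace character (space or tab).
# \{6\} is the POSIX BRE interval; [[:space:]] replaces the \> extension.
filter_oui_sed() {
    sed -n '/^[[:xdigit:]]\{6\}[[:space:]]/p' "$@"
}

# The second line has six hex digits followed by "-", so it is rejected here.
printf '00107B\tCisco Systems, Inc\n00107B-X\tnot an OUI\n' |
    filter_oui_sed
```

Note the trade-off: a line consisting of nothing but six hex digits, with no delimiter after them, is dropped by this variant.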





For sorting the resulting data, just pipe the result through sort, or sort -f if your hexadecimal numbers use both upper and lower case letters.
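Putting the filtering and the sorting together might look like the sketch below. It assumes plain byte-order sorting is what you want (hence LC_ALL=C) and upper-cases the prefix so that mixed-case input ends up in one consistent order. Because every surviving prefix is six characters wide and digits sort before letters in the C locale, textual order is then the same as numeric order. filter_and_sort and the file names in the usage note are illustrative only.

```shell
# Hypothetical helper: keep lines with a six-hex-digit first field,
# upper-case that field, then sort the result in byte order.
filter_and_sort() {
    awk 'length($1) == 6 && $1 !~ /[^0-9A-Fa-f]/ {
             prefix = toupper($1)
             # Rewrite only the leading hex digits; the rest of the line,
             # including the original delimiter, is left untouched.
             sub(/^[0-9A-Fa-f]+/, prefix)
             print
         }' "$@" | LC_ALL=C sort
}

# Sample input with a lower-case prefix and one invalid line.
printf 'f4cfe2\tCisco Systems, Inc\n00107B\tCisco Systems, Inc\n5080\tCisco Systems, Inc\n001BFC\tASUSTek COMPUTER INC.\n' |
    filter_and_sort
```

To save the result under a new name, something like filter_and_sort oui.txt > oui.sorted.txt would do.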







answered 15 hours ago by Kusalananda (edited 14 hours ago)













• Perfect, thank you very much. Exactly what I was looking for! – Rocco, 14 hours ago



















And for completeness, you can do this with grep too:

    $ grep -E '^[[:xdigit:]]{6}\b' oui.txt
    00107B Cisco Systems, Inc
    00906D Cisco Systems, Inc
    0090BF Cisco Systems, Inc
    000C6E ASUSTek COMPUTER INC.
    001BFC ASUSTek COMPUTER INC.
    001E8C ASUSTek COMPUTER INC.
    0015F2 ASUSTek COMPUTER INC.
    001FC6 ASUSTek COMPUTER INC.
    60182E ShenZhen Protruly Electronic Ltd co.
    F4CFE2 Cisco Systems, Inc
    501CBF Cisco Systems, Inc

This extended grep expression searches for exactly 6 hex digits at the beginning of each line, followed immediately by a word boundary (\b), which here is the transition from the last hex digit to the whitespace after it.
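Before deleting anything from a very long file, it can be worth looking at what would be thrown away. Here is a small sketch that uses grep -v to invert the match. The [[:space:]] test stands in for \b on the assumption that the 'break' is whitespace, and kept and dropped are made-up names.

```shell
# Assumed pattern: six hex digits at line start, then a whitespace character.
pattern='^[[:xdigit:]]{6}[[:space:]]'

# Hypothetical helpers: lines that would survive, and lines that would not.
kept()    { grep -E  -e "$pattern" "$@"; }
dropped() { grep -vE -e "$pattern" "$@"; }

# Show the lines from a small sample that the filter would remove.
printf '00107B\tCisco Systems, Inc\n5080\tCisco Systems, Inc\n0E+00\tASUSTek COMPUTER INC.\n' |
    dropped
```

With this input, dropped should list the 5080 and 0E+00 lines. On the real file, dropped oui.txt | less lets you inspect the rejects first, and kept oui.txt then produces the cleaned data.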







answered 9 hours ago by Digital Trauma





























