Delete all lines which don't have n characters before delimiter
I have a very long text file (from here) in which each line should contain 6 hexadecimal characters, then a 'break' (which appears as a single character and doesn't seem to show up properly in the code markdown below), followed by a few words:
00107B Cisco Systems, Inc
00906D Cisco Systems, Inc
0090BF Cisco Systems, Inc
5080 Cisco Systems, Inc
0E+00 ASUSTek COMPUTER INC.
000C6E ASUSTek COMPUTER INC.
001BFC ASUSTek COMPUTER INC.
001E8C ASUSTek COMPUTER INC.
0015F2 ASUSTek COMPUTER INC.
2354 ASUSTek COMPUTER INC.
001FC6 ASUSTek COMPUTER INC.
60182E ShenZhen Protruly Electronic Ltd co.
F4CFE2 Cisco Systems, Inc
501CBF Cisco Systems, Inc
I've done some looking around and can't find anything that would work in this situation. My question is: how can I use grep/sed/awk/perl to delete all lines of this text file which do not start with exactly 6 hexadecimal characters and then a 'break'?
P.S. For bonus points, what's the best way of sorting the file alphabetically and numerically according to the hex characters (i.e. 000000 -> FFFFFF)? Should I just use sort?
text-processing sed grep text-formatting
asked 15 hours ago by Rocco (edited 13 hours ago by codeforester)
2 Answers
$ awk '$1 ~ /^[[:xdigit:]]{6}$/' file
00107B Cisco Systems, Inc
00906D Cisco Systems, Inc
0090BF Cisco Systems, Inc
000C6E ASUSTek COMPUTER INC.
001BFC ASUSTek COMPUTER INC.
001E8C ASUSTek COMPUTER INC.
0015F2 ASUSTek COMPUTER INC.
001FC6 ASUSTek COMPUTER INC.
60182E ShenZhen Protruly Electronic Ltd co.
F4CFE2 Cisco Systems, Inc
501CBF Cisco Systems, Inc
This uses awk to extract the lines that contain exactly six hexadecimal digits in the first field. The [[:xdigit:]] pattern matches a hexadecimal digit, and {6} requires six of them. Together with the anchoring to the start and end of the field with ^ and $ respectively, this will only match the wanted lines.
Redirect the output to a file to save it under a new name.
Note that this seems to work with GNU awk (commonly found on Linux), but not with awk on e.g. OpenBSD, or with mawk, presumably because those implementations lack support for interval expressions such as {6}.
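If your awk rejects the {6} interval expression (or the [[:xdigit:]] class), a sketch that avoids both may help. Like the command above, it assumes the hex prefix is the first whitespace-separated field; the `sample` function and its data are hypothetical stand-ins for the real file.

```shell
#!/bin/sh
# Hypothetical sample input: a tab separates the hex prefix from the vendor name.
sample() {
    printf '00107B\tCisco Systems, Inc\n'
    printf '5080\tCisco Systems, Inc\n'
    printf '0E+00\tASUSTek COMPUTER INC.\n'
    printf '000C6E\tASUSTek COMPUTER INC.\n'
}

# Keep a line only if its first field is exactly six characters long and
# contains no character outside 0-9, A-F and a-f. No interval expression
# and no POSIX character class is needed, so older awks should cope.
sample | awk 'length($1) == 6 && $1 !~ /[^0-9A-Fa-f]/'
```

On this sample it prints only the 00107B and 000C6E lines. With the real file, pass the file name to awk instead of piping from `sample`, and redirect the output (e.g. `> newfile`).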
A similar approach with sed:
$ sed -n '/^[[:xdigit:]]\{6\}\>/p' file
00107B Cisco Systems, Inc
00906D Cisco Systems, Inc
0090BF Cisco Systems, Inc
000C6E ASUSTek COMPUTER INC.
001BFC ASUSTek COMPUTER INC.
001E8C ASUSTek COMPUTER INC.
0015F2 ASUSTek COMPUTER INC.
001FC6 ASUSTek COMPUTER INC.
60182E ShenZhen Protruly Electronic Ltd co.
F4CFE2 Cisco Systems, Inc
501CBF Cisco Systems, Inc
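In this expression, \> is used to match the end of the hexadecimal number. This ensures that longer numbers are not matched. The \> pattern (a GNU sed extension) matches the end of a word, i.e. the zero-width position between a word character and a following non-word character. The \{6\} is the basic-regular-expression spelling of the {6} interval used in the awk command.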
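The \> anchor is not available in every sed. A POSIX-only sketch, assuming the 'break' is whitespace (a tab or a space), swaps it for an explicit [[:space:]] and keeps the BRE interval \{6\}. This is slightly stricter than \>, which would also accept punctuation directly after the six digits. The sample data is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input; the 'break' after the hex digits is assumed to be a tab.
sample() {
    printf '00107B\tCisco Systems, Inc\n'
    printf '0E+00\tASUSTek COMPUTER INC.\n'
    printf '001BFC\tASUSTek COMPUTER INC.\n'
    printf '1234567\tSeven hex digits\n'
}

# Print only lines that start with exactly six hex digits followed by
# whitespace. \{6\} is the POSIX BRE interval; [[:space:]] replaces GNU's \>.
sample | sed -n '/^[[:xdigit:]]\{6\}[[:space:]]/p'

# Equivalent formulation closer to the question: delete every line that
# does NOT match (the result still goes to standard output).
sample | sed '/^[[:xdigit:]]\{6\}[[:space:]]/!d'
```

Both commands print the 00107B and 001BFC lines; the seven-digit line is rejected because its seventh character is not whitespace.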
To sort the resulting data, just pipe the result through sort, or sort -f if your hexadecimal numbers use both upper and lower case letters.
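The filter and the sort can be combined in one pipeline. This sketch uses grep -E as the filter (any of the commands above would do) and runs sort with LC_ALL=C, on the assumption that a plain byte-wise comparison is wanted; in other locales the collation order may differ. The sample data is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input, deliberately out of order.
sample() {
    printf 'F4CFE2\tCisco Systems, Inc\n'
    printf '5080\tCisco Systems, Inc\n'
    printf '00107B\tCisco Systems, Inc\n'
    printf '60182E\tShenZhen Protruly Electronic Ltd co.\n'
}

# Filter with a POSIX ERE, then sort byte-wise. LC_ALL=C makes sort compare
# raw bytes, so with fixed-width upper-case hex, 0-9 come before A-F and the
# result runs from 000000 towards FFFFFF.
sample | grep -E '^[[:xdigit:]]{6}[[:space:]]' | LC_ALL=C sort
```

It prints the 00107B, 60182E and F4CFE2 lines, in that order. Append `> sorted.txt` (a hypothetical name) to save the result; for mixed-case digits, `LC_ALL=C sort -f` folds lower case to upper case before comparing.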
answered 15 hours ago by Kusalananda♦ (edited 14 hours ago)
Perfect, thank you very much. Exactly what I was looking for! – Rocco, 14 hours ago
And for completeness, you can do this with grep too:
$ grep -E '^[[:xdigit:]]{6}\b' oui.txt
00107B Cisco Systems, Inc
00906D Cisco Systems, Inc
0090BF Cisco Systems, Inc
000C6E ASUSTek COMPUTER INC.
001BFC ASUSTek COMPUTER INC.
001E8C ASUSTek COMPUTER INC.
0015F2 ASUSTek COMPUTER INC.
001FC6 ASUSTek COMPUTER INC.
60182E ShenZhen Protruly Electronic Ltd co.
F4CFE2 Cisco Systems, Inc
501CBF Cisco Systems, Inc
$
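This extended grep expression searches for exactly 6 hex digits at the beginning of each line, followed immediately by a word boundary (\b), i.e. the point where the run of hex digits ends and a non-word character such as the delimiter begins. Note that \b is not part of POSIX extended regular expressions, but GNU grep supports it.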
answered 9 hours ago by Digital Trauma