How can I detect patterns and/or keywords or phrases?
I am collecting data from Apache into a database via PHP.
For now, I am interested in detecting patterns in each column.
For example, manual examination of the data shows the pattern "phpmyadmin" in various forms and capitalizations and at different positions in the text. I would also like to detect any other patterns.
How would I detect that programmatically, using the computer instead of my brain?
I need a detailed explanation, as I am brand new to this kind of thing.
dataset
Using regular expressions plus a string library?
– Aditya
19 hours ago
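To make the regular-expression suggestion concrete, here is a minimal Python sketch of a case-insensitive keyword search. The `rows` list and the `find_keyword` helper are hypothetical stand-ins for one database column and are not from the original post; it only finds a keyword you already know.

```python
import re

# Hypothetical sample values standing in for one database column.
rows = [
    "GET /phpMyAdmin/index.php HTTP/1.1",
    "GET /admin/PHPMYADMIN/setup.php",
    "GET /index.html",
    "GET /db/phpmyadmin2/",
]

def find_keyword(rows, keyword):
    """Return (row_index, position, matched_text) for each row containing keyword.

    re.IGNORECASE matches any capitalization, and search() finds the
    keyword at any position in the text. Only the first hit per row is kept.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for index, row in enumerate(rows):
        match = pattern.search(row)
        if match:
            hits.append((index, match.start(), match.group(0)))
    return hits

for hit in find_keyword(rows, "phpmyadmin"):
    print(hit)
```

`re.escape` is used so that a keyword containing characters such as `.` or `?` is matched literally.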
asked 23 hours ago
cybernard
101
1 Answer
It depends on what you want to do and what you define as a "pattern". If you are interested in frequent terms, tokenize the text and count the words. If you want to compare various forms of the same term, I suggest you build two term matrices: one that takes the input as is, and one that takes a version of the input converted to lower case.
By the way, a term frequency matrix is simply a matrix where the rows are your examples (I guess the values in your database columns) and the columns are the discovered tokens (i.e., the words).
For example, for the phrase 'the cat sat on the mat', the corresponding row vector of word counts would be:
the 2
cat 1
sat 1
on 1
mat 1
To get frequencies, divide each count vector by its total word count (6 in this example).
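The counting described above can be sketched in Python roughly as follows. The helper names `term_counts` and `term_frequencies` are made up for illustration, and splitting on `\w+` is just one possible assumption about what counts as a token.

```python
import re
from collections import Counter

def term_counts(text, lowercase=False):
    """Count tokens in one example, optionally after lower-casing it."""
    if lowercase:
        text = text.lower()
    # Assumption: a token is a run of letters, digits, or underscores.
    tokens = re.findall(r"\w+", text)
    return Counter(tokens)

def term_frequencies(counts):
    """Divide each count by the total number of tokens in the example."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}

counts = term_counts("the cat sat on the mat")
freqs = term_frequencies(counts)
print(counts)
print(freqs)

# The two variants suggested above: input as is, and lower-cased input.
as_is = term_counts("phpMyAdmin PHPMYADMIN")
lowered = term_counts("phpMyAdmin PHPMYADMIN", lowercase=True)
```

Comparing `as_is` with `lowered` shows which distinct spellings collapse into the same term once case is ignored. Stacking one such count vector per example gives the term matrix.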
A pattern is any sequence of characters that repeats together. For example, "phpmyadmin" is a sequence of 10 letters that repeats together. I detected "phpmyadmin" manually; I would like to detect it programmatically, excluding single-character matches of common characters (ASCII 32-127).
– cybernard
11 hours ago
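One way to discover such repeated character sequences without knowing them in advance is to count character substrings across rows. The sketch below is a brute-force illustration, not a method proposed in the thread; the function name, the sample rows, and the thresholds `min_len`, `max_len`, and `min_rows` are all assumptions chosen for the example.

```python
from collections import Counter

def repeated_substrings(rows, min_len=4, max_len=12, min_rows=2):
    """Find substrings that occur in at least min_rows rows, ignoring case.

    min_len excludes single-character and other very short matches.
    """
    row_counts = Counter()
    for row in rows:
        text = row.lower()
        seen = set()  # count each substring at most once per row
        for n in range(min_len, max_len + 1):
            for i in range(len(text) - n + 1):
                seen.add(text[i:i + n])
        row_counts.update(seen)

    frequent = {s: c for s, c in row_counts.items() if c >= min_rows}
    # Drop a substring when a longer frequent one contains it with the
    # same row count, so only the longest form of each pattern remains.
    maximal = {
        s: c for s, c in frequent.items()
        if not any(s != t and s in t and c == frequent[t] for t in frequent)
    }
    # Most widespread first, then longest first.
    return sorted(maximal.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))

# Hypothetical sample values standing in for one database column.
rows = [
    "GET /phpMyAdmin/index.php",
    "POST /PHPMYADMIN/setup",
    "GET /old/phpmyadmin2/",
    "GET /index.html",
]
result = repeated_substrings(rows, min_len=6, max_len=12, min_rows=3)
print(result)
```

This enumerates every substring, so it is only practical for small samples; for a large table, a suffix array or similar index would likely scale better.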
answered 16 hours ago
qmeeus
19118