How can I detect patterns and/or keywords or phrases?

I am collecting data from Apache into a database via PHP.

I am interested in detecting patterns in each column, for now.

For example, manually examining the data shows the pattern "phpmyadmin" in various forms and capitalizations, and at different positions in the text. I would also like to detect any other patterns.

How would I detect that programmatically, using the computer instead of my brain?

I am going to need a detailed explanation, as I am brand new to this kind of thing.

Tags: dataset

asked 23 hours ago by cybernard

  • Using regular expressions plus a string library?
    – Aditya
    19 hours ago
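
To make the regex suggestion concrete, here is a minimal Python sketch. The sample rows are hypothetical, and the pattern encodes an assumption about what "various forms" means: any capitalization, plus an optional `-`, `_` or `.` between `php`, `my` and `admin`. It reports the row, the matched text and its position. (In PHP, `preg_match_all` with the `i` modifier plays the same role.)

```python
import re

# Hypothetical sample values standing in for one database column.
rows = [
    "GET /phpMyAdmin/index.php HTTP/1.1",
    "GET /PHPMYADMIN/scripts/setup.php HTTP/1.1",
    "GET /admin/php-my-admin/ HTTP/1.1",
    "GET /index.html HTTP/1.1",
]

# Any capitalization (re.IGNORECASE); the optional '-', '_' or '.' between the parts
# is an assumption about which "various forms" should count.
PATTERN = re.compile(r"php[-_.]?my[-_.]?admin", re.IGNORECASE)


def find_matches(values):
    """Return (row index, matched text, start offset) for every match."""
    hits = []
    for i, value in enumerate(values):
        for m in PATTERN.finditer(value):
            hits.append((i, m.group(0), m.start()))
    return hits


for hit in find_matches(rows):
    print(hit)
```

This only finds a pattern you already know about; discovering unknown patterns needs counting, as in the answer below.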

1 Answer

It depends on what you want to do and what you define as a "pattern". If you are interested in frequent terms, tokenize the text and count the words. If you want to compare various forms of the same term, I suggest building two term matrices: one from the input as is, and one from a version of the input that has been converted to lower case.
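
A minimal Python sketch of "tokenize and count", assuming a token is a run of word characters (`\w+`) and using hypothetical sample rows. For brevity it aggregates counts over all rows rather than keeping one row per example, and builds both variants: as is, and lowercased. In the lowercased counts, the three spellings of phpmyadmin collapse into a single term.

```python
import re
from collections import Counter


def tokenize(text):
    # A token is a run of letters, digits or underscores; everything else separates tokens.
    return re.findall(r"\w+", text)


def term_counts(texts, lowercase=False):
    """Count tokens over all texts, optionally after lowercasing them."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text.lower() if lowercase else text))
    return counts


# Hypothetical column values.
rows = ["GET /phpMyAdmin/index.php", "GET /phpmyadmin/", "POST /PHPMYADMIN/"]

as_is = term_counts(rows)                   # three spellings, counted separately
folded = term_counts(rows, lowercase=True)  # spellings merged into one term
print(as_is.most_common(3))
print(folded.most_common(3))
```

Comparing the two counters shows which terms only become frequent once case is ignored.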



BTW, a term frequency matrix is simply a matrix where the rows are your examples (I guess, in your case, the entries of a database column) and the columns are the discovered tokens (i.e., the words).
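
As a sketch of that structure in plain Python (no libraries), the function below returns the vocabulary (the columns) and one count row per document. Sorting the vocabulary is an arbitrary choice made here for a stable column order. The two sample documents are illustrative. For real data, scikit-learn's `CountVectorizer` builds the same kind of matrix, and its `lowercase` option switches between the two variants mentioned above.

```python
import re


def term_matrix(docs, lowercase=False):
    """Build a term (count) matrix: one row per document, one column per token."""
    tokenized = [re.findall(r"\w+", d.lower() if lowercase else d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})  # the columns
    column = {tok: j for j, tok in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            row[column[tok]] += 1
        matrix.append(row)
    return vocab, matrix


docs = ["the cat sat on the mat", "The dog sat"]
for lowercase in (False, True):
    vocab, matrix = term_matrix(docs, lowercase=lowercase)
    print(vocab)
    for row in matrix:
        print(row)
```

In the case-sensitive matrix "The" and "the" are separate columns; in the lowercased one they merge.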



For example, in the phrase 'the cat sat on the mat', the corresponding row vector of word counts would be:



the 2
cat 1
sat 1
on 1
mat 1


To get frequencies, divide each row vector by its total number of words. Here there are 6, so "the" has a frequency of 2/6 ≈ 0.33 and every other word 1/6 ≈ 0.17.
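
The counts and frequencies for the example phrase can be checked with a few lines of Python. Splitting on whitespace is enough here because the phrase has no punctuation.

```python
from collections import Counter

sentence = "the cat sat on the mat"
counts = Counter(sentence.split())    # word -> count
total = sum(counts.values())          # total number of words in the phrase
frequencies = {word: n / total for word, n in counts.items()}

for word, n in counts.items():
    print(f"{word}\t{n}\t{frequencies[word]:.3f}")
```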

answered 16 hours ago by qmeeus

  • A pattern is any sequence of characters that repeats together. For example, phpmyadmin is a sequence of 10 letters that repeats together. I detected "phpmyadmin" manually; I would like to detect it programmatically, excluding single-character matches of common characters (ASCII 32-127).
    – cybernard
    11 hours ago
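
That definition (any multi-character sequence that recurs) amounts to finding repeated substrings rather than counting words. Below is a brute-force Python sketch. It assumes you compare lowercased values and want substrings of 2 to 20 characters that appear in at least two different rows. The function name, thresholds and sample rows are illustrative choices, not an established API.

```python
from collections import defaultdict


def repeated_substrings(values, min_len=2, max_len=20, min_rows=2):
    """Find lowercased substrings of min_len..max_len characters that occur in
    at least min_rows distinct values, keeping only the longest ("maximal") ones."""
    rows_with = defaultdict(set)  # substring -> indices of the values containing it
    for i, value in enumerate(values):
        text = value.lower()
        for n in range(min_len, max_len + 1):
            for start in range(len(text) - n + 1):
                rows_with[text[start:start + n]].add(i)

    frequent = {s: len(r) for s, r in rows_with.items() if len(r) >= min_rows}

    # Drop a substring when a longer frequent substring contains it and occurs in
    # the same number of rows (it then occurs in exactly the same rows).
    maximal = {
        s: c
        for s, c in frequent.items()
        if not any(len(t) > len(s) and s in t and frequent[t] == c for t in frequent)
    }
    # Longest patterns first, then the most widespread.
    return sorted(maximal.items(), key=lambda item: (-len(item[0]), -item[1]))


# Hypothetical column values.
rows = ["/phpMyAdmin/index.php", "/PHPMYADMIN/setup", "/admin/phpmyadmin/"]
for pattern, n_rows in repeated_substrings(rows):
    print(repr(pattern), n_rows)
```

For these sample rows, the longest repeated pattern is "/phpmyadmin/", found in all three rows with its delimiters attached. Tokenizing first, or stripping characters like `/`, gives cleaner patterns. The number of substrings grows with text length times `max_len`, and the maximality filter compares every pair. For a large table, sample the rows first or look into suffix arrays or suffix trees, the usual data structures for repeated-substring search.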
