How to extract and classify data from a column in Excel?

I have a column in an Excel sheet whose cells contain many values separated by || delimiters. The values can be classified into classes such as Entity, IFSC code, transaction reference ID, etc.



A single cell looks like this:



EFT INCOMING||0141201||NHFI0141201||UTR||SBIN118121948660 M S||some-name ||some-purpose||TRN REF NO:a1b2c3d4e5


Not every cell has the same number of classes, or even the same types of classes.
Another example:



COMM/CHARGES/FEES||CHECK/REF.6546644473||BILPAY CCTY BEARING C||00.00||00012||18031358||BLPY||TRN REF NO:a1b2c3d4e5
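A quick way to see why a plain split is not enough: the sketch below (Python, since the question is tagged python; `split_cell` is a hypothetical helper, not an existing function) splits both sample cells on the literal `||`. Both samples happen to have eight fields, but position 2 holds an IFSC code in the first cell and free text in the second, so splitting by position alone would not put each class into its own column.

```python
def split_cell(cell):
    """Split one cell on the literal '||' delimiter and trim stray spaces."""
    return [field.strip() for field in cell.split("||")]


first = split_cell(
    "EFT INCOMING||0141201||NHFI0141201||UTR||SBIN118121948660 M S"
    "||some-name ||some-purpose||TRN REF NO:a1b2c3d4e5"
)
second = split_cell(
    "COMM/CHARGES/FEES||CHECK/REF.6546644473||BILPAY CCTY BEARING C||00.00"
    "||00012||18031358||BLPY||TRN REF NO:a1b2c3d4e5"
)

# Same number of fields, but the same position holds different kinds of data,
# so a purely positional split does not line the classes up in columns.
for i, (a, b) in enumerate(zip(first, second)):
    print(i, repr(a), repr(b))
```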


I tried extracting this information with regular expressions, and I can get all the reference IDs or IFSC codes extracted as a single list. But I need to split each cell into multiple cells, one per class. If a cell does not contain data for some class, that output cell should remain blank.
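A minimal sketch of a per-field approach, assuming each `||`-separated field belongs to at most one class: classify every field on its own and write the result into a fixed set of columns, leaving a column empty when no field matched. The column names and patterns (`trn_ref`, `check_ref`, `ifsc`, `utr`, `amount`) are hypothetical guesses from the two samples only. The IFSC pattern follows the 11-character IFSC layout (four letters, a zero, six alphanumerics); the UTR-like pattern in particular is just a guess.

```python
import re

# Hypothetical rules: (column name, compiled pattern). The patterns are guesses
# based only on the two sample cells; adjust them to the real data. Order
# matters, because each field goes to the first rule that matches it.
RULES = [
    ("trn_ref", re.compile(r"TRN REF NO:(\S+)")),
    ("check_ref", re.compile(r"CHECK/REF\.(\d+)")),
    ("ifsc", re.compile(r"([A-Z]{4}0[A-Z0-9]{6})")),  # 4 letters, '0', 6 alphanumerics
    ("utr", re.compile(r"([A-Z]{4}\d{12})\b.*")),     # guess: 4 letters + 12 digits
    ("amount", re.compile(r"(\d+\.\d{2})")),
]
COLUMNS = [name for name, _ in RULES] + ["unclassified"]


def classify_cell(cell):
    """Map one cell to a dict with every column; classes not present stay ''."""
    row = dict.fromkeys(COLUMNS, "")
    leftovers = []
    for field in (f.strip() for f in cell.split("||")):
        for name, pattern in RULES:
            m = pattern.fullmatch(field)
            if m and not row[name]:
                row[name] = m.group(1)  # keep only the captured value
                break
        else:
            # No rule claimed this field; keep it so nothing is silently lost.
            leftovers.append(field)
    row["unclassified"] = " | ".join(leftovers)
    return row


SAMPLES = [
    "EFT INCOMING||0141201||NHFI0141201||UTR||SBIN118121948660 M S"
    "||some-name ||some-purpose||TRN REF NO:a1b2c3d4e5",
    "COMM/CHARGES/FEES||CHECK/REF.6546644473||BILPAY CCTY BEARING C||00.00"
    "||00012||18031358||BLPY||TRN REF NO:a1b2c3d4e5",
]
for sample in SAMPLES:
    print(classify_cell(sample))
```

Fields that no rule recognises end up in `unclassified`, which makes it easy to spot classes that still need a rule.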



I also tried named entity recognition, but the same problem arises: I get a flat list of entities as output, not the per-cell breakdown.
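The flat-list symptom usually comes from running the extractor over the whole column at once. Running it per cell and emitting one record per cell keeps each value attached to its row. A small sketch of that output step, assuming a CSV file is an acceptable intermediate (Excel opens CSV files directly); `row_for`, `cells_to_csv` and the two rules are hypothetical and deliberately minimal.

```python
import csv
import io
import re

# Hypothetical two-class rule set, kept small for illustration; the patterns
# are guesses from the sample cells, not a specification of these formats.
RULES = {
    "ifsc": re.compile(r"[A-Z]{4}0[A-Z0-9]{6}"),
    "trn_ref": re.compile(r"TRN REF NO:(\S+)"),
}


def row_for(cell):
    """Return {class: value} for one cell; classes absent from the cell stay ''."""
    row = dict.fromkeys(RULES, "")
    for field in (f.strip() for f in cell.split("||")):
        for name, pattern in RULES.items():
            m = pattern.fullmatch(field)
            if m and not row[name]:
                # Use the capture group when the pattern has one, else the whole match.
                row[name] = m.group(m.lastindex or 0)
    return row


def cells_to_csv(cells):
    """One output row per input cell, one column per class."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["source", *RULES])
    writer.writeheader()
    for cell in cells:
        writer.writerow({"source": cell, **row_for(cell)})
    return buf.getvalue()


cells = [
    "EFT INCOMING||0141201||NHFI0141201||UTR||SBIN118121948660 M S"
    "||some-name ||some-purpose||TRN REF NO:a1b2c3d4e5",
    "COMM/CHARGES/FEES||CHECK/REF.6546644473||BILPAY CCTY BEARING C||00.00"
    "||00012||18031358||BLPY||TRN REF NO:a1b2c3d4e5",
]
print(cells_to_csv(cells))
```

If pandas is available, a `pandas.DataFrame` built from the same list of dicts gives the same column layout, and `DataFrame.to_excel` can write it back to a workbook (this needs an Excel writer engine such as openpyxl).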



Please help me identify what kind of problem this is. Is it text classification? And what approach would solve it?
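One way to frame it: the unit to classify is each `||`-separated field, not the whole cell, so this is closer to field-level classification (a form of information extraction) than to classifying a document. Hand-written regexes are one such classifier; a learned one needs hand-labelled fields. The toy sketch below uses a "word shape" feature (letters become `A`, digits become `9`), a feature often used in NER, with a simple majority vote per shape. The labels, the training pairs and the made-up test inputs are all hypothetical, and this only works for fixed-format codes; free-text fields such as names would need richer features or a proper model (for example from scikit-learn).

```python
from collections import Counter, defaultdict


def shape(field):
    """Coarse 'word shape': letters -> 'A', digits -> '9', other characters kept."""
    return "".join("A" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in field)


def train(labelled_fields):
    """Count how often each label occurs for each shape in (field, label) pairs."""
    model = defaultdict(Counter)
    for field, label in labelled_fields:
        model[shape(field)][label] += 1
    return model


def predict(model, field):
    """Most frequent label seen for this field's shape, or 'unknown' if unseen."""
    counts = model.get(shape(field))  # .get does not insert a new key
    return counts.most_common(1)[0][0] if counts else "unknown"


# Hypothetical hand labels for two fields taken from the sample cells.
model = train([("NHFI0141201", "ifsc"), ("00.00", "amount")])

# Made-up inputs, purely for illustration.
for field in ["ABCD0123456", "12.50", "some-name"]:
    print(field, "->", predict(model, field))
```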










  • What do you mean by breakdown of entities?
    – tm1212, 11 hours ago

  • It is data preprocessing. I would use Python or R, which are better suited to advanced data analytics like what you are doing.
    – Robin Nicole, 3 hours ago

Tags: python, classification, text, named-entity-recognition






edited 8 hours ago by tuomastik
asked 13 hours ago by Arjun Arora
Posted on Data Science Stack Exchange.

