How to extract and classify data from a column in Excel?
I have a column in an Excel sheet that contains a lot of data separated by || delimiters. The data can be grouped into classes such as entity, IFSC code, transaction reference ID, etc.
A single cell looks like this:
EFT INCOMING||0141201||NHFI0141201||UTR||SBIN118121948660 M S||some-name ||some-purpose||TRN REF NO:a1b2c3d4e5
Not every cell has the same number of classes, or even the same types of classes.
Another example:
COMM/CHARGES/FEES||CHECK/REF.6546644473||BILPAY CCTY BEARING C||00.00||00012||18031358||BLPY||TRN REF NO:a1b2c3d4e5
I tried extracting this information using regular expressions, and I am able to get the ref IDs or IFSC codes extracted as a single list. But I need to break each cell into multiple cells, each holding one piece of information. If a cell does not have data for a given class, the corresponding output cell should remain blank.
I also tried named entity recognition, but the same problem arises: I get a list of entities as output, not the breakdown.
What kind of problem is this? Is it text classification? And what would be the approach to solve it?
python classification text named-entity-recognition
edited 8 hours ago by tuomastik
asked 13 hours ago by Arjun Arora (new contributor)
What do you mean by breakdown of entities?
– tm1212
11 hours ago
This is data preprocessing. I would use Python or R, which are better suited to advanced data analysis like what you are doing.
– Robin Nicole
3 hours ago
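A minimal sketch of the preprocessing route in Python, assuming each field between || delimiters belongs to at most one class and that a class can be recognised by a pattern over the whole field. The column names, the `RULES` table, and both patterns are hypothetical and inferred only from the two sample cells in the question; real data will likely need more rules.

```python
import re

DELIMITER = "||"

# Hypothetical rules: each maps an output column to a pattern that a whole
# field must match. These patterns are assumptions based on the two sample
# cells in the question, not a complete specification.
RULES = [
    # Assumed IFSC layout: 4 letters, a literal 0, then 6 alphanumerics.
    ("ifsc_code", re.compile(r"([A-Z]{4}0[A-Z0-9]{6})")),
    # Assumed reference layout: the literal prefix followed by the id.
    ("transaction_ref", re.compile(r"TRN REF NO:\s*(\S+)")),
]


def classify_cell(cell):
    """Split one cell on the delimiter and file each field under a column."""
    # Start with every column blank so missing classes stay empty.
    row = {name: "" for name, _ in RULES}
    leftovers = []
    for field in cell.split(DELIMITER):
        field = field.strip()
        if not field:
            continue
        for name, pattern in RULES:
            match = pattern.fullmatch(field)
            if match and not row[name]:
                row[name] = match.group(1)
                break
        else:
            # No rule claimed this field, so keep it for manual review.
            leftovers.append(field)
    row["other"] = " | ".join(leftovers)
    return row


cells = [
    "EFT INCOMING||0141201||NHFI0141201||UTR||SBIN118121948660 M S"
    "||some-name ||some-purpose||TRN REF NO:a1b2c3d4e5",
    "COMM/CHARGES/FEES||CHECK/REF.6546644473||BILPAY CCTY BEARING C"
    "||00.00||00012||18031358||BLPY||TRN REF NO:a1b2c3d4e5",
]
rows = [classify_cell(cell) for cell in cells]
for row in rows:
    print(row)
```

Every row has the same keys, so a class with no matching field simply stays blank. If the sheet is loaded with pandas, passing `rows` to `pd.DataFrame` would give one column per class, ready to write back to Excel.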