Imbalanced dataset in text classification

I have a dataset collected from Facebook that consists of 10 classes, each with 2500 posts. However, when I count the number of unique words in each class, the counts differ, as shown in the figure (word count in each class).

Is this an imbalanced problem because of the word counts, or is it balanced because the number of posts per class is equal? And what is the best solution if it is imbalanced?

asked 13 hours ago

mtesta010

New contributor

Could you please post your approach/code here?
– Sunil
11 hours ago

Which code? I am asking a general question based on the number of samples.
– mtesta010
10 hours ago

python nlp class-imbalance imbalanced-learn

2 Answers

I don't know whether I understood your question correctly. If you count all words within a class, a word such as "the" is counted every time it appears. If you count only the unique words, "the" is counted once. This is why your counts differ from your plot. Each class can have a different number of unique words.
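A minimal sketch of this distinction, assuming the posts are available as (label, text) pairs. The sample posts, the naive whitespace tokenizer, and the function name `word_counts_per_class` are hypothetical and only serve as an illustration:

```python
from collections import Counter, defaultdict


def word_counts_per_class(posts):
    """Return {label: (n_posts, total_tokens, unique_tokens)}."""
    n_posts = Counter()
    total_tokens = Counter()
    vocabulary = defaultdict(set)
    for label, text in posts:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        n_posts[label] += 1
        total_tokens[label] += len(tokens)   # "the" counted every time it appears
        vocabulary[label].update(tokens)     # "the" counted once per class
    return {
        label: (n_posts[label], total_tokens[label], len(vocabulary[label]))
        for label in n_posts
    }


# Hypothetical toy data: both classes have the same number of posts.
posts = [
    ("sports", "the match was the best match"),
    ("sports", "the team won"),
    ("politics", "the vote"),
    ("politics", "the law passed"),
]
stats = word_counts_per_class(posts)
for label, (n, total, unique) in stats.items():
    print(f"{label}: posts={n}, total words={total}, unique words={unique}")
```

In this toy data both classes have two posts, yet their total and unique word counts differ, which mirrors the situation described in the question.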

answered 10 hours ago

matze

New contributor

The counts are of unique words after removing stop words; they differ because the post lengths are different.
– mtesta010
10 hours ago

Thank you for your message, Ahmed. There are a few things to point out:

Is this an imbalanced problem? Which problem? This is not a problem; this is data.

What analysis is going to be done? In some cases you need the posts, and in others you need these keywords.

What method is going to be used for that analysis? Some methods take keywords as input and some take posts.

As for the numbers themselves: not necessarily. The smallest class has 20% of the population of the largest one and, moreover, the scale is fairly large (20000 samples), so this is not necessarily an imbalanced class distribution. Again, consider what you want to do with this data; that determines the answer much more accurately.
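One way to put a number on this is the ratio of the smallest to the largest per-class count. The sketch below is only an illustration: the post counts follow the question (10 classes, 2500 posts each), while the per-class unique-word counts and the helper name `balance_ratio` are hypothetical, since the original figure is not available here.

```python
from collections import Counter


def balance_ratio(counts):
    """Smallest count divided by largest count (1.0 means perfectly balanced)."""
    values = list(counts.values())
    return min(values) / max(values)


# Post counts as described in the question: 10 classes, 2500 posts each.
labels = [f"class_{i}" for i in range(10) for _ in range(2500)]
post_counts = Counter(labels)
post_ratio = balance_ratio(post_counts)

# Hypothetical per-class unique-word counts, for illustration only.
unique_word_counts = {"class_0": 20000, "class_1": 12000, "class_2": 4000}
word_ratio = balance_ratio(unique_word_counts)

print(f"posts: smallest/largest = {post_ratio:.2f}")
print(f"unique words: smallest/largest = {word_ratio:.2f}")
```

By post count the labels are perfectly balanced; whether the spread in word counts matters depends on whether the chosen method consumes posts or keywords.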

Hope this helps. If you describe the task you want to do, I can post a solution here.

Cheers,

answered 4 hours ago

Kasra Manshaei
