What kind of a fit would be suitable for this?
Below is a scatter plot of the data set I am dealing with. The X axis is the total number of words per essay for a particular individual, and the Y axis is the number of unique words. In principle, the number of unique words should approach the individual's vocabulary.
I am trying to estimate that individual's vocabulary from the data below, but I don't know what kind of fit would work. A logarithm has no upper limit, and a quadratic fit doesn't make sense (the gradient should remain non-negative over the entire domain).
In short, I am looking for a decent model to fit the data below, and don't know where to start.
Thank you.
python scikit-learn regression linear-regression model-selection
asked 2 days ago
Mir
1 Answer
In my opinion, this estimate cannot be obtained from this plot alone, because:
From 4,000 words onward, the number of unique words increases linearly, by about 250 per 2K words: (4K, 1.25K), (6K, 1.5K), (8K, 1.75K), (10K, 2K), (12K, 2.25K). So there is not enough evidence to hypothesize an upper bound for this linear growth.
On average, an adult knows 20K-35K unique words, but this plot only reaches about 2K, which is far short of the expected final value. Extrapolating from 2K to 20K is unreliable.
Vocabulary of Shakespeare
Estimating a person's vocabulary is quite complicated. Below is a paper that estimates the vocabulary of Shakespeare. He used 31K unique words in all of his writings. The paper estimates that he knew at least 35K more words that he did not use (a vocabulary of at least 66K). As you can see, the estimated vocabulary is only about twice the observed count, which highlights how unreliable it is to go from 2K to 20K and beyond.
Efron and Thisted (1976), "Estimating the number of unseen species: How many words did Shakespeare know?"
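The linearity claim can be checked directly on the five points quoted in the answer. The following is only a minimal sketch: it uses those five approximate readings rather than the real data set, and the variable names are made up for illustration. It fits a straight line with `numpy.polyfit` and looks at the second differences, which would be negative if the curve were starting to flatten toward a limit.

```python
import numpy as np

# Approximate points quoted in the answer: (total words, unique words).
# These are readings off the plot, not the actual data set.
total_words = np.array([4000, 6000, 8000, 10000, 12000], dtype=float)
unique_words = np.array([1250, 1500, 1750, 2000, 2250], dtype=float)

# Least-squares straight line: unique ≈ slope * total + intercept.
slope, intercept = np.polyfit(total_words, unique_words, deg=1)
residuals = unique_words - (slope * total_words + intercept)

# Second differences measure curvature on an evenly spaced grid.
# A saturating curve would give negative values; a straight line gives zeros.
second_diff = np.diff(unique_words, n=2)

print(f"slope: {slope:.3f} unique words per word written")
print(f"intercept: {intercept:.0f}")
print("max |residual|:", np.abs(residuals).max())
print("second differences:", second_diff)
```

With no measurable curvature in this range, any bounded model (for example an exponential approach to a ceiling) can place its asymptote almost anywhere above the last point, which is why the fitted "vocabulary" would not be trustworthy.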
answered 2 days ago
Esmailian
That paper was immensely helpful! And I agree with you, this estimation is not as trivial as I initially thought. The good thing is I have some more data on the way. Thank you very much!
– Mir, 16 hours ago