What does “baseline” mean in the context of machine learning?

What does "baseline" mean in the context of machine learning and data science?

Someone wrote me:

Hint: An appropriate baseline will give an RMSE of approximately 200.

I don't get this. Does he mean that if my predictive model has an RMSE below 500 on the training data, it's good?

And what could be a "baseline approach"?

edited 17 hours ago

nbro

286417

asked Apr 26 '18 at 23:17

Meiiso

4115

machine-learning regression predictive-modeling terminology

2 Answers

A baseline is the result of a very basic model/solution. You generally create a baseline and then try to make more complex solutions in order to get a better result.
If you achieve a better score than the baseline, it is good.

answered Apr 27 '18 at 0:12

Carl Rynegardh

30119

well, but what does that mean exactly for my point? For my two quotes
– Meiiso, Apr 27 '18 at 8:46

Since the baseline is 200, you want a better score. In your case a better score means the lower the better. You want to get below 200. I'm assuming that you are dealing with a regression. The first thing to use for a baseline would be an ordinary least squares regression.
– Carl Rynegardh, Apr 27 '18 at 9:08
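To make the comment above concrete, here is a minimal sketch of how a baseline RMSE could be obtained for a regression problem. The data, the single feature, and the helper names `rmse` and `fit_ols` are all made up for illustration; nothing here comes from the asker's actual dataset. It compares a "predict the training mean" baseline with a one-feature ordinary least squares fit, using only the Python standard library.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one feature)."""
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical toy data, invented for this example.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [310.0, 480.0, 720.0, 890.0, 1130.0, 1270.0]
x_test = [7.0, 8.0]
y_test = [1500.0, 1690.0]

# Baseline 1: ignore the feature and always predict the training mean.
mean_pred = [statistics.fmean(y_train)] * len(y_test)

# Baseline 2: ordinary least squares on the single feature.
intercept, slope = fit_ols(x_train, y_train)
ols_pred = [intercept + slope * xi for xi in x_test]

mean_rmse = rmse(y_test, mean_pred)
ols_rmse = rmse(y_test, ols_pred)
print(f"mean baseline RMSE: {mean_rmse:.1f}")
print(f"OLS baseline RMSE:  {ols_rmse:.1f}")
```

Whichever of these you treat as "the" baseline, the idea is the same: a more elaborate model is only worth keeping if its RMSE on held-out data is lower than the baseline's.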


A baseline is a method that uses heuristics, simple summary statistics, randomness, or machine learning to create predictions for a dataset. You can use these predictions to measure the baseline's performance (e.g., accuracy); this metric then becomes the reference point against which you compare any other machine learning algorithm.

In more detail:

A machine learning algorithm tries to learn a function that models the relationship between the input (feature) data and the target variable (or label). When you test it, you will typically measure performance in one way or another. For example, your algorithm may be 75% accurate. But what does this mean? You can infer this meaning by comparing with a baseline's performance.

Typical baselines include those supported by scikit-learn's "dummy" estimators:

Classification baselines:

“stratified”: generates predictions by respecting the training set’s class distribution.

“most_frequent”: always predicts the most frequent label in the training set.

“prior”: always predicts the class that maximizes the class prior.

“uniform”: generates predictions uniformly at random.

“constant”: always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.

Regression baselines:

“mean”: always predicts the mean of the training set.

“median”: always predicts the median of the training set.

“quantile”: always predicts a specified quantile of the training set, provided with the quantile parameter.

“constant”: always predicts a constant value that is provided by the user.
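As a rough illustration of the strategies listed above, the following sketch fits a few of scikit-learn's dummy estimators on made-up data. It assumes scikit-learn and NumPy are installed; the arrays are invented for the example, and the features are all zeros because dummy estimators ignore them.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

# Hypothetical imbalanced labels: eight of class 0, two of class 1.
X_cls = np.zeros((10, 1))  # features are ignored by dummy estimators
y_cls = np.array([0] * 8 + [1] * 2)

# "most_frequent": always predicts the majority class (0 here).
majority = DummyClassifier(strategy="most_frequent").fit(X_cls, y_cls)
majority_pred = majority.predict(X_cls)
majority_acc = majority.score(X_cls, y_cls)  # accuracy of the baseline

# "constant": always predicts the label you provide (the minority class here).
minority = DummyClassifier(strategy="constant", constant=1).fit(X_cls, y_cls)
minority_pred = minority.predict(X_cls)

# Hypothetical regression targets with one large outlier.
X_reg = np.zeros((5, 1))
y_reg = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

mean_pred = DummyRegressor(strategy="mean").fit(X_reg, y_reg).predict(X_reg)
median_pred = DummyRegressor(strategy="median").fit(X_reg, y_reg).predict(X_reg)

print("majority-class accuracy:", majority_acc)
print("mean prediction:", mean_pred[0], "median prediction:", median_pred[0])
```

The outlier in the regression targets shows why the choice of strategy matters: the mean and median baselines give very different constant predictions on skewed data.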

In general, you will want your approach to outperform the baselines you have selected. In the example above, you would want your 75% accuracy to be higher than any baseline you have run on the same data.
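To see why the 75% figure means little on its own, here is a small sketch using only the standard library. The label counts are invented for illustration, and `majority_baseline_accuracy` is a hypothetical helper, not a library function. On a dataset where 80% of the examples share one label, simply predicting that label already scores higher than 75%.

```python
from collections import Counter


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for label in y_test if label == majority_label)
    return hits / len(y_test)


# Hypothetical imbalanced labels, invented for this example.
y_train = ["neg"] * 80 + ["pos"] * 20
y_test = ["neg"] * 16 + ["pos"] * 4

model_accuracy = 0.75  # the example figure used in the answer
baseline_accuracy = majority_baseline_accuracy(y_train, y_test)
model_beats_baseline = model_accuracy > baseline_accuracy

print(f"baseline accuracy: {baseline_accuracy:.2f}")
print(f"model beats baseline: {model_beats_baseline}")
```

With a balanced dataset the same 75% would look much better, which is why the baseline has to be computed on the same data as the model.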

Finally, if you are dealing with a specific domain of machine learning (such as recommender systems), then you will typically pick baselines that are current state-of-the-art (SoTA) approaches, since you will usually want to demonstrate that your approach does better than these. For example, when you evaluate a new collaborative filtering algorithm, you may want to compare it to matrix factorization, which is itself a learning algorithm but is now a popular baseline because it has been so successful in recommender system research.

answered Apr 27 '18 at 0:13

Aditya

1,4161525
