Difference between output of probabilistic and ordinary least squares regressions

If I execute the commands

from sklearn.linear_model import LinearRegression

my_reg = LinearRegression()
my_reg.fit(X, Y)

I train my model. As I understand it, training a model means calculating the coefficient estimates.

I do not really understand the difference between this and e.g.

scipy.stats.linregress(X,Y)

which calculates a 'normal' regression and also gives me the coefficient estimates, along with the other statistics connected with them.

Could anyone tell me what the difference is?

edited yesterday

Esmailian

6187

asked 2 days ago

ruedi

1212

New contributor


machine-learning linear-regression

2 Answers

Both solve exactly the same objective: minimizing the mean squared error. However, the second method can also answer the question "how confident are we that the slope is not zero, i.e. that $Y$ is correlated with $X$?" via a p-value.

In detail

Let's denote the data as $(X, Y) = \{(x_n, y_n) \mid x_n \in \mathbb{R}^D, y_n \in \mathbb{R}\}$ and the regression as $\hat{y} = Ax+B$.

The extra quantities returned by scipy.stats.linregress(X,Y), which handles only a single predictor ($D = 1$), include rvalue ($r$) and pvalue ($p$).

In statistics, $r^2$ (known as r-squared) measures the "goodness of fit". That is, as the regression $\hat{y}=Ax+B$ gets closer to the observations $y$, $r^2$ gets closer to $1$. Since it is a function of $y$ and $\hat{y}$, it can be calculated for the first method too, so there is no difference here.
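As a rough illustration of that point, the sketch below computes $r^2$ three ways on made-up data: from its definition using the scikit-learn predictions, from the model's `score` method, and by squaring `rvalue` from `scipy.stats.linregress`. The data, the seed, and the variable names are assumptions for the example, not part of the question.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): a linear trend plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 40)
y = -1.5 * x + 4.0 + rng.normal(scale=0.5, size=x.size)

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = x.reshape(-1, 1)
my_reg = LinearRegression().fit(X, y)

# r^2 from its definition: 1 - (residual sum of squares / total sum of squares).
y_hat = my_reg.predict(X)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The same quantity as reported by each library.
r2_sklearn = my_reg.score(X, y)
r2_scipy = stats.linregress(x, y).rvalue ** 2

print(r2_manual, r2_sklearn, r2_scipy)
```

For a single predictor with an intercept, the three numbers should agree up to floating-point error.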

However, $p$ is specific to the second method. scipy.stats.linregress(X,Y) adds a normality assumption on the noise, i.e. it assumes $\epsilon \sim N(0, \sigma^2)$, where $$\epsilon = y - \overbrace{(Ax+B)}^{\hat{y}}$$
On the basis of this assumption, it can answer an additional question: "how confident are we that the slope is not zero?" The first method cannot answer this question on its own.

For example, suppose the estimated slope is $2.1$ for both methods. We still cannot tell whether this slope is significant or whether $Y$ is actually independent of $X$ unless we look at the value of $p$. For $p < 0.01$ we are confident (at significance level $0.01$) that $Y$ is correlated with $X$, but for $p > 0.1$ we cannot be confident, i.e. the slope $2.1$ could be due to chance and $Y$ might be independent of $X$.
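To make the role of the normality assumption more concrete, here is a minimal sketch of how such a p-value can be derived by hand: a two-sided t-test on the slope with $n - 2$ degrees of freedom, which is what the `linregress` documentation describes (a Wald test with a t-distributed statistic). The data and seed are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed for illustration): weak linear trend plus Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 30)
y = 0.3 * x + rng.normal(scale=2.0, size=x.size)

# Ordinary least squares for a single predictor (closed form).
x_centered = x - x.mean()
slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
intercept = y.mean() - slope * x.mean()

# Residual variance estimate with n - 2 degrees of freedom
# (two parameters, slope and intercept, were estimated).
dof = x.size - 2
residuals = y - (slope * x + intercept)
sigma2_hat = np.sum(residuals ** 2) / dof

# Standard error of the slope and the t statistic for H0: slope = 0.
slope_stderr = np.sqrt(sigma2_hat / np.sum(x_centered ** 2))
t_stat = slope / slope_stderr

# Two-sided p-value from Student's t distribution.
p_manual = 2.0 * stats.t.sf(abs(t_stat), dof)

result = stats.linregress(x, y)
print("manual p-value:", p_manual)
print("scipy p-value: ", result.pvalue)
```

The same steps could be applied to coefficients obtained from scikit-learn, since both methods produce the same least-squares fit; scikit-learn simply does not report the test.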

This link gives more details on how the p-value is actually calculated in the second method.

edited 22 hours ago

answered 2 days ago

Esmailian

6187


There is no difference in the conceptual sense: both methods calculate linear regression coefficients. The difference lies in the interface. With scipy.stats you get the coefficients directly (and it is up to you to put them into an equation to calculate predictions), while scikit-learn wraps them in a model object so that you can use it in the same way as other ML models, such as decision trees. (You can still obtain the regression coefficients from the fitted scikit-learn model using my_reg.coef_ and my_reg.intercept_.)
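A small sketch of that interface difference, using made-up numbers (the arrays `x`, `y` and `x_new` are assumptions for the example): with scipy the prediction equation is written out by hand, while the scikit-learn model exposes `predict`.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Toy data (assumed for illustration) and two new points to predict.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
x_new = np.array([5.0, 6.0])

# scipy: you get the coefficients and apply the equation yourself.
res = stats.linregress(x, y)
pred_scipy = res.slope * x_new + res.intercept

# scikit-learn: the fitted model object makes the predictions.
# Inputs must be 2-D, with shape (n_samples, n_features).
my_reg = LinearRegression().fit(x.reshape(-1, 1), y)
pred_sklearn = my_reg.predict(x_new.reshape(-1, 1))

print("scipy coefficients:  ", res.slope, res.intercept)
print("sklearn coefficients:", my_reg.coef_[0], my_reg.intercept_)
print("predictions:", pred_scipy, pred_sklearn)
```

Both routes should give the same coefficients and predictions up to floating-point error; the model object mainly pays off when the regression is swapped for another estimator or used inside a pipeline.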

answered 2 days ago

Jan Šimbera

1962

New contributor
