What is the purpose of partial derivatives in loss calculation (linear regression)?
I am studying ML and data science from scratch. As part of the course, I am studying how the models are derived, and for most of them, starting with the simplest, linear regression, we take partial derivatives. I understand the implementation part, but I am a bit confused about why we need to take partial derivatives there.
Is there a specific reason behind it? Can we use any other method to minimise the linear regression loss function?
machine-learning linear-regression machine-learning-model loss-function
asked 2 days ago by aB9
2 Answers
The underlying idea behind machine learning is to come up with more or less complicated algorithms that, given a set of input data, produce some sort of output; this output in turn depends on some parameters (which specify the model). The objective is to choose those parameters so that the algorithm's output is as close as possible to the actual result. Let $y_i$ and $f(x_i, \beta)$ be the actual and the predicted value, respectively, for the input $x_i$: the previous sentence translates into choosing the parameters $\beta$ (whatever they represent) so that the error you commit is as small as possible, namely so that $L(y \mid x, \beta)$ is minimised, where $L$ is whatever function we decide to use to measure the error between predictions and actuals. In the literature $L$ is referred to as the loss function, and for most practical purposes, especially for polynomial models like linear regression, it reduces to the sum of squares
$$
L(y \mid x, \beta) = \sum_{i=1}^N \bigl(y_i - f(x_i, \beta)\bigr)^2.
$$
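In code, the loss above is just a function of the parameters. Here is a minimal sketch (names and numbers are made up for illustration) showing that different choices of $\beta$ give different loss values:

```python
import numpy as np

def sum_of_squares_loss(beta, X, y, f):
    """Sum of squared errors between actuals y and predictions f(X, beta)."""
    return np.sum((y - f(X, beta)) ** 2)

# Example with a linear model f(x, beta) = x @ beta:
linear = lambda X, beta: X @ beta
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # design matrix with intercept column
y = np.array([0.1, 1.9, 4.1])
print(sum_of_squares_loss(np.array([0.0, 2.0]), X, y, linear))  # small loss for this beta
print(sum_of_squares_loss(np.array([0.5, 1.0]), X, y, linear))  # much larger loss for this one
```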
For each choice of parameters $\beta$ and function $f(x_i, \beta)$ the loss above takes different values; once we fix the form of $f$, we are looking for the set of $\beta$ for which it is smallest. Assuming the loss function is differentiable, at a local minimum all partial derivatives with respect to the variables in question (here, the components of $\beta$) must vanish; so, at the end of the day, finding the minimum essentially comes down to taking derivatives and equating them to zero.
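Concretely, differentiating the sum-of-squares loss with respect to a single coefficient $\beta_k$ and setting the result to zero gives one equation per parameter:
$$
\frac{\partial L}{\partial \beta_k}
  = -2 \sum_{i=1}^N \bigl(y_i - f(x_i, \beta)\bigr)\,
    \frac{\partial f(x_i, \beta)}{\partial \beta_k} = 0,
\qquad k = 1, \dots, M.
$$
Solving this system (analytically when possible, numerically otherwise) yields the parameters that minimise the loss.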
In the case of linear regression one assumes $f(x_i, \beta) = \sum_{j=1}^M x_i^j \beta_j$: plugging this expression into the loss function and taking derivatives gives back the familiar expressions for the coefficients that one learns in school. Likewise for more complicated models: the form of $f$ may be more complicated, there may be analytical problems in minimising the loss computationally, and there may be many more parameters connected to each other in complex ways (for instance in neural networks), but the underlying argument still holds.
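To see this numerically, here is a minimal sketch (NumPy, synthetic data that is not part of the answer) that solves the normal equations obtained by setting the partial derivatives to zero, and checks that the gradient indeed vanishes at the solution:

```python
import numpy as np

# Synthetic data: N samples, M features, for illustration only.
rng = np.random.default_rng(0)
N, M = 100, 3
X = rng.normal(size=(N, M))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=N)

# Setting dL/d(beta) = -2 X^T (y - X beta) = 0 gives the normal equations
#   X^T X beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient of the sum-of-squares loss at beta_hat should be numerically zero.
gradient = -2 * X.T @ (y - X @ beta_hat)
print("estimated beta:", beta_hat)
print("gradient at the minimum:", gradient)  # ~ [0, 0, 0] up to floating-point error
```

The closed form exists for linear regression precisely because the stationarity equations are linear in $\beta$; an iterative method such as gradient descent would converge to the same minimiser.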
answered 2 days ago by gented
It all comes down to how backward propagation works. Ultimately you need to know how much each part of the equation contributed to the final error, and then adjust the values of that part accordingly.
In the case of linear regression this is fairly simple, but when you start stacking one linear regression after another (which is essentially a neural network), you need to know how much each coefficient of each layer contributes to the final error. If you were to use a single overall derivative instead of the partial derivatives, you could not attribute the error to individual coefficients with that level of precision.
I understand this is not a rigorous mathematical explanation, but it should give you an intuition of why you need partial derivatives.
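As a rough sketch of that intuition (synthetic data, not from the answer), gradient descent updates each coefficient using its own partial derivative of the loss:

```python
import numpy as np

# Synthetic data for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -3.0]) + 0.1 * rng.normal(size=200)

beta = np.zeros(2)          # one coefficient per feature
lr = 0.05                   # learning rate

for _ in range(500):
    residual = y - X @ beta
    # Partial derivative of the mean squared error w.r.t. each coefficient:
    # each entry of `grad` measures how much that single coefficient
    # contributes to the error, which is exactly what the update needs.
    grad = -2 * X.T @ residual / len(y)
    beta -= lr * grad

print(beta)  # should approach [1.5, -3.0]
```

In a neural network the same idea is applied layer by layer via the chain rule, so every weight still receives its own partial derivative.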
answered 2 days ago by Juan Antonio Gomez Moriano