What exactly happens during each epoch in neural network training?
- Across different epochs, which of the following is/are updated?
initial weights (initial ConvNet filter matrices, initial fully connected weights)
hyperparameters: number of ConvNet filters, size of ConvNet filters, number of layers...
- The loss function value computed at the end of the last epoch appears to be the initial value of the loss function for the current epoch. Why?
Tags: neural-network deep-learning hyperparameter-tuning epochs
asked 2 days ago by feynman
1 Answer
You are updating your network parameters, that is, the weights of the fully connected layers, the filters in the convolution operations, etc.
The hyperparameters are fixed once you start training your network. Hyperparameters are not intrinsic to the learning process; they are something the practitioner should tune carefully, with techniques such as grid search, Bayesian optimization and cross-validation.
You have just one loss function during training, and with each batch you process you update your weights, correcting your network and, at least in theory, decreasing your loss. So after the first epoch you have reached a certain loss value, and that value is what the next epoch goes on to update.
Think of it as standing on top of a mountain and climbing down. So as not to get tired, you count 10 steps and rest a little. After those 10 steps you are not back at the top; you continue down from where you stopped, right? That is an analogy (I think it is a bad one, but if you understand it, it is ok haha).
answered 2 days ago by Victor Oliveira
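To make the point about epochs concrete, here is a minimal sketch in plain Python. It is not from the answer: the toy data, the one-feature linear model and the names `mse`, `train` and `history` are all assumptions made for illustration. The weights are initialised once before the first epoch, every mini-batch updates them, and each epoch continues from where the previous one stopped, so the loss at the start of an epoch equals the loss at the end of the one before.

```python
import random

def mse(w, b, data):
    """Mean squared error of the line y = w*x + b over a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, epochs=5, batch_size=4, lr=0.05, seed=0):
    # Hyperparameters (epochs, batch_size, lr) are fixed for the whole run.
    rng = random.Random(seed)
    # Parameters are initialised ONCE, before the first epoch.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    history = []  # (loss at start of epoch, loss at end of epoch)
    for _ in range(epochs):
        start_loss = mse(w, b, data)
        order = data[:]
        rng.shuffle(order)  # new batch order each epoch, same weights
        for i in range(0, len(order), batch_size):
            batch = order[i:i + batch_size]
            # Gradients of the batch MSE with respect to w and b.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # One small update per batch; w and b are never reset.
            w -= lr * gw
            b -= lr * gb
        history.append((start_loss, mse(w, b, data)))
    return w, b, history

# Hypothetical toy data on the line y = 2x + 1.
data = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]
w, b, history = train(data)
for epoch, (start, end) in enumerate(history, 1):
    print(f"epoch {epoch}: start loss {start:.4f} -> end loss {end:.4f}")
```

In this sketch only `w` and `b` change inside the loop; `epochs`, `batch_size` and `lr` are passed in once and stay as they are.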
1. So validation datasets don't affect any hyperparameters? Are the initial weights all re-randomized in each epoch?
– feynman, 2 days ago
2. If all initial weights are re-randomized in each epoch, doesn't the loss function also start from an initial high value? If all initial weights are re-randomized, the training in each epoch should be a new training, irrespective of the last epoch?
– feynman, 2 days ago
No, the weights are not re-randomized at each epoch. They are randomized only at the start of the training process. The weights that the second epoch updates are the ones left by the previous epoch; training does not start again.
– Victor Oliveira, 2 days ago
That makes more sense. But now that the cost function was already minimized after the 1st epoch, how will the cost function decrease further during the 2nd epoch?
– feynman, 2 days ago
That is the point. Why is the model minimizing a loss function? Because we update our weights in a direction where we get more points 'right' and fewer wrong. And what determines how strongly we update these weights? The partial derivatives of our loss function multiplied by a LEARNING RATE. Therefore we update our weights, and with them the loss, in small steps and not all at once. This is why in the second epoch we still have room to further decrease our loss.
– Victor Oliveira, 2 days ago
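The small-steps argument in the last comment can be illustrated with a one-parameter sketch, assuming a made-up loss L(w) = (w - 3)^2 and a hypothetical learning rate of 0.1 (neither comes from the thread). Each update moves `w` by the learning rate times the derivative, so the first five steps (think of them as one epoch) reduce the loss without reaching the minimum, and the next five reduce it further.

```python
def loss(w):
    """Made-up loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the loss with respect to w."""
    return 2.0 * (w - 3.0)

learning_rate = 0.1  # assumed value, chosen only for illustration
w = 0.0              # initial weight, set once
losses = [loss(w)]

for step in range(10):
    # Gradient descent update: a small step against the derivative.
    w -= learning_rate * grad(w)
    losses.append(loss(w))

print("loss after 'epoch' 1 (5 steps): ", losses[5])
print("loss after 'epoch' 2 (10 steps):", losses[10])
```

With a small learning rate the loss after the first five steps is still above zero, which is why further passes over the data can keep lowering it.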