What exactly happens during each epoch in neural network training?

Across different epochs, which of the following is/are updated?

initial weights (initial ConvNet filter matrices, initial fully connected weights)

hyperparameters: number of ConvNet filters, size of ConvNet filters, number of layers...

The loss computed at the end of the last epoch appears to be the initial value of the loss function for the current epoch. Why?

asked 2 days ago

feynman

578

neural-network deep-learning hyperparameter-tuning epochs

1 Answer

Across epochs you update the network parameters: the weights of the fully connected layers, the filters used in the convolution operations, and so on.
The hyperparameters are fixed once training starts. They are not learned by the training process itself; the practitioner tunes them separately, for example with grid search or Bayesian optimization, evaluated with cross-validation.

There is just one loss function during training. Each time a batch is processed, the weights are updated to correct the network and, at least in theory, reduce the loss. So by the end of the first epoch the loss has reached a certain value, and the next epoch continues from that value.
Think of standing on top of a mountain and climbing down. So as not to get tired, you count 10 steps and then rest a little. After those 10 steps you are not back at the top; you continue down from where you stopped, right? That is an analogy (I think it is a bad one, but if you understand it, it is OK, haha).
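To make this concrete, here is a minimal sketch of a training loop. It is not a ConvNet: it assumes a toy linear model `y = w*x + b` trained with mini-batch gradient descent, and every name in it (`train`, `mse`, `history`, the dataset) is made up for illustration. It shows that the parameters are initialised once, that the hyperparameters (`epochs`, `batch_size`, `learning_rate`) stay fixed, and that each epoch starts from the loss the previous epoch ended with.

```python
import random


def mse(w, b, data):
    """Mean squared error of the line y = w*x + b over the whole dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, epochs=5, batch_size=4, learning_rate=0.05, seed=0):
    # epochs, batch_size and learning_rate are hyperparameters:
    # they are chosen before training and never changed inside the loop.
    rng = random.Random(seed)

    # Parameters are initialised ONCE, before the first epoch.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)

    history = []  # one (loss at epoch start, loss at epoch end) pair per epoch
    for _ in range(epochs):
        start_loss = mse(w, b, data)

        # Visit the data in a new random order, one mini-batch at a time.
        order = data[:]
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            batch = order[i:i + batch_size]
            # Gradients of the batch MSE with respect to w and b.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Small step against the gradient; w and b carry over to the next batch.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b

        # No re-initialisation here: the next epoch continues from these w and b.
        history.append((start_loss, mse(w, b, data)))
    return w, b, history


# Toy dataset sampled from y = 3x + 1 (an assumption made for this example).
data = [(x / 10, 3 * (x / 10) + 1) for x in range(20)]
w, b, history = train(data)
for epoch, (start, end) in enumerate(history, 1):
    print(f"epoch {epoch}: start loss {start:.4f} -> end loss {end:.4f}")
```

In the printed output, the start loss of each epoch is the end loss of the one before it, because nothing touches `w` and `b` between epochs.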

answered 2 days ago

Victor Oliveira

1707

1. So validation datasets don't affect any hyperparameters? Are the initial weights all re-randomized in each epoch?
– feynman
2 days ago

2. If all initial weights are re-randomized in each epoch, doesn't the loss function also start from an initial high value? If all initial weights were re-randomized, the training in each epoch would be a new training, irrespective of the last epoch?
– feynman
2 days ago

No, the weights are not re-randomized at each epoch. They are randomized only at the start of the training process. The weights that the second epoch updates are the ones left by the previous epoch; training does not start again.
– Victor Oliveira
2 days ago

That makes more sense. But now that the cost function was already minimized after the 1st epoch, how will it decrease further during the 2nd epoch?
– feynman
2 days ago

That is the point. Why does the model minimize a loss function? Because we update the weights in a direction where we get more points 'right' and fewer wrong. And how large is each update? It is given by the partial derivatives of the loss function multiplied by a LEARNING RATE, so we change the weights (and hence the loss) in small steps, not all at once. This is why, in the second epoch, we still have room to decrease the loss further.
– Victor Oliveira
2 days ago
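
The step described in the last comment can be written as a single update rule. This is a sketch of plain mini-batch gradient descent; the symbols are assumptions introduced here rather than notation from the thread: $w_t$ is the weight vector after $t$ batch updates, $\eta$ is the learning rate, and $L_{B_t}$ is the loss on the batch $B_t$.

$$
w_{t+1} = w_t - \eta \, \nabla_w L_{B_t}(w_t)
$$

Because $\eta$ is small, one epoch of such steps typically moves the weights only part of the way toward a minimum, so later epochs can still lower the loss.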
