What exactly happens during each epoch in neural network training?












  1. Across different epochs, which of the following is/are updated?

     initial weights (initial ConvNet filter matrices, initial fully connected weights)

     hyperparameters: number of ConvNet filters, size of ConvNet filters, number of layers...

  2. The loss function value calculated at the end of the last epoch appears to be the initial value of the loss function for the current epoch. Why?
























      neural-network deep-learning hyperparameter-tuning epochs
















asked 2 days ago by feynman






















          1 Answer












1. You are updating your network parameters, that is, the weights of the fully connected layers, the filters of the convolution operations, etc.
   The hyperparameters are fixed once you start training your network. Hyperparameters are not learned by the training process; they are something the practitioner should tune carefully, for example with grid search or Bayesian optimization, evaluated with cross-validation.

2. You have just one loss function during training. Each time a batch is processed you update your weights, correcting your network and, at least in theory, decreasing your loss. So after the first epoch you have reached a certain loss value, which is then updated further in the next epoch.
   Think of yourself on top of a mountain, climbing down. To avoid getting tired, you count 10 steps and rest a little. After those 10 steps you are not back at the top; you continue down from where you stopped, right? That is the analogy (I think it is a bad one, but if it helps you understand, that's fine, haha).
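As a rough illustration of point 2, here is a minimal sketch (not code from the answer). The toy linear model, the synthetic data, and the names `train`, `full_loss` and `history` are all assumptions made for this example. The parameters are initialized once, before the first epoch, and every epoch continues from where the previous one stopped, so the loss measured at the start of an epoch equals the loss measured at the end of the previous one.

```python
import numpy as np


def train(num_epochs=3, batch_size=10, learning_rate=0.1, seed=0):
    """Mini-batch gradient descent on a toy linear model y = w*x + b."""
    rng = np.random.default_rng(seed)

    # Toy data (illustrative assumption): y = 3x + 1 plus a little noise.
    x = rng.uniform(-1.0, 1.0, size=100)
    y = 3.0 * x + 1.0 + 0.01 * rng.normal(size=100)

    # Parameters are initialized exactly once, before the first epoch.
    w = float(rng.normal())
    b = 0.0

    def full_loss(w, b):
        # Mean squared error over the whole training set.
        return float(np.mean((w * x + b - y) ** 2))

    history = []  # one (loss at epoch start, loss at epoch end) pair per epoch
    for epoch in range(num_epochs):
        loss_at_start = full_loss(w, b)
        order = rng.permutation(len(x))  # reshuffle the data, not the weights
        for i in range(0, len(x), batch_size):
            idx = order[i:i + batch_size]
            err = w * x[idx] + b - y[idx]
            # Gradients of the batch mean squared error.
            grad_w = 2.0 * float(np.mean(err * x[idx]))
            grad_b = 2.0 * float(np.mean(err))
            # Small step against the gradient, scaled by the learning rate.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
        history.append((loss_at_start, full_loss(w, b)))
    return history
```

Calling `train()` returns one `(start, end)` pair per epoch. Comparing `history[1][0]` with `history[0][1]` shows that the second epoch starts from exactly the loss the first epoch ended on, because nothing resets `w` and `b` in between.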













• 1. So validation datasets don't affect any hyperparameters? Are the initial weights all re-randomized in each epoch?
  – feynman, 2 days ago

• 2. If all initial weights are re-randomized in each epoch, doesn't the loss function also start from a high initial value? If all initial weights are re-randomized, the training in each epoch should be a new training, irrespective of the last epoch?
  – feynman, 2 days ago

• No, the weights are not re-randomized at each epoch. They are randomized only at the start of the training process. The weights that the second epoch updates are the ones left by the previous epoch, so training does not start over.
  – Victor Oliveira, 2 days ago

• That makes more sense. But now that the cost function was already minimized after the 1st epoch, how will the cost function decrease further during the 2nd epoch?
  – feynman, 2 days ago

• That is the point. Why is the model minimizing a loss function? Because we update our weights in a direction where we get more points 'right' and fewer wrong. And how strongly do we update these weights? By the partial derivatives of our loss function multiplied by a LEARNING RATE, so we change our weights (and with them the loss) in small steps, not all at once. This is why in the second epoch we still have room to decrease our loss further.
  – Victor Oliveira, 2 days ago
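The update described in the last comment can be written out as follows, assuming plain gradient descent (the thread does not name a specific optimizer). The symbols $w_t$ for the weights after step $t$, $\eta$ for the learning rate and $L$ for the loss are notation introduced here:

$$
w_{t+1} = w_t - \eta \, \frac{\partial L}{\partial w}\bigg|_{w = w_t}
$$

Because $\eta$ is small, a single step, or even a full epoch of steps, usually does not reach a minimum of $L$, which is why later epochs can still lower the loss.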






















answered 2 days ago by Victor Oliveira





























