What are the possible approaches to fixing Overfitting on a CNN?














Currently I am trying to make a CNN that would allow for age detection on facial images. My dataset has the following shape, where the images are grayscale:



(50000, 120, 120) - training 
(2983, 120, 120) - testing


My model currently looks like the following - I've been testing/trying different methods:



    # Imports needed for this snippet (Keras 2.x API)
    from keras.models import Sequential
    from keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                              Activation, Dropout, Flatten, Dense)
    from keras import optimizers

    model = Sequential()

    # Convolutional feature extractor
    model.add(Conv2D(64, kernel_size=3, use_bias=False,
                     input_shape=(size, size, 1)))
    model.add(BatchNormalization())
    model.add(Activation("relu"))

    model.add(Conv2D(32, kernel_size=3, use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))

    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())

    # Classifier head
    model.add(Dense(128, use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))

    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    # TODO: try a lower learning rate - 0.001
    adam = optimizers.Adam(lr=0.01)
    model.compile(optimizer=adam, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, validation_data=(x_test, y_test),
              epochs=number_of_epochs, verbose=1)


After running for just 10 epochs I initially saw decent values, but by the end of the run my results were the following, and it has me concerned that my model is definitely overfitting.



How many epochs: 10
Train on 50000 samples, validate on 2939 samples
Epoch 1/10
50000/50000 [==============================] - 144s 3ms/step - loss: 1.7640 - acc: 0.3625 - val_loss: 1.6128 - val_acc: 0.4100
Epoch 2/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.5815 - acc: 0.4059 - val_loss: 1.5682 - val_acc: 0.4059
Epoch 3/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.5026 - acc: 0.4264 - val_loss: 1.6673 - val_acc: 0.4158
Epoch 4/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.3996 - acc: 0.4641 - val_loss: 1.5618 - val_acc: 0.4209
Epoch 5/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.2478 - acc: 0.5226 - val_loss: 1.6530 - val_acc: 0.4066
Epoch 6/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.0619 - acc: 0.5954 - val_loss: 1.6661 - val_acc: 0.4086
Epoch 7/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.8695 - acc: 0.6750 - val_loss: 1.7392 - val_acc: 0.3770
Epoch 8/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.7054 - acc: 0.7368 - val_loss: 1.8634 - val_acc: 0.3743
Epoch 9/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.5876 - acc: 0.7848 - val_loss: 1.8785 - val_acc: 0.3767
Epoch 10/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.5012 - acc: 0.8194 - val_loss: 2.2673 - val_acc: 0.3981
Model Saved


I assume the issue might be related to the number of images I have for each output class, but other than that I am a bit stuck on how to move forward. Is there something wrong in my understanding/implementation? Any advice or critique would be well appreciated; this is more of a learning project for me.










      machine-learning deep-learning keras cnn overfitting






          3 Answers


















          Try to use dropout after your dense layers, not after max-pooling layers. Whatever comes before the dense layers can be considered the input of a classification layer, so keep it intact; dropping activations there effectively means you are losing useful information. You should also be aware that you should not use dropout after the last layer.



          You can also add another dense layer, i.e. two hidden dense layers, for the classification part; it seems your data is not easy to learn.
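
          A minimal sketch of that layout in Keras, reusing the layer sizes from the question (the second 64-unit hidden dense layer is illustrative, not part of the original model):

              from keras.models import Sequential
              from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

              model = Sequential()
              # Convolutional part: no dropout here, keep the extracted features intact
              model.add(Conv2D(64, kernel_size=3, activation='relu',
                               input_shape=(120, 120, 1)))
              model.add(Conv2D(32, kernel_size=3, activation='relu'))
              model.add(MaxPooling2D(pool_size=(2, 2)))
              model.add(Flatten())

              # Classification part: dropout only after the hidden dense layers
              model.add(Dense(128, activation='relu'))    # hidden dense #1
              model.add(Dropout(0.5))
              model.add(Dense(64, activation='relu'))     # hidden dense #2 (illustrative)
              model.add(Dropout(0.5))
              model.add(Dense(10, activation='softmax'))  # output layer - no dropout after it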



















          • Should I then move the dropout that's currently after maxpooling to be after the first dense layer? Or in general should I just put them after the last dense? And when you say not to use a dense layer after the last layer are you referring to the one with softmax? – BearsBeetBattlestar, Dec 15 '18 at 9:00












          • No, don't use them after pooling layers at all. In a CNN, you have convolutional stuff, then you have dense layers. Suppose you have Dense layers #1, #2 and output. Use dropout after #1 and #2. – Media, Dec 15 '18 at 9:04












          • "And when you say not to use a dense layer after the last layer are you referring to the one with softmax?" It was my mistake, I edited! – Media, Dec 15 '18 at 9:07










          • I'm sorry, I didn't see the edit. I will go and test out what you've mentioned; hopefully it works out better. – BearsBeetBattlestar, Dec 15 '18 at 9:08




















          To deal with overfitting, you need to use regularization during training:





          1. Weight regularization - The first thing you have to do (practically always) is to use regularization on the weights of the model. L1 or L2 regularization updates the loss function by adding another term, known as the regularization term. As a result, the weight values shrink, on the assumption that a neural network with smaller weights gives a simpler model; this in turn reduces overfitting.
            If you are not sure which you need, just use L2.

            Keras - Usage of regularizers

          2. Dropout - Add dropout layers after the dense layers (by the way, there are also advantages to using dropout after the convolution layers; it helps with occlusions). Just make sure not to use it after the final dense layer (the one with the same size as the number of classes).

          3. Data Augmentation - The simplest way to reduce overfitting is to increase the size of the training data. Use data augmentation to potentially expand your training set to "infinity". Keras's data augmentation is really simple and easy to use:

            Keras Image Preprocessing




          If you implement these 3 steps, you will see drastic improvements (probably even just after the first one).
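
          A rough sketch of what those three steps could look like in Keras, continuing from a model built as in the question (the L2 factor of 0.01, the dropout rate, and the augmentation ranges are illustrative placeholders, not tuned values):

              from keras import regularizers
              from keras.layers import Dense, Dropout
              from keras.preprocessing.image import ImageDataGenerator

              # 1. Weight regularization: L2 penalty on the dense layer's weights
              model.add(Dense(128, activation='relu',
                              kernel_regularizer=regularizers.l2(0.01)))

              # 2. Dropout after the hidden dense layer, but not after the final softmax
              model.add(Dropout(0.5))
              model.add(Dense(10, activation='softmax'))

              # 3. Data augmentation: shifted/rotated/flipped variants generated on the fly
              #    (note: ImageDataGenerator expects a channel axis, e.g. (N, 120, 120, 1))
              datagen = ImageDataGenerator(rotation_range=10,
                                           width_shift_range=0.1,
                                           height_shift_range=0.1,
                                           horizontal_flip=True)
              model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                                  steps_per_epoch=len(x_train) // 32,
                                  epochs=number_of_epochs,
                                  validation_data=(x_test, y_test))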



          Further corrections and improvements (nothing to do with overfitting):




          • Your batch normalization layer should come after the non-linear activation; more accurately, it needs to come before the next convolution layer (see the sketch after this list).

          • Add an additional dense layer or 2 (only if the results are not good enough).
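
          For the batch-normalization point, a sketch of the suggested ordering (Conv -> ReLU -> BatchNorm, so the normalized output is what feeds the next convolution); the filter counts are taken from the question:

              from keras.layers import Conv2D, BatchNormalization

              # Conv -> ReLU -> BatchNorm: activate first, then normalize the result
              model.add(Conv2D(64, kernel_size=3, activation='relu',
                               input_shape=(120, 120, 1)))
              model.add(BatchNormalization())
              model.add(Conv2D(32, kernel_size=3, activation='relu'))
              model.add(BatchNormalization())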



















          • When you mention "Your batch normalization layer should come after the non-linear activation, or more accurately, it needs to come before the next convolution layer", isn't that what I'm already doing? As in adding it right after my first Conv2D layer and before the second one? Or am I misunderstanding... – BearsBeetBattlestar, Dec 16 '18 at 0:33










          • Also, would the regularization go something like this in my context: model.add(Dense(64, use_bias=False, kernel_regularizer=regularizers.l2(0.01)))? – BearsBeetBattlestar, Dec 16 '18 at 1:01










          • 1st comment: No, currently you are using the batch normalization before the non-linear activation: Conv->BatchNorm->ReLU. It needs to be Conv->ReLU->BatchNorm. – Mark.F, Dec 16 '18 at 9:38










          • 2nd comment: Yes. – Mark.F, Dec 16 '18 at 9:38










          • Last comment: in Keras you can insert the ReLU activation as part of the Conv layer: model.add(Conv2D(96, (11,11), padding='valid', kernel_regularizer=regularizers.l2(weight_decay), activation='relu')) – Mark.F, Dec 16 '18 at 9:39




















          @BearsBeetBattlestar I'm facing the same issue and I've raised a separate question:
          Validation loss increases and validation accuracy decreases

          Can I know exactly how you resolved your issue?



















          • What you write here is not an answer; it could be written as a comment on the question. – Alireza Zolanvari, 13 hours ago






          • This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. You can also add a bounty to draw more attention to this question once you have enough reputation. - From Review – oW_, 9 hours ago










