What are the possible approaches to fixing Overfitting on a CNN?














Currently I am trying to make a CNN that would allow for age detection on facial images. My dataset has the following shape, where the images are grayscale:



(50000, 120, 120) - training 
(2983, 120, 120) - testing


My model currently looks like the following - I've been testing/trying different methods:



    # Imports needed for this snippet (Keras 2.x API)
    from keras.models import Sequential
    from keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                              Activation, Dropout, Flatten, Dense)
    from keras import optimizers

    model = Sequential()

    # Convolutional feature extractor
    model.add(Conv2D(64, kernel_size=3, use_bias=False,
                     input_shape=(size, size, 1)))
    model.add(BatchNormalization())
    model.add(Activation("relu"))

    model.add(Conv2D(32, kernel_size=3, use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))

    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())

    # Classifier head
    model.add(Dense(128, use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))

    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    # TODO: try a lower learning rate - 0.001
    adam = optimizers.Adam(lr=0.01)
    model.compile(optimizer=adam, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, validation_data=(x_test, y_test),
              epochs=number_of_epochs, verbose=1)


After running for just 10 epochs I initially saw decent values, but by the end of the run my results were the following, and it has me concerned that my model is definitely overfitting.



How many epochs: 10
Train on 50000 samples, validate on 2939 samples
Epoch 1/10
50000/50000 [==============================] - 144s 3ms/step - loss: 1.7640 - acc: 0.3625 - val_loss: 1.6128 - val_acc: 0.4100
Epoch 2/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.5815 - acc: 0.4059 - val_loss: 1.5682 - val_acc: 0.4059
Epoch 3/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.5026 - acc: 0.4264 - val_loss: 1.6673 - val_acc: 0.4158
Epoch 4/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.3996 - acc: 0.4641 - val_loss: 1.5618 - val_acc: 0.4209
Epoch 5/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.2478 - acc: 0.5226 - val_loss: 1.6530 - val_acc: 0.4066
Epoch 6/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.0619 - acc: 0.5954 - val_loss: 1.6661 - val_acc: 0.4086
Epoch 7/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.8695 - acc: 0.6750 - val_loss: 1.7392 - val_acc: 0.3770
Epoch 8/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.7054 - acc: 0.7368 - val_loss: 1.8634 - val_acc: 0.3743
Epoch 9/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.5876 - acc: 0.7848 - val_loss: 1.8785 - val_acc: 0.3767
Epoch 10/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.5012 - acc: 0.8194 - val_loss: 2.2673 - val_acc: 0.3981
Model Saved


I assume the issue might be related to the number of images I have for each output class, but other than that I am a bit stuck on how to move forward. Is there something wrong in my understanding/implementation? Any advice or critique would be well appreciated; this is more of a learning project for me.










      machine-learning deep-learning keras cnn overfitting






          3 Answers


















          Try to use dropout after your dense layers, not after max-pooling layers. Whatever comes before the dense layers can be considered the input of a classification layer, so keep it intact; dropping activations there effectively means you are losing useful information. You should also be aware that you should not use dropout after the last layer.



          You can also add another dense layer, i.e. two hidden dense layers, for the classification part; it seems your data is not easy to learn.
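
          A minimal sketch of that layout in Keras, reusing the layer sizes from the question (the second 64-unit hidden dense layer is illustrative, not part of the original model):

              from keras.models import Sequential
              from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

              model = Sequential()
              # Convolutional part: no dropout here, keep the extracted features intact
              model.add(Conv2D(64, kernel_size=3, activation='relu',
                               input_shape=(120, 120, 1)))
              model.add(Conv2D(32, kernel_size=3, activation='relu'))
              model.add(MaxPooling2D(pool_size=(2, 2)))
              model.add(Flatten())

              # Classification part: dropout only after the hidden dense layers
              model.add(Dense(128, activation='relu'))    # hidden dense #1
              model.add(Dropout(0.5))
              model.add(Dense(64, activation='relu'))     # hidden dense #2 (illustrative)
              model.add(Dropout(0.5))
              model.add(Dense(10, activation='softmax'))  # output layer - no dropout after it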



















          • Should I then move the dropout that's currently after maxpooling to be after the first dense layer? Or in general should I just put them after the last dense? And when you say not to use a dense layer after the last layer are you referring to the one with softmax? – BearsBeetBattlestar, Dec 15 '18 at 9:00












          • No, don't use them after pooling layers at all. In a CNN, you have convolutional stuff, then you have dense layers. Suppose you have Dense layers #1, #2 and output. Use dropout after #1 and #2. – Media, Dec 15 '18 at 9:04












          • "And when you say not to use a dense layer after the last layer are you referring to the one with softmax?" It was my mistake, I edited! – Media, Dec 15 '18 at 9:07










          • I'm sorry, I didn't see the edit. I will go and test out what you've mentioned; hopefully it works out better. – BearsBeetBattlestar, Dec 15 '18 at 9:08




















          To deal with overfitting, you need to use regularization during training:





          1. Weight regularization - The first thing you have to do (practically always) is to use regularization on the weights of the model. L1 or L2 regularization updates the loss function by adding another term, known as the regularization term. As a result, the weight values shrink, on the assumption that a neural network with smaller weights gives a simpler model; this in turn reduces overfitting.
            If you are not sure which you need, just use L2.

            Keras - Usage of regularizers

          2. Dropout - Add dropout layers after the dense layers (by the way, there are also advantages to using dropout after the convolution layers; it helps with occlusions). Just make sure not to use it after the final dense layer (the one with the same size as the number of classes).

          3. Data Augmentation - The simplest way to reduce overfitting is to increase the size of the training data. Use data augmentation to potentially expand your training set to "infinity". Keras's data augmentation is really simple and easy to use:

            Keras Image Preprocessing




          If you implement these 3 steps, you will see drastic improvements (probably even just after the first one).
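
          A rough sketch of what those three steps could look like in Keras, continuing from a model built as in the question (the L2 factor of 0.01, the dropout rate, and the augmentation ranges are illustrative placeholders, not tuned values):

              from keras import regularizers
              from keras.layers import Dense, Dropout
              from keras.preprocessing.image import ImageDataGenerator

              # 1. Weight regularization: L2 penalty on the dense layer's weights
              model.add(Dense(128, activation='relu',
                              kernel_regularizer=regularizers.l2(0.01)))

              # 2. Dropout after the hidden dense layer, but not after the final softmax
              model.add(Dropout(0.5))
              model.add(Dense(10, activation='softmax'))

              # 3. Data augmentation: shifted/rotated/flipped variants generated on the fly
              #    (note: ImageDataGenerator expects a channel axis, e.g. (N, 120, 120, 1))
              datagen = ImageDataGenerator(rotation_range=10,
                                           width_shift_range=0.1,
                                           height_shift_range=0.1,
                                           horizontal_flip=True)
              model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                                  steps_per_epoch=len(x_train) // 32,
                                  epochs=number_of_epochs,
                                  validation_data=(x_test, y_test))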



          Further corrections and improvements (nothing to do with overfitting):




          • Your batch normalization layer should come after the non-linear activation; more accurately, it needs to come before the next convolution layer (see the sketch after this list).

          • Add an additional dense layer or 2 (only if the results are not good enough).
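
          For the batch-normalization point, a sketch of the suggested ordering (Conv -> ReLU -> BatchNorm, so the normalized output is what feeds the next convolution); the filter counts are taken from the question:

              from keras.layers import Conv2D, BatchNormalization

              # Conv -> ReLU -> BatchNorm: activate first, then normalize the result
              model.add(Conv2D(64, kernel_size=3, activation='relu',
                               input_shape=(120, 120, 1)))
              model.add(BatchNormalization())
              model.add(Conv2D(32, kernel_size=3, activation='relu'))
              model.add(BatchNormalization())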



















          • When you mention "Your batch normalization layer should come after the non-linear activation, or more accurately, it needs to come before the next convolution layer", isn't that what I'm already doing? As in adding it right after my first Conv2D layer and before the second one? Or am I misunderstanding... – BearsBeetBattlestar, Dec 16 '18 at 0:33










          • Also, would the regularization go something like this in my context: model.add(Dense(64, use_bias=False, kernel_regularizer=regularizers.l2(0.01)))? – BearsBeetBattlestar, Dec 16 '18 at 1:01










          • 1st comment: No, currently you are using the batch normalization before the non-linear activation: Conv->BatchNorm->ReLU. It needs to be Conv->ReLU->BatchNorm. – Mark.F, Dec 16 '18 at 9:38










          • 2nd comment: Yes. – Mark.F, Dec 16 '18 at 9:38










          • Last comment: in Keras you can insert the ReLU activation as part of the Conv layer: model.add(Conv2D(96, (11,11), padding='valid', kernel_regularizer=regularizers.l2(weight_decay), activation='relu')) – Mark.F, Dec 16 '18 at 9:39




















          @BearsBeetBattlestar I'm facing the same issue and I've raised a separate question:
          Validation loss increases and validation accuracy decreases

          Can I know exactly how you resolved your issue?



















          • What you write here is not an answer; it could be written as a comment on the question. – Alireza Zolanvari, 13 hours ago






          • This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. You can also add a bounty to draw more attention to this question once you have enough reputation. - From Review – oW_, 9 hours ago










