What are the possible approaches to fixing Overfitting on a CNN?
I am currently trying to build a CNN for age detection on facial images. My dataset has the following shapes, where the images are grayscale:
(50000, 120, 120) - training
(2983, 120, 120) - testing
My model currently looks like the following (I've been testing/trying different methods):
from keras.models import Sequential
from keras.layers import (Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Dropout, Flatten, Dense)
from keras import optimizers

model = Sequential()
model.add(Conv2D(64, kernel_size=3, use_bias=False,
                 input_shape=(size, size, 1)))
model.add(BatchNormalization())
model.add(Activation("relu"))
model.add(Conv2D(32, kernel_size=3, use_bias=False))
model.add(BatchNormalization())
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, use_bias=False))
model.add(BatchNormalization())
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# TODO: add in a lower learning rate - 0.001
adam = optimizers.Adam(lr=0.01)
model.compile(optimizer=adam, loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=number_of_epochs, verbose=1)
After running for just 10 epochs I initially saw decent values, but by the end of the run my results were the following, which has me concerned that my model is definitely overfitting.
How many epochs: 10
Train on 50000 samples, validate on 2939 samples
Epoch 1/10
50000/50000 [==============================] - 144s 3ms/step - loss: 1.7640 - acc: 0.3625 - val_loss: 1.6128 - val_acc: 0.4100
Epoch 2/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.5815 - acc: 0.4059 - val_loss: 1.5682 - val_acc: 0.4059
Epoch 3/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.5026 - acc: 0.4264 - val_loss: 1.6673 - val_acc: 0.4158
Epoch 4/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.3996 - acc: 0.4641 - val_loss: 1.5618 - val_acc: 0.4209
Epoch 5/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.2478 - acc: 0.5226 - val_loss: 1.6530 - val_acc: 0.4066
Epoch 6/10
50000/50000 [==============================] - 141s 3ms/step - loss: 1.0619 - acc: 0.5954 - val_loss: 1.6661 - val_acc: 0.4086
Epoch 7/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.8695 - acc: 0.6750 - val_loss: 1.7392 - val_acc: 0.3770
Epoch 8/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.7054 - acc: 0.7368 - val_loss: 1.8634 - val_acc: 0.3743
Epoch 9/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.5876 - acc: 0.7848 - val_loss: 1.8785 - val_acc: 0.3767
Epoch 10/10
50000/50000 [==============================] - 141s 3ms/step - loss: 0.5012 - acc: 0.8194 - val_loss: 2.2673 - val_acc: 0.3981
Model Saved
I assume the issue might be related to the number of images I have for each output class, but other than that I am a bit stuck on how to move forward. Is there something wrong in my understanding/implementation? Any advice or critique would be much appreciated; this is more of a learning project for me.
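To check whether the per-class counts are actually the problem, a minimal sketch, assuming y_train is one-hot encoded with 10 classes as in the model above (the class_weight usage is optional and hypothetical):

import numpy as np

# per-class counts, assuming y_train is one-hot encoded (shape: samples x 10)
class_counts = y_train.sum(axis=0)
print(class_counts)

# optional: weight the loss inversely to class frequency (assumes no class is empty)
class_weight = {i: len(y_train) / (10 * c) for i, c in enumerate(class_counts)}
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=number_of_epochs, verbose=1, class_weight=class_weight)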
machine-learning deep-learning keras cnn overfitting
asked Dec 15 '18 at 8:22 by BearsBeetBattlestar (edited Dec 15 '18 at 8:54 by Media)
3 Answers
Try to use dropout after your dense layers, not after the max pooling layer. Whatever comes before the dense layers can be considered the input to the classification part, so keep it intact; otherwise you are losing useful information. Also be aware that you should not use dropout after the last layer.
You can also add another dense layer, i.e. two hidden dense layers, for classification. It seems your data is not easy to learn.
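A minimal sketch of that arrangement, reusing the layers already imported in the question (the 64-unit layer size is just a placeholder, not a recommendation):

model.add(Flatten())
model.add(Dense(128, activation='relu'))    # hidden dense layer #1
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))     # hidden dense layer #2 (placeholder size)
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))  # output layer: no dropout after this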
answered Dec 15 '18 at 8:53 by Media (edited Dec 15 '18 at 9:06)
Should I then move the dropout that's currently after max pooling to be after the first dense layer, or should I in general just put them after the last dense layer? And when you say not to use a dense layer after the last layer, are you referring to the one with softmax?
– BearsBeetBattlestar, Dec 15 '18 at 9:00
No, don't use them after the pooling layers at all. In a CNN you have the convolutional part, then the dense layers. Suppose you have dense layers #1, #2 and the output: use dropout after #1 and #2.
– Media, Dec 15 '18 at 9:04
Regarding "when you say not to use a dense layer after the last layer, are you referring to the one with softmax?": that was my mistake in the answer's wording; I've edited it.
– Media, Dec 15 '18 at 9:07
I'm sorry, I didn't see the edit. I'll go and test out what you've mentioned; hopefully it works out better.
– BearsBeetBattlestar, Dec 15 '18 at 9:08
To deal with overfitting, you need to use regularization during training:
Weight regularization - The first thing to do (practically always) is to regularize the weights of the model. L1 or L2 regularization adds an extra term, the regularization term, to the loss function. As a result the weight values shrink, on the assumption that a network with smaller weights corresponds to a simpler model, which in turn reduces overfitting.
If you are not sure which one you need, just use L2.
Keras - Usage of regularizers
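A minimal sketch of adding L2 (weight decay) to one of the dense layers from the model in the question; the 0.01 factor is only a starting point to tune:

from keras import regularizers

model.add(Dense(128, use_bias=False,
                kernel_regularizer=regularizers.l2(0.01)))

The same kernel_regularizer argument can be passed to the Conv2D layers as well.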
Dropout - Add dropout layers after the dense layers (by the way, there are also advantages to using dropout after the convolution layers; it helps with occlusions). Just make sure not to use it after the final dense layer (the one with the same size as the number of classes).
Data Augmentation - The simplest way to reduce overfitting is to increase the amount of training data. Use data augmentation to expand your training set towards "infinity". Keras's data augmentation is really simple and easy to use:
Keras Image Preprocessing
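A minimal sketch with Keras's ImageDataGenerator, assuming x_train and x_test have already been reshaped to (samples, 120, 120, 1) as the model expects; the transform ranges and batch size are placeholders to tune:

from keras.preprocessing.image import ImageDataGenerator

# random shifts, rotations and horizontal flips of the training images
datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

# train on augmented batches instead of the raw arrays
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    validation_data=(x_test, y_test),
                    epochs=number_of_epochs)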
If you implement these 3 steps, you should see drastic improvements (probably even after just the first one).
Further corrections and improvements (nothing to do with overfitting):
- Your batch normalization layer should come after the non-linear activation, or more accurately, it needs to come before the next convolution layer (see the sketch after this list).
- Add an additional dense layer or two (only if the results are still not good enough).
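For the batch normalization point, a sketch of the reordered convolutional block (Conv -> ReLU -> BatchNorm), using the same layers as in the question:

model.add(Conv2D(64, kernel_size=3, use_bias=False,
                 input_shape=(size, size, 1)))
model.add(Activation("relu"))
model.add(BatchNormalization())   # after the non-linearity, before the next Conv2D
model.add(Conv2D(32, kernel_size=3, use_bias=False))
model.add(Activation("relu"))
model.add(BatchNormalization())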
answered Dec 15 '18 at 11:13 by Mark.F
When you mention "Your batch normalization layer should come after the non-linear activation, or more accurately, it needs to come before the next convolution layer", isn't that what I'm already doing? As in adding it right after my first Conv2D layer and before the second one? Or am I misunderstanding?
– BearsBeetBattlestar, Dec 16 '18 at 0:33
Also, would the regularization go something like this in my context: model.add(Dense(64, use_bias=False, kernel_regularizer=regularizers.l2(0.01)))?
– BearsBeetBattlestar, Dec 16 '18 at 1:01
1st comment: No, currently you are using batch normalization before the non-linear activation: Conv -> BatchNorm -> ReLU. It needs to be Conv -> ReLU -> BatchNorm.
– Mark.F, Dec 16 '18 at 9:38
2nd comment: Yes.
– Mark.F, Dec 16 '18 at 9:38
Last comment: in Keras you can specify the ReLU activation as part of the Conv2D layer: model.add(Conv2D(96, (11, 11), padding='valid', kernel_regularizer=regularizers.l2(weight_decay), activation='relu'))
– Mark.F, Dec 16 '18 at 9:39
@BearsBeetBattlestar I'm facing the same issue and I've raised a separate question:
Validation loss increases and validation accuracy decreases
Can I know exactly how you resolved your issue?
answered 13 hours ago by stranger (new contributor)
What you have written here is not an answer; it could be posted as a comment on the question.
– Alireza Zolanvari, 13 hours ago
This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. You can also add a bounty to draw more attention to this question once you have enough reputation. - From Review
– oW_, 9 hours ago