Remedies to CNN-LSTM overfitting on relatively small image dataset
Notes
I have already tried using a pretrained model, data augmentation (not feasible given the nature of the images), and lowering the number of parameters in the network; none of these helped.
Context
I have a sequence of images; the target is a multivariate continuous time series. I am putting an LSTM on top of a CNN, without a pretrained model. Training a plain CNN did not give me satisfying results. A very likely reason is that training covers only one year, while prediction on the test images spans several additional months.
Any image augmentation is nearly impossible given the nature of the images: satellite images of a fixed geo-location, tracking passing clouds.
Along with the images, I have time features, plus the trend and seasonality of the target, which are known even for the test set because they can be calculated scientifically (the target is GHI, estimated by the Ineichen and Perez clear-sky model).
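The post does not show how the 10 auxiliary features are built. A common way to encode the time features mentioned above is a cyclical sin/cos encoding, so that hour 23 and hour 0 (or Dec 31 and Jan 1) end up close together; this is an illustrative sketch, not the author's exact feature set:

```python
import numpy as np

def cyclical_time_features(hours, days_of_year):
    """Encode hour-of-day and day-of-year as sin/cos pairs so that
    adjacent times across the wrap-around point stay close in feature space."""
    hours = np.asarray(hours, dtype=float)
    days = np.asarray(days_of_year, dtype=float)
    return np.column_stack([
        np.sin(2 * np.pi * hours / 24), np.cos(2 * np.pi * hours / 24),
        np.sin(2 * np.pi * days / 365), np.cos(2 * np.pi * days / 365),
    ])

feats = cyclical_time_features([0, 6, 12, 23], [1, 180, 365, 365])
print(feats.shape)  # (4, 4)
```

Features like these would be stacked with the clear-sky trend/seasonality values to form the per-step auxiliary input vector.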
Problem
The problem is over-fitting; the best model is tracked on the validation set via early stopping.
The validation set is a small fraction of the training data (0.9 is kept for training), which shrinks the training set even further.
The training data consists of 8804 images plus the target variable. The time-distributed layers take sequences of 31 steps, so training sees shape (255, 31) and validation (29, 31).
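The arithmetic behind those shapes can be sketched as follows: 8804 frames cut into non-overlapping 31-step sequences gives 284 sequences, split 255/29. Variable names are made up, and tiny 16x16 placeholder frames stand in for the real 120x120 images:

```python
import numpy as np

# Illustrative windowing: cut a run of frames into non-overlapping 31-step
# sequences, then split train/validation as in the post (255 vs 29 sequences).
seq_len = 31
frames = np.zeros((8804, 16, 16, 1), dtype=np.float32)  # placeholder data

n_seqs = frames.shape[0] // seq_len                 # 8804 // 31 = 284
sequences = frames[:n_seqs * seq_len].reshape(n_seqs, seq_len, 16, 16, 1)

x_train, x_val = sequences[:255], sequences[255:]
print(x_train.shape[:2], x_val.shape[:2])  # (255, 31) (29, 31)
```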
The model I came up with is the following:
lstm = 20  # LSTM units; 20 worked best in my experiments
loss_weights = [1, .4]  # main output, auxiliary output

main_input__ = Input(shape=(31, 120, 120, 1), name='main_input__')

# frame-wise feature extraction
x__ = TimeDistributed(
    Conv2D(8, kernel_size=(3, 3), strides=(1, 1), activation='relu')
)(main_input__)
x__ = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(x__)
x__ = TimeDistributed(BatchNormalization())(x__)
x__ = TimeDistributed(Conv2D(8, (2, 2), strides=(1, 1), activation='relu'))(x__)
x__ = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(x__)
x__ = TimeDistributed(Flatten())(x__)
x__ = TimeDistributed(Dense(8, activation='relu'))(x__)

# temporal modelling
x__ = Bidirectional(LSTM(lstm, return_sequences=True, dropout=0.3))(x__)
lstm_out__ = Bidirectional(LSTM(lstm, return_sequences=True, dropout=0.3))(x__)
auxiliary_output__ = Dense(8, name='aux_output')(lstm_out__)

# merge with the auxiliary time features
auxiliary_input__ = Input(shape=(31, 10), name='aux_input')
z__ = keras.layers.concatenate([lstm_out__, auxiliary_input__])
z__ = Bidirectional(LSTM(lstm, return_sequences=True, dropout=0.3))(z__)
main_output__ = Dense(8, name='main_output')(z__)

model__ = Model(inputs=[main_input__, auxiliary_input__],
                outputs=[main_output__, auxiliary_output__])
model__.compile(loss=[loss_mse_warmup, loss_mse_warmup],
                optimizer='adam', loss_weights=loss_weights)
history__ = model__.fit(x=[x_train, aux_train], y=[y_train, y_train],
                        epochs=100, batch_size=2,
                        validation_split=.1,  # hold out ~10% for validation
                        callbacks=callbacks)
loss_mse_warmup is just a mean squared error that ignores the first 5 time steps of each training sequence.
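The idea behind that warmup loss can be sketched in NumPy (the actual Keras loss would do the same slicing on tensors of shape (batch, time, features); the function name and warmup length here follow the description above):

```python
import numpy as np

WARMUP_STEPS = 5  # first steps of each sequence excluded from the loss

def mse_warmup_np(y_true, y_pred, warmup=WARMUP_STEPS):
    """Plain MSE computed only on time steps after the warmup window,
    giving the recurrent state a few steps to settle before being scored."""
    return np.mean((y_true[:, warmup:, :] - y_pred[:, warmup:, :]) ** 2)

y_true = np.zeros((2, 31, 8))
y_pred = np.zeros((2, 31, 8))
y_pred[:, :5, :] = 100.0  # huge errors inside the warmup window...
print(mse_warmup_np(y_true, y_pred))  # ...are ignored: 0.0
```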
Tries
- Several batch sizes: [32, 16, 8, 2].
- Loss weights ranging over [[1, .4], [1, .3], [1, .2]].
- Different numbers of filters in the CNN layers: [8, 16, 32].
- Variants of strides: (2, 2), (1, 1), (3, 3).
- LSTM layer sizes: 20 units seems far better than the other tries.
- Stacking two LSTM layers gives nearly the same result as one layer, for both the main and auxiliary inputs.
- Validation and training loss of the auxiliary output are lower than those of the main output, so the auxiliary data is useful.
- Sequence lengths tried: 62, 31 and 1; 31 is slightly better, and represents half a day.
- Tested a pretrained MobileNet wrapped in a TimeDistributed layer, but it didn't show better results.
None of these tries achieved a validation loss better than 0.2, and the challenge platform's results suggest the learning can still be improved.
This is a visualization of the model:
[model architecture diagram]
asked 2 days ago, edited 2 days ago by bacloud14