Remedies to CNN-LSTM overfitting on relatively small image dataset


























Notes



I have already tried using a pretrained model, data augmentation (not possible given the nature of the images), and lowering the number of parameters in the network; none of these helped.



Context



I have a sequence of images, and the target is a multivariate continuous time series. I am stacking an LSTM on top of a CNN, without using a pretrained model. Training a plain CNN did not give me satisfying results; one very likely reason is that training covers only a single year, while prediction on the test images spans several additional months.



Image augmentation is practically ruled out by the nature of the data: satellite images of a fixed geo-location, tracking passing clouds.



Along with the images I have time features, as well as the trend and seasonality of the target. These are also known for the test set, because they can be computed analytically (the target is GHI, estimated with the Ineichen and Perez clear-sky model).
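For reference, the clear-sky GHI component can be computed roughly as in the sketch below; it assumes the pvlib library, and the coordinates and timestamps are placeholders, not my actual site:

import pandas as pd
import pvlib

# Hypothetical site and time range; replace with the real geo-location and index.
site = pvlib.location.Location(latitude=36.8, longitude=10.2, tz='UTC', altitude=10)
times = pd.date_range('2019-01-01', '2019-12-31 23:45', freq='15min', tz='UTC')

# Ineichen-Perez clear-sky model; returns a DataFrame with 'ghi', 'dni', 'dhi' columns.
clearsky = site.get_clearsky(times, model='ineichen')
clearsky_ghi = clearsky['ghi']   # usable as a trend/seasonality feature for train and test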



Problem



The problem is overfitting. The best model is tracked on the validation set via early stopping.
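The callbacks variable passed to fit() below is not shown; a minimal sketch of what the early-stopping setup typically looks like (the patience value and file name are assumptions):

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop when validation loss stops improving and roll back to the best weights.
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
    # Also keep the best model on disk.
    ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
]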



The validation set is a small fraction split off from the training data (90% train, 10% validation), which shrinks the already small training set even further.



The training set consists of 8804 images plus the target variable. The time-distributed layers consume sequences of 31 time steps; for example, training uses 255 such sequences and validation 29.
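For concreteness, this is roughly how the 8804 frames are cut into windows of 31 time steps (the arrays below are placeholders standing in for the real data, and the 8-dimensional target matches the Dense(8) outputs of the model):

import numpy as np

seq_len = 31
images = np.zeros((8804, 120, 120, 1), dtype=np.float32)   # placeholder for the real frames
targets = np.zeros((8804, 8), dtype=np.float32)            # placeholder for the real target

n_windows = images.shape[0] // seq_len                     # 8804 // 31 = 284 full windows
x_seq = images.reshape(n_windows, seq_len, 120, 120, 1)
y_seq = targets.reshape(n_windows, seq_len, -1)
# 284 windows = 255 for training + 29 for validation, matching the shapes above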



The model I came up with is the following:



# imports needed for the model definition
from tensorflow import keras
from tensorflow.keras.layers import (Input, TimeDistributed, Conv2D, MaxPooling2D,
                                     BatchNormalization, Flatten, Dense, LSTM,
                                     Bidirectional)
from tensorflow.keras.models import Model

lstm = 20                  # LSTM units; 20 worked best in my experiments (see Tries)
loss_weights = [1, .4]     # main output weighted higher than the auxiliary output

# frame-by-frame CNN feature extraction
main_input__ = Input(shape=(31, 120, 120, 1), name='main_input__')
x__ = TimeDistributed(
    Conv2D(8, kernel_size=(3, 3), strides=(1, 1), activation='relu')
)(main_input__)
x__ = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(x__)
x__ = TimeDistributed(BatchNormalization())(x__)
x__ = TimeDistributed(Conv2D(8, (2, 2), strides=(1, 1), activation='relu'))(x__)
x__ = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(x__)

# flatten the extracted features and feed them to the recurrent layers
x__ = TimeDistributed(Flatten())(x__)
x__ = TimeDistributed(Dense(8, activation='relu'))(x__)
x__ = Bidirectional(LSTM(lstm, return_sequences=True, dropout=0.3))(x__)
lstm_out__ = Bidirectional(LSTM(lstm, return_sequences=True, dropout=0.3))(x__)
auxiliary_output__ = Dense(8, name='aux_output')(lstm_out__)

# auxiliary input: time features plus trend/seasonality of the target
auxiliary_input__ = Input(shape=(31, 10), name='aux_input')
z__ = keras.layers.concatenate([lstm_out__, auxiliary_input__])

# one more recurrent layer on top of the concatenated features
z__ = Bidirectional(LSTM(lstm, return_sequences=True, dropout=0.3))(z__)
main_output__ = Dense(8, name='main_output')(z__)

model__ = Model(inputs=[main_input__, auxiliary_input__],
                outputs=[main_output__, auxiliary_output__])
model__.compile(loss=loss_mse_warmup, optimizer='adam', loss_weights=loss_weights)

history__ = model__.fit(x=[x_train, aux_train], y=[y_train, y_train],
                        epochs=100, batch_size=2,
                        validation_split=.1,   # ~10% held out: 255 train / 29 validation sequences
                        callbacks=callbacks)


loss_mse_warmup is simply a mean squared error that ignores the first 5 time steps of each training sequence.
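A minimal sketch of how such a warm-up loss can be written; the 5-step cut-off follows the description above:

import tensorflow as tf

warmup_steps = 5

def loss_mse_warmup(y_true, y_pred):
    # Ignore the first `warmup_steps` time steps of every sequence, so that the
    # LSTM's initial, uninformed predictions do not dominate the loss.
    return tf.reduce_mean(tf.square(y_true[:, warmup_steps:, :] -
                                    y_pred[:, warmup_steps:, :]))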



What I have tried




  1. Several batch sizes: 32, 16, 8, 2.

  2. Loss weights ranging over [1, .4], [1, .3], [1, .2].

  3. Different numbers of filters in the CNN layers: 8, 16, 32.

  4. Different strides: (2, 2), (1, 1), (3, 3).

  5. Number of LSTM units: 20 seems clearly better than the other values tried.

  6. Stacking two LSTM layers gives nearly the same result as a single layer, for both the main and the auxiliary input.

  7. Validation and training loss on the auxiliary output are lower than on the main output, so the auxiliary data is useful.

  8. Sequence lengths: 62, 31 and 1. 31 is slightly better; it corresponds to half a day.

  9. A pretrained MobileNet wrapped in a TimeDistributed layer, but it did not give better results (see the sketch after this list).
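Point 9 was wired up roughly as in the sketch below; it assumes the grayscale 120x120 frames are replicated to 3 channels and resized to 128x128, since the ImageNet MobileNet weights expect RGB input:

import tensorflow as tf
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Input, TimeDistributed, Lambda

frames = Input(shape=(31, 120, 120, 1))
# Replicate the single channel to RGB and resize each frame for MobileNet.
rgb = TimeDistributed(Lambda(
    lambda t: tf.image.resize(tf.image.grayscale_to_rgb(t), (128, 128))))(frames)

base = MobileNet(include_top=False, weights='imagenet',
                 input_shape=(128, 128, 3), pooling='avg')
base.trainable = False                    # freeze the pretrained weights
features = TimeDistributed(base)(rgb)     # -> (batch, 31, 1024) feature sequences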


None of these attempts achieved a validation loss better than 0.2, even though the challenge leaderboard shows that the score can be improved.



This is a visualization of the model:



[model architecture diagram]










Tags: lstm, cnn, overfitting





