Remedies to CNN-LSTM overfitting on relatively small image dataset












$begingroup$


Notes



Using a pretrained model, trying data augmentation (not possible given the nature of the images), and lowering the number of parameters in the network all failed to help.



Context



I have a sequence of images. The target is a multivariate continuous time series. I am trying an LSTM on top of a CNN, without using a pretrained model. Training a CNN model alone did not give satisfying results. A very likely reason is that the training data covers only one year, while the test images span several months.



Image augmentation is nearly impossible given the nature of the images: they are satellite images of a fixed geographic location, tracking passing clouds.



Along with the images, I have time features, as well as the trend and seasonality of the target. These are known for the test set too, because they can be computed scientifically (the target is GHI, estimated by the Ineichen and Perez model).



Problem



The problem is overfitting. The best model is tracked on the validation set with early stopping.



The validation set is a small fraction of the training data (0.9 is kept for training), so the training set becomes even smaller.



The training data is a set of 8804 images with their target values. The time-distributed layers take sequences of 31 time steps. For example, training takes (255, 31) and validation takes (29, 31).
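To make these shapes concrete, here is a minimal sketch of how 8804 frames could end up as 255 training and 29 validation sequences. It assumes non-overlapping windows of 31 frames and a Keras-style split that holds out the last 10% of the sequences; `to_sequences` and the constants are hypothetical names, and frame indices stand in for the actual images.

```python
import numpy as np

N_FRAMES = 8804      # number of images, from the question
SEQ_LEN = 31         # time steps per sequence, from the question
VAL_FRACTION = 0.1   # assumption: 0.9 kept for training, 0.1 held out

def to_sequences(frames, seq_len):
    """Cut an array of frames into non-overlapping windows of seq_len frames."""
    n_seq = len(frames) // seq_len
    trimmed = frames[: n_seq * seq_len]
    return trimmed.reshape((n_seq, seq_len) + frames.shape[1:])

# Frame indices stand in for the real (120, 120, 1) images.
frame_ids = np.arange(N_FRAMES)
sequences = to_sequences(frame_ids, SEQ_LEN)

# Hold out the tail of the data, as Keras does with validation_split.
split_at = int(len(sequences) * (1.0 - VAL_FRACTION))
train_seq, val_seq = sequences[:split_at], sequences[split_at:]

print(train_seq.shape, val_seq.shape)  # (255, 31) (29, 31)
```

Because the held-out part is the tail of the year, the validation set only covers the last few weeks of data, which may matter when the test images come from other months.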



The model I came up with is the following:



# Imports assume standalone Keras. loss_mse_warmup, callbacks, x_train,
# aux_train and y_train are defined elsewhere.
import keras
from keras.layers import (Input, Conv2D, MaxPooling2D, BatchNormalization,
                          Flatten, Dense, LSTM, Bidirectional, TimeDistributed)
from keras.models import Model

lstm = 20               # assumed value: 20 units worked best in my tries
loss_weights = [1, .4]  # [main_output, aux_output]

# feature extraction, applied to each of the 31 frames
main_input__ = Input(shape=(31, 120, 120, 1), name='main_input__')
x__ = TimeDistributed(
    Conv2D(8, kernel_size=(3, 3), strides=(1, 1), activation='relu')
)(main_input__)
x__ = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(x__)
x__ = TimeDistributed(BatchNormalization())(x__)
x__ = TimeDistributed(Conv2D(8, (2, 2), strides=(1, 1), activation='relu'))(x__)
x__ = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(x__)

# flatten the per-frame features, then model the sequence (dropout is in the LSTMs)
x__ = TimeDistributed(Flatten())(x__)
x__ = TimeDistributed(Dense(8, activation='relu'))(x__)
x__ = Bidirectional(LSTM(lstm, return_sequences=True, dropout=0.3))(x__)
lstm_out__ = Bidirectional(LSTM(lstm, return_sequences=True, dropout=0.3))(x__)
auxiliary_output__ = Dense(8, name='aux_output')(lstm_out__)

# merge the image features with the 10 auxiliary (time, trend, seasonality) features
auxiliary_input__ = Input(shape=(31, 10), name='aux_input')
z__ = keras.layers.concatenate([lstm_out__, auxiliary_input__])

# We stack another recurrent layer on top
# z__ = (LSTM(lstm, return_sequences=True, dropout=0.4))(z__)
z__ = Bidirectional(LSTM(lstm, return_sequences=True, dropout=0.3))(z__)
main_output__ = Dense(8, name='main_output')(z__)

# the same loss is used for both outputs
loss = [loss_mse_warmup, loss_mse_warmup]
model__ = Model(inputs=[main_input__, auxiliary_input__],
                outputs=[main_output__, auxiliary_output__])
model__.compile(loss=loss, optimizer='adam', loss_weights=loss_weights)

# validation_split is the fraction held out for validation (0.9 is kept for training)
history__ = model__.fit(x=[x_train, aux_train], y=[y_train, y_train],
                        epochs=100, batch_size=2, validation_split=0.1,
                        callbacks=callbacks)
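
As a rough check on where the capacity of this network sits, the sketch below works through the feature-map sizes and parameter counts of the convolutional front end. It assumes the Keras default `padding='valid'` for every Conv2D and MaxPooling2D layer, which is what the code above uses implicitly; `conv_out` is a hypothetical helper, not a Keras function.

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis for a 'valid' convolution or pooling."""
    return (size - kernel) // stride + 1

side = 120                                  # input frames are 120 x 120 x 1
side = conv_out(side, kernel=3, stride=1)   # Conv2D(8, 3x3)      -> 118
side = conv_out(side, kernel=2, stride=2)   # MaxPooling2D(2x2)   -> 59
side = conv_out(side, kernel=2, stride=1)   # Conv2D(8, 2x2)      -> 58
side = conv_out(side, kernel=2, stride=2)   # MaxPooling2D(2x2)   -> 29

flat_features = side * side * 8             # Flatten over 8 channels
conv1_params = 3 * 3 * 1 * 8 + 8            # weights + biases
conv2_params = 2 * 2 * 8 * 8 + 8
dense_params = flat_features * 8 + 8        # TimeDistributed(Dense(8))

print(side, flat_features)                       # 29 6728
print(conv1_params, conv2_params, dense_params)  # 80 264 53832
```

Under these assumptions, the Dense(8) layer after Flatten holds far more weights than both convolutions combined, so changing the number of convolution filters alone has limited effect on the total parameter count.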


loss_mse_warmup is just a mean squared error that ignores the first 5 input signals (time steps) of each training sequence.
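
For readers unfamiliar with this kind of loss, the following NumPy sketch shows the computation it describes. It is a reference illustration, not the actual `loss_mse_warmup` used in the model: the function name `mse_warmup_reference` and the (batch, time, features) layout are assumptions, and a Keras loss would apply the same slicing to tensors instead of arrays.

```python
import numpy as np

WARMUP_STEPS = 5  # from the question: the first 5 time steps are ignored

def mse_warmup_reference(y_true, y_pred, warmup_steps=WARMUP_STEPS):
    """Mean squared error over (batch, time, features) arrays,
    skipping the first warmup_steps time steps of every sequence."""
    y_true = np.asarray(y_true, dtype=float)[:, warmup_steps:, :]
    y_pred = np.asarray(y_pred, dtype=float)[:, warmup_steps:, :]
    return float(np.mean((y_true - y_pred) ** 2))

# Two sequences of 31 steps with 8 target variables, as in the model outputs.
y_true = np.zeros((2, 31, 8))
y_pred = np.ones((2, 31, 8))
y_pred[:, :WARMUP_STEPS, :] = 100.0  # large errors inside the warm-up window

print(mse_warmup_reference(y_true, y_pred))                  # 1.0
print(mse_warmup_reference(y_true, y_pred, warmup_steps=0))  # much larger
```

The idea is that the LSTM has seen too little context in the first few steps to predict well, so errors there are not penalized.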



Tries




  1. Several batch sizes: [32, 16, 8, 2].

  2. Loss weights in [[1, .4], [1, .3], [1, .2]].

  3. Different numbers of filters in the CNN layers: [8, 16, 32].

  4. Different strides: (2, 2), (1, 1), (3, 3).

  5. Number of units in the LSTM layers: 20 seems to be far better than the other values tried.

  6. Stacking two LSTM layers gives nearly the same result as one layer, for both the main input and the auxiliary input.

  7. The validation and training losses of the auxiliary output are lower than those of the main output, so the auxiliary data is useful.

  8. Sequence lengths tried: 62, 31 and 1. 31 is slightly better; it represents half a day.

  9. A pretrained MobileNet model wrapped in a TimeDistributed layer. It didn't show better results.


None of these tries achieved a validation loss better than 0.2, although I know from the challenge platform that better results are possible.



This is a visualization of the model:



[Image: model architecture diagram]










$endgroup$

















      lstm cnn overfitting






asked 2 days ago by bacloud14, edited 2 days ago by bacloud14





















