Keras DNN ending with sigmoid: model.predict produces values < 0.5. What does this indicate?












I'm trying a simple Keras project with Dense layers for binary classification. I have about 300,000 rows of data; the labels are:



training_set['TARGET'].value_counts()
0    282686
1     24825
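Given this roughly 11:1 imbalance, one common remedy (not something the question tries) is to weight the classes in the loss. A minimal sketch, assuming the counts above; the `model.fit` call is illustrative, not the question's actual training code:

```python
# Hypothetical sketch: inverse-frequency class weights for an imbalanced
# binary target. Counts mirror the value_counts() output above.
counts = {0: 282686, 1: 24825}
total = sum(counts.values())

# Weight each class inversely to its frequency so the minority class
# contributes comparably to the loss.
class_weight = {label: total / (2 * n) for label, n in counts.items()}
print(class_weight)

# Would then be passed to Keras training, e.g.:
# model.fit(train_data, train_labels, epochs=20, batch_size=2048,
#           class_weight=class_weight)
```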


My model looks like this:



def build_model():
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu',
                           kernel_regularizer=regularizers.l2(0.001),
                           input_shape=(train_data.shape[1],)))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(32, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    return model


So it's binary classification ending with a sigmoid. My understanding is that I should get values close to 0 or close to 1. I've tried different model architectures, hyperparameters, epochs, batch sizes, etc., but when I run model.predict on my validation set the values never get above 0.5. Here are some samples.



20 epochs, 16384 batch size
max 0.458850622177124, min 0.1022530049085617
max 0.47131556272506714, min 0.057787925004959106

20 epochs, 8192 batch size
max 0.42957592010498047, min 0.060324762016534805
max 0.3811708390712738, min 0.022215187549591064

20 epochs, 4096 batch size
max 0.3163970410823822, min 0.0657803937792778

20 epochs, 2048 batch size
max 0.21799422800540924, min 0.03832605481147766
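Worth noting: with only ~8% positives, probabilities capped below 0.5 do not by themselves mean the model is broken; the ranking of scores can still separate the classes, and the decision threshold can be set below 0.5. A hypothetical sketch with simulated scores (none of these values come from the question's data):

```python
# Hypothetical sketch: scores that never exceed 0.5 can still rank the
# positive class perfectly; thresholding at the base rate recovers it.
import numpy as np

rng = np.random.default_rng(0)
y_val = rng.random(1000) < 0.08                      # ~8% positives, as in the data
# Simulated scores: positives tend higher, yet everything stays below 0.5.
preds = np.where(y_val, 0.35, 0.10) + rng.random(1000) * 0.05

threshold = np.quantile(preds, 1 - y_val.mean())     # match the base rate
flagged = preds >= threshold
print(flagged.sum(), y_val.sum())
```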


Is this an indication that I'm doing something wrong?



Training and validation loss










Tag: keras






asked Sep 16 '18 at 0:58, edited Sep 16 '18 at 1:43, by rr_cook





1 Answer


















I think the dropout is a bit high. And since it's binary classification, why end with a single node? You could instead use two output nodes with softmax.



If you use softmax, make sure your target variable has the proper shape (one-hot encoded, e.g. via to_categorical()).



def build_model():
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu',
                           kernel_regularizer=regularizers.l2(0.001),
                           input_shape=(train_data.shape[1],)))
    model.add(layers.Dropout(0.3))
    model.add(layers.Dense(32, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(layers.Dropout(0.3))
    model.add(layers.Dense(num_classes, activation='softmax'))
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',  # with softmax outputs, use the categorical loss
                  metrics=['accuracy'])

    return model


To improve it further, you could use techniques such as cross-validation and batch normalization, and perhaps increase the number of epochs.
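If the softmax variant is tried, the 0/1 target needs a matching one-hot shape. A minimal NumPy sketch of what keras.utils.to_categorical produces (the array values here are invented for illustration):

```python
# Plain-NumPy one-hot encoding of a 0/1 integer target for a softmax output;
# keras.utils.to_categorical does the same job.
import numpy as np

y = np.array([0, 1, 0, 0, 1])
num_classes = 2
one_hot = np.eye(num_classes)[y]    # shape (n_samples, num_classes)
print(one_hot)
```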






• Short answer, I end with a single node because this is basically my first deep learning project and Chollet's "Deep Learning with Python" ended its binary classification project with that layer. I thought binary classification was one node with sigmoid, and multiclass was n nodes with softmax? – rr_cook, Sep 16 '18 at 1:33












• The quick Dropout change didn't make a difference. I added a typical training and validation loss pic in the hopes that it offers a clue. I'll try the other suggestions, thanks. – rr_cook, Sep 16 '18 at 1:44












• How are you passing your inputs, and what preprocessing are you doing? Add that and the compile step; it will help others debug, as I don't think there's anything wrong with the model-building itself. I just prefer the softmax version because it lets me see the model's behaviour. – Aditya, Sep 16 '18 at 1:57












• Numeric columns are standardized (subtract the mean, divide by the std). Categorical columns (M/F, etc.) are one-hot encoded with pd.get_dummies. – rr_cook, Sep 16 '18 at 2:04
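The preprocessing described in the comment above can be sketched as follows (column names are invented; the key point is reusing training-set statistics for the validation split):

```python
# Hedged sketch: standardize numeric columns with statistics computed on the
# training split only, then apply the same statistics to validation data,
# and one-hot encode categoricals with pd.get_dummies.
import pandas as pd

train = pd.DataFrame({'age': [20.0, 30.0, 40.0], 'sex': ['M', 'F', 'M']})
val = pd.DataFrame({'age': [25.0, 35.0], 'sex': ['F', 'M']})

mean, std = train['age'].mean(), train['age'].std()
train['age'] = (train['age'] - mean) / std
val['age'] = (val['age'] - mean) / std   # training stats, not val's own

train = pd.get_dummies(train, columns=['sex'])
val = pd.get_dummies(val, columns=['sex'])
```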










• Drop a sample of your data if it's too big! I will try experiments! – Aditya, Sep 17 '18 at 3:51












answered Sep 16 '18 at 1:18, edited Sep 16 '18 at 1:51, by Aditya
