Why does the classic Neural Network perform better than LSTM in Sentiment Analysis












0












$begingroup$


My goal is to predict the polarity of some reviews (negative, positive or neutral). I tried two different neural networks:



    left_branch = Input((7000, ))
left_branch_dense = Dense(512, activation = 'relu')(left_branch)

right_branch = Input((14012, ))
right_branch_dense = Dense(512, activation = 'relu')(right_branch)
merged = Concatenate()([left_branch_dense, right_branch_dense])
output_layer = Dense(3, activation = 'softmax')(merged)

model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([np.array(review_matrix), np.array(X_train)], labels,epochs=2, verbose=1)
model.save('model.merged')

#############################################################################


#############################################################################

#We will try to merge two different models in a different way: Accuracy: 70

# Prepare the review column for embedding:
review_matrix_for_embedding = prepare_for_encoding(train_set[4].tolist(), 7000) # Shape: (1503,100)

second_matrix = np.array(pd.concat([onehot_category, aspect_matrix],axis=1))


left_branch = Input(shape=(100,), dtype='int32')
# input_dim: Size of maximum integer (7001 here); output dim: Size of embedded vector;
# input_length: Size of the array
left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
lstm_out = LSTM(256)(left_branch_embedding)
lstm_out = Dropout(0.7)(lstm_out)
lstm_out = Dense(128, activation='sigmoid')(lstm_out)

right_branch = Input((7012, ))
merged = Concatenate()([lstm_out, right_branch])
output_layer = Dense(3, activation = 'softmax')(merged)

model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([review_matrix_for_embedding, second_matrix], labels,epochs=5, verbose=1)


The first one does 80% accuracy while the second one does 70%, with embedding vectors and LSTM layer. How is it possible? Is there anything wrong in my architecture?










share|improve this question









New contributor




nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$

















    0












    $begingroup$


    My goal is to predict the polarity of some reviews (negative, positive or neutral). I tried two different neural networks:



        left_branch = Input((7000, ))
    left_branch_dense = Dense(512, activation = 'relu')(left_branch)

    right_branch = Input((14012, ))
    right_branch_dense = Dense(512, activation = 'relu')(right_branch)
    merged = Concatenate()([left_branch_dense, right_branch_dense])
    output_layer = Dense(3, activation = 'softmax')(merged)

    model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit([np.array(review_matrix), np.array(X_train)], labels,epochs=2, verbose=1)
    model.save('model.merged')

    #############################################################################


    #############################################################################

    #We will try to merge two different models in a different way: Accuracy: 70

    # Prepare the review column for embedding:
    review_matrix_for_embedding = prepare_for_encoding(train_set[4].tolist(), 7000) # Shape: (1503,100)

    second_matrix = np.array(pd.concat([onehot_category, aspect_matrix],axis=1))


    left_branch = Input(shape=(100,), dtype='int32')
    # input_dim: Size of maximum integer (7001 here); output dim: Size of embedded vector;
    # input_length: Size of the array
    left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
    lstm_out = LSTM(256)(left_branch_embedding)
    lstm_out = Dropout(0.7)(lstm_out)
    lstm_out = Dense(128, activation='sigmoid')(lstm_out)

    right_branch = Input((7012, ))
    merged = Concatenate()([lstm_out, right_branch])
    output_layer = Dense(3, activation = 'softmax')(merged)

    model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit([review_matrix_for_embedding, second_matrix], labels,epochs=5, verbose=1)


    The first one does 80% accuracy while the second one does 70%, with embedding vectors and LSTM layer. How is it possible? Is there anything wrong in my architecture?










    share|improve this question









    New contributor




    nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.







    $endgroup$















      0












      0








      0


      1



      $begingroup$


      My goal is to predict the polarity of some reviews (negative, positive or neutral). I tried two different neural networks:



          left_branch = Input((7000, ))
      left_branch_dense = Dense(512, activation = 'relu')(left_branch)

      right_branch = Input((14012, ))
      right_branch_dense = Dense(512, activation = 'relu')(right_branch)
      merged = Concatenate()([left_branch_dense, right_branch_dense])
      output_layer = Dense(3, activation = 'softmax')(merged)

      model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
      model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
      model.fit([np.array(review_matrix), np.array(X_train)], labels,epochs=2, verbose=1)
      model.save('model.merged')

      #############################################################################


      #############################################################################

      #We will try to merge two different models in a different way: Accuracy: 70

      # Prepare the review column for embedding:
      review_matrix_for_embedding = prepare_for_encoding(train_set[4].tolist(), 7000) # Shape: (1503,100)

      second_matrix = np.array(pd.concat([onehot_category, aspect_matrix],axis=1))


      left_branch = Input(shape=(100,), dtype='int32')
      # input_dim: Size of maximum integer (7001 here); output dim: Size of embedded vector;
      # input_length: Size of the array
      left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
      lstm_out = LSTM(256)(left_branch_embedding)
      lstm_out = Dropout(0.7)(lstm_out)
      lstm_out = Dense(128, activation='sigmoid')(lstm_out)

      right_branch = Input((7012, ))
      merged = Concatenate()([lstm_out, right_branch])
      output_layer = Dense(3, activation = 'softmax')(merged)

      model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
      model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
      model.fit([review_matrix_for_embedding, second_matrix], labels,epochs=5, verbose=1)


      The first one does 80% accuracy while the second one does 70%, with embedding vectors and LSTM layer. How is it possible? Is there anything wrong in my architecture?










      share|improve this question









      New contributor




      nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.







      $endgroup$




      My goal is to predict the polarity of some reviews (negative, positive or neutral). I tried two different neural networks:



          left_branch = Input((7000, ))
      left_branch_dense = Dense(512, activation = 'relu')(left_branch)

      right_branch = Input((14012, ))
      right_branch_dense = Dense(512, activation = 'relu')(right_branch)
      merged = Concatenate()([left_branch_dense, right_branch_dense])
      output_layer = Dense(3, activation = 'softmax')(merged)

      model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
      model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
      model.fit([np.array(review_matrix), np.array(X_train)], labels,epochs=2, verbose=1)
      model.save('model.merged')

      #############################################################################


      #############################################################################

      #We will try to merge two different models in a different way: Accuracy: 70

      # Prepare the review column for embedding:
      review_matrix_for_embedding = prepare_for_encoding(train_set[4].tolist(), 7000) # Shape: (1503,100)

      second_matrix = np.array(pd.concat([onehot_category, aspect_matrix],axis=1))


      left_branch = Input(shape=(100,), dtype='int32')
      # input_dim: Size of maximum integer (7001 here); output dim: Size of embedded vector;
      # input_length: Size of the array
      left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
      lstm_out = LSTM(256)(left_branch_embedding)
      lstm_out = Dropout(0.7)(lstm_out)
      lstm_out = Dense(128, activation='sigmoid')(lstm_out)

      right_branch = Input((7012, ))
      merged = Concatenate()([lstm_out, right_branch])
      output_layer = Dense(3, activation = 'softmax')(merged)

      model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
      model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
      model.fit([review_matrix_for_embedding, second_matrix], labels,epochs=5, verbose=1)


      The first one does 80% accuracy while the second one does 70%, with embedding vectors and LSTM layer. How is it possible? Is there anything wrong in my architecture?







      keras nlp






      share|improve this question









      New contributor




      nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question









      New contributor




      nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question








      edited 2 days ago









      Nischal Hp

      48829




      48829






      New contributor




      nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked 2 days ago









      nolw38nolw38

      62




      62




      New contributor




      nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






















          2 Answers
          2






          active

          oldest

          votes


















          0












          $begingroup$

          First of all, I have noticed that you have used sigmoid activation function for your LSTM Dense Layer and in your ANN you used relu, maybe, MAYBE, this can be a reason for your lower performance. That is could be happening because sigmoid functions suffer from two problems:
          - Saturation of gradients: sigmoid functions have tail distributions, meaning that they saturate in this 'flat' regions practically diminishing the gradient to zero and affecting backpropagation/training process.
          - Sigmoid outputs are not zero-centered: This is an issue due the gradient calculation during backpropagation. Either having all enters positivo or negative will add a 'zigzag' effect difficulting the training process.



          My comments above are taken from this excellent tutorial that you should read: http://cs231n.github.io/neural-networks-1/
          I tried to summarize it but they did a master work and I think you should read it.



          Second, we must consider other factors such as: are you analyzing your train/test/val losses? Maybe your LSTM networks just takes longer to train and reach its minimum. You need to work a little more on these parameters before taking any conclusions. Plot a graph showing your training and validation loss so we can assess if your model is underfitting.



          Lastly, I have a question for you: Why should your LSTM network perform better than a simple ANN? Although LSTM + Embeddings are powerful techniques that have gained attention in a lot of fields, essentially NLP, that will be not every task that they beat classical approaches. I myself have tried with different data sets, and depending on the application, simple ML algorithms such as SVM would easily beat the more complex ones, including sentiment analysis.



          So to conclude, try these things and let us know your results. Also, if anyone disagrees with my answer, I would like to discuss it. I hope it helps.






          share|improve this answer









          $endgroup$





















            0












            $begingroup$

            Thank you for your answer.
            I changed sigmoid for RELU, and the result is the same. Anyway, I will keep RELU now!!



            Here are pictures of training for both the one who does 79% accuracy (merged two classic neural networks), and the one in which I do 70: (of course I don't talk about the training accuracy but the test accuracy).
            I see a difference in the loss value, but I don't know how to interpret it? Does this mean that for the less good architecture, I don't reach a minimum? If yes, what can I modify in my NN ?



            Thank you for the time you took to answer!!



            70



            79



            Edit: Where can I modify parameters like learning rate in the LSTM part of the Neural network?



                left_branch = Input(shape=(100,), dtype='int32')
            # input_dim: Size of maximum integer (7001 here); output dim: Size of embedded vector;
            # input_length: Size of the array
            left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
            lstm_out = LSTM(256)(left_branch_embedding)
            lstm_out = Dropout(0.7)(lstm_out)
            lstm_out = Dense(128, activation='sigmoid')(lstm_out)

            right_branch = Input((7012, ))
            merged = Concatenate()([lstm_out, right_branch])
            output_layer = Dense(3, activation = 'softmax')(merged)

            model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
            model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])





            share|improve this answer








            New contributor




            nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
            Check out our Code of Conduct.






            $endgroup$













              Your Answer





              StackExchange.ifUsing("editor", function () {
              return StackExchange.using("mathjaxEditing", function () {
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
              });
              });
              }, "mathjax-editing");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "557"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });






              nolw38 is a new contributor. Be nice, and check out our Code of Conduct.










              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f47094%2fwhy-does-the-classic-neural-network-perform-better-than-lstm-in-sentiment-analys%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              0












              $begingroup$

              First of all, I have noticed that you have used sigmoid activation function for your LSTM Dense Layer and in your ANN you used relu, maybe, MAYBE, this can be a reason for your lower performance. That is could be happening because sigmoid functions suffer from two problems:
              - Saturation of gradients: sigmoid functions have tail distributions, meaning that they saturate in this 'flat' regions practically diminishing the gradient to zero and affecting backpropagation/training process.
              - Sigmoid outputs are not zero-centered: This is an issue due the gradient calculation during backpropagation. Either having all enters positivo or negative will add a 'zigzag' effect difficulting the training process.



              My comments above are taken from this excellent tutorial that you should read: http://cs231n.github.io/neural-networks-1/
              I tried to summarize it but they did a master work and I think you should read it.



              Second, we must consider other factors such as: are you analyzing your train/test/val losses? Maybe your LSTM networks just takes longer to train and reach its minimum. You need to work a little more on these parameters before taking any conclusions. Plot a graph showing your training and validation loss so we can assess if your model is underfitting.



              Lastly, I have a question for you: Why should your LSTM network perform better than a simple ANN? Although LSTM + Embeddings are powerful techniques that have gained attention in a lot of fields, essentially NLP, that will be not every task that they beat classical approaches. I myself have tried with different data sets, and depending on the application, simple ML algorithms such as SVM would easily beat the more complex ones, including sentiment analysis.



              So to conclude, try these things and let us know your results. Also, if anyone disagrees with my answer, I would like to discuss it. I hope it helps.






              share|improve this answer









              $endgroup$


















                0












                $begingroup$

                First of all, I have noticed that you have used sigmoid activation function for your LSTM Dense Layer and in your ANN you used relu, maybe, MAYBE, this can be a reason for your lower performance. That is could be happening because sigmoid functions suffer from two problems:
                - Saturation of gradients: sigmoid functions have tail distributions, meaning that they saturate in this 'flat' regions practically diminishing the gradient to zero and affecting backpropagation/training process.
                - Sigmoid outputs are not zero-centered: This is an issue due the gradient calculation during backpropagation. Either having all enters positivo or negative will add a 'zigzag' effect difficulting the training process.



                My comments above are taken from this excellent tutorial that you should read: http://cs231n.github.io/neural-networks-1/
                I tried to summarize it but they did a master work and I think you should read it.



                Second, we must consider other factors such as: are you analyzing your train/test/val losses? Maybe your LSTM networks just takes longer to train and reach its minimum. You need to work a little more on these parameters before taking any conclusions. Plot a graph showing your training and validation loss so we can assess if your model is underfitting.



                Lastly, I have a question for you: Why should your LSTM network perform better than a simple ANN? Although LSTM + Embeddings are powerful techniques that have gained attention in a lot of fields, essentially NLP, that will be not every task that they beat classical approaches. I myself have tried with different data sets, and depending on the application, simple ML algorithms such as SVM would easily beat the more complex ones, including sentiment analysis.



                So to conclude, try these things and let us know your results. Also, if anyone disagrees with my answer, I would like to discuss it. I hope it helps.






                share|improve this answer









                $endgroup$
















                  0












                  0








                  0





                  $begingroup$

                  First of all, I have noticed that you have used sigmoid activation function for your LSTM Dense Layer and in your ANN you used relu, maybe, MAYBE, this can be a reason for your lower performance. That is could be happening because sigmoid functions suffer from two problems:
                  - Saturation of gradients: sigmoid functions have tail distributions, meaning that they saturate in this 'flat' regions practically diminishing the gradient to zero and affecting backpropagation/training process.
                  - Sigmoid outputs are not zero-centered: This is an issue due the gradient calculation during backpropagation. Either having all enters positivo or negative will add a 'zigzag' effect difficulting the training process.



                  My comments above are taken from this excellent tutorial that you should read: http://cs231n.github.io/neural-networks-1/
                  I tried to summarize it but they did a master work and I think you should read it.



                  Second, we must consider other factors such as: are you analyzing your train/test/val losses? Maybe your LSTM networks just takes longer to train and reach its minimum. You need to work a little more on these parameters before taking any conclusions. Plot a graph showing your training and validation loss so we can assess if your model is underfitting.



                  Lastly, I have a question for you: Why should your LSTM network perform better than a simple ANN? Although LSTM + Embeddings are powerful techniques that have gained attention in a lot of fields, essentially NLP, that will be not every task that they beat classical approaches. I myself have tried with different data sets, and depending on the application, simple ML algorithms such as SVM would easily beat the more complex ones, including sentiment analysis.



                  So to conclude, try these things and let us know your results. Also, if anyone disagrees with my answer, I would like to discuss it. I hope it helps.






                  share|improve this answer









                  $endgroup$



                  First of all, I have noticed that you have used sigmoid activation function for your LSTM Dense Layer and in your ANN you used relu, maybe, MAYBE, this can be a reason for your lower performance. That is could be happening because sigmoid functions suffer from two problems:
                  - Saturation of gradients: sigmoid functions have tail distributions, meaning that they saturate in this 'flat' regions practically diminishing the gradient to zero and affecting backpropagation/training process.
                  - Sigmoid outputs are not zero-centered: This is an issue due the gradient calculation during backpropagation. Either having all enters positivo or negative will add a 'zigzag' effect difficulting the training process.



                  My comments above are taken from this excellent tutorial that you should read: http://cs231n.github.io/neural-networks-1/
                  I tried to summarize it but they did a master work and I think you should read it.



                  Second, we must consider other factors such as: are you analyzing your train/test/val losses? Maybe your LSTM networks just takes longer to train and reach its minimum. You need to work a little more on these parameters before taking any conclusions. Plot a graph showing your training and validation loss so we can assess if your model is underfitting.



                  Lastly, I have a question for you: Why should your LSTM network perform better than a simple ANN? Although LSTM + Embeddings are powerful techniques that have gained attention in a lot of fields, essentially NLP, that will be not every task that they beat classical approaches. I myself have tried with different data sets, and depending on the application, simple ML algorithms such as SVM would easily beat the more complex ones, including sentiment analysis.



                  So to conclude, try these things and let us know your results. Also, if anyone disagrees with my answer, I would like to discuss it. I hope it helps.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 2 days ago









                  Victor OliveiraVictor Oliveira

                  3057




                  3057























                      0












                      $begingroup$

                      Thank you for your answer.
                      I changed sigmoid for RELU, and the result is the same. Anyway, I will keep RELU now!!



                      Here are pictures of training for both the one who does 79% accuracy (merged two classic neural networks), and the one in which I do 70: (of course I don't talk about the training accuracy but the test accuracy).
                      I see a difference in the loss value, but I don't know how to interpret it? Does this mean that for the less good architecture, I don't reach a minimum? If yes, what can I modify in my NN ?



                      Thank you for the time you took to answer!!



                      70



                      79



                      Edit: Where can I modify parameters like learning rate in the LSTM part of the Neural network?



                          left_branch = Input(shape=(100,), dtype='int32')
                      # input_dim: Size of maximum integer (7001 here); output dim: Size of embedded vector;
                      # input_length: Size of the array
                      left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
                      lstm_out = LSTM(256)(left_branch_embedding)
                      lstm_out = Dropout(0.7)(lstm_out)
                      lstm_out = Dense(128, activation='sigmoid')(lstm_out)

                      right_branch = Input((7012, ))
                      merged = Concatenate()([lstm_out, right_branch])
                      output_layer = Dense(3, activation = 'softmax')(merged)

                      model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
                      model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])





                      share|improve this answer








                      New contributor




                      nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                      Check out our Code of Conduct.






                      $endgroup$


















                        0












                        $begingroup$

                        Thank you for your answer.
                        I changed sigmoid for RELU, and the result is the same. Anyway, I will keep RELU now!!



                        Here are pictures of training for both the one who does 79% accuracy (merged two classic neural networks), and the one in which I do 70: (of course I don't talk about the training accuracy but the test accuracy).
                        I see a difference in the loss value, but I don't know how to interpret it? Does this mean that for the less good architecture, I don't reach a minimum? If yes, what can I modify in my NN ?



                        Thank you for the time you took to answer!!



                        70



                        79



                        Edit: Where can I modify parameters like learning rate in the LSTM part of the Neural network?



                            left_branch = Input(shape=(100,), dtype='int32')
                        # input_dim: Size of maximum integer (7001 here); output dim: Size of embedded vector;
                        # input_length: Size of the array
                        left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
                        lstm_out = LSTM(256)(left_branch_embedding)
                        lstm_out = Dropout(0.7)(lstm_out)
                        lstm_out = Dense(128, activation='sigmoid')(lstm_out)

                        right_branch = Input((7012, ))
                        merged = Concatenate()([lstm_out, right_branch])
                        output_layer = Dense(3, activation = 'softmax')(merged)

                        model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
                        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])





                        share|improve this answer








                        New contributor




                        nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                        Check out our Code of Conduct.






                        $endgroup$
















                          0












                          0








                          0





                          $begingroup$

                          Thank you for your answer.
                          I changed sigmoid for RELU, and the result is the same. Anyway, I will keep RELU now!!



                          Here are pictures of training for both the one who does 79% accuracy (merged two classic neural networks), and the one in which I do 70: (of course I don't talk about the training accuracy but the test accuracy).
                          I see a difference in the loss value, but I don't know how to interpret it? Does this mean that for the less good architecture, I don't reach a minimum? If yes, what can I modify in my NN ?



                          Thank you for the time you took to answer!!



                          70



                          79



                          Edit: Where can I modify parameters like learning rate in the LSTM part of the Neural network?



                              left_branch = Input(shape=(100,), dtype='int32')
                          # input_dim: Size of maximum integer (7001 here); output dim: Size of embedded vector;
                          # input_length: Size of the array
                          left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
                          lstm_out = LSTM(256)(left_branch_embedding)
                          lstm_out = Dropout(0.7)(lstm_out)
                          lstm_out = Dense(128, activation='sigmoid')(lstm_out)

                          right_branch = Input((7012, ))
                          merged = Concatenate()([lstm_out, right_branch])
                          output_layer = Dense(3, activation = 'softmax')(merged)

                          model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
                          model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])





                          share|improve this answer








                          New contributor




                          nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                          Check out our Code of Conduct.






                          $endgroup$



                          Thank you for your answer.
                          I changed sigmoid for RELU, and the result is the same. Anyway, I will keep RELU now!!



                          Here are pictures of training for both the one who does 79% accuracy (merged two classic neural networks), and the one in which I do 70: (of course I don't talk about the training accuracy but the test accuracy).
                          I see a difference in the loss value, but I don't know how to interpret it? Does this mean that for the less good architecture, I don't reach a minimum? If yes, what can I modify in my NN ?



                          Thank you for the time you took to answer!!



                          70



                          79



                          Edit: Where can I modify parameters like learning rate in the LSTM part of the Neural network?



                              left_branch = Input(shape=(100,), dtype='int32')
                          # input_dim: Size of maximum integer (7001 here); output dim: Size of embedded vector;
                          # input_length: Size of the array
                          left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
                          lstm_out = LSTM(256)(left_branch_embedding)
                          lstm_out = Dropout(0.7)(lstm_out)
                          lstm_out = Dense(128, activation='sigmoid')(lstm_out)

                          right_branch = Input((7012, ))
                          merged = Concatenate()([lstm_out, right_branch])
                          output_layer = Dense(3, activation = 'softmax')(merged)

                          model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
                          model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])






                          share|improve this answer








                          New contributor




                          nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                          Check out our Code of Conduct.









                          share|improve this answer



                          share|improve this answer






                          New contributor




                          nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                          Check out our Code of Conduct.









                          answered yesterday









                          nolw38nolw38

                          62




                          62




                          New contributor




                          nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                          Check out our Code of Conduct.





                          New contributor





                          nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                          Check out our Code of Conduct.






                          nolw38 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                          Check out our Code of Conduct.






















                              nolw38 is a new contributor. Be nice, and check out our Code of Conduct.










                              draft saved

                              draft discarded


















                              nolw38 is a new contributor. Be nice, and check out our Code of Conduct.













                              nolw38 is a new contributor. Be nice, and check out our Code of Conduct.












                              nolw38 is a new contributor. Be nice, and check out our Code of Conduct.
















                              Thanks for contributing an answer to Data Science Stack Exchange!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              Use MathJax to format equations. MathJax reference.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f47094%2fwhy-does-the-classic-neural-network-perform-better-than-lstm-in-sentiment-analys%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              How to label and detect the document text images

                              Tabula Rosettana

                              Aureus (color)