Why does the classic neural network perform better than an LSTM in sentiment analysis?

My goal is to predict the polarity of some reviews (negative, positive or neutral). I tried two different neural networks:



    # Model 1: two fully connected branches merged before the softmax output (~80% accuracy)
    import numpy as np
    from keras.layers import Input, Dense, Concatenate, Embedding, LSTM, Dropout
    from keras.models import Model

    left_branch = Input((7000, ))
    left_branch_dense = Dense(512, activation='relu')(left_branch)

    right_branch = Input((14012, ))
    right_branch_dense = Dense(512, activation='relu')(right_branch)

    merged = Concatenate()([left_branch_dense, right_branch_dense])
    output_layer = Dense(3, activation='softmax')(merged)

    model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit([np.array(review_matrix), np.array(X_train)], labels, epochs=2, verbose=1)
    model.save('model.merged')

    #############################################################################
    # Model 2: embedding + LSTM branch for the review text, merged with the
    # extra-feature branch (accuracy: ~70%)
    #############################################################################

    import pandas as pd

    # Prepare the review column for the embedding layer:
    review_matrix_for_embedding = prepare_for_encoding(train_set[4].tolist(), 7000)  # shape: (1503, 100)

    second_matrix = np.array(pd.concat([onehot_category, aspect_matrix], axis=1))

    left_branch = Input(shape=(100,), dtype='int32')
    # input_dim: vocabulary size (at least max integer index + 1); output_dim: embedding
    # vector size; input_length: length of each (padded) integer sequence
    left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
    lstm_out = LSTM(256)(left_branch_embedding)
    lstm_out = Dropout(0.7)(lstm_out)
    lstm_out = Dense(128, activation='sigmoid')(lstm_out)

    right_branch = Input((7012, ))
    merged = Concatenate()([lstm_out, right_branch])
    output_layer = Dense(3, activation='softmax')(merged)

    model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit([review_matrix_for_embedding, second_matrix], labels, epochs=5, verbose=1)


The first model reaches 80% accuracy, while the second one, with embedding vectors and an LSTM layer, only reaches 70%. How is that possible? Is there anything wrong with my architecture?
Tags: keras, nlp
2 Answers
First of all, I noticed that you used a sigmoid activation for the Dense layer after the LSTM, while in your plain feed-forward network you used ReLU. This may (only may) be one reason for the lower performance, because sigmoid activations suffer from two problems:
- Saturating gradients: the sigmoid flattens out in its tails, so in those regions the gradient practically vanishes, which hurts backpropagation and slows training.
- Outputs that are not zero-centered: sigmoid outputs are always positive, so the gradients flowing into the incoming weights are all positive or all negative, which adds a 'zig-zag' effect to the updates and makes training harder.
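To make the saturation point concrete, here is a small numerical illustration (a minimal NumPy sketch):

    # Sketch: the sigmoid gradient vanishes for large |x|, while ReLU keeps a unit gradient for x > 0.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
    sigmoid_grad = sigmoid(x) * (1.0 - sigmoid(x))   # roughly 4.5e-05 at |x| = 10: almost no learning signal
    relu_grad = (x > 0).astype(float)                # 1.0 wherever the unit is active, 0.0 otherwise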



The points above come from this excellent tutorial: http://cs231n.github.io/neural-networks-1/. I have tried to summarize it, but it is a masterful write-up and I recommend reading it in full.



Second, we should consider other factors: are you looking at your training/validation/test losses? Maybe your LSTM network simply takes longer to train and has not reached its minimum yet. You should experiment a bit more with these settings before drawing any conclusions. Plot your training and validation loss so we can assess whether your model is underfitting.
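For example, Keras's fit returns a History object whose per-epoch losses can be plotted directly (a minimal sketch, assuming matplotlib is available and that holding out 10% of the training data for validation is acceptable):

    # Sketch: compare training and validation loss to judge under-/overfitting.
    import matplotlib.pyplot as plt

    history = model.fit([review_matrix_for_embedding, second_matrix], labels,
                        epochs=20, validation_split=0.1, verbose=1)

    plt.plot(history.history['loss'], label='training loss')
    plt.plot(history.history['val_loss'], label='validation loss')
    plt.xlabel('epoch')
    plt.ylabel('categorical cross-entropy')
    plt.legend()
    plt.show()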



Lastly, I have a question for you: why should your LSTM network perform better than a simple feed-forward ANN in the first place? Although LSTMs with embeddings are powerful techniques that have gained a lot of attention, especially in NLP, they do not beat classical approaches on every task. I have tried them on different data sets and, depending on the application, simple ML algorithms such as an SVM can easily beat the more complex models, including on sentiment analysis.
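As a point of reference, a classical baseline of that kind takes only a few lines (a sketch using scikit-learn; train_texts, train_polarity, test_texts, and test_polarity are hypothetical names for the raw review strings and their labels):

    # Sketch: a simple TF-IDF + linear SVM baseline for polarity classification.
    # train_texts / train_polarity / test_texts / test_polarity are hypothetical variable names.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    baseline = make_pipeline(TfidfVectorizer(max_features=7000), LinearSVC())
    baseline.fit(train_texts, train_polarity)
    print(baseline.score(test_texts, test_polarity))   # mean accuracy on a held-out set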



To conclude, try these things and let us know your results. If anyone disagrees with this answer, I would be happy to discuss it. I hope it helps.
Thank you for your answer.
I changed the sigmoid to ReLU, and the result is the same; I will keep ReLU from now on, though.



Here are screenshots of the training runs for both models: the one that reaches 79% accuracy (the two merged classic networks) and the one that reaches 70% (these are test accuracies, not training accuracies).
I see a difference in the loss values, but I don't know how to interpret it. Does it mean that the weaker architecture never reaches a minimum? If so, what should I change in my network?



Thank you for the time you took to answer!



[screenshot: training log of the 70% model]

[screenshot: training log of the 79% model]



Edit: where can I set parameters such as the learning rate in the LSTM part of the network?



    left_branch = Input(shape=(100,), dtype='int32')
    # input_dim: vocabulary size (at least max integer index + 1); output_dim: embedding
    # vector size; input_length: length of each (padded) integer sequence
    left_branch_embedding = Embedding(7000, 300, input_length=100)(left_branch)
    lstm_out = LSTM(256)(left_branch_embedding)
    lstm_out = Dropout(0.7)(lstm_out)
    lstm_out = Dense(128, activation='sigmoid')(lstm_out)

    right_branch = Input((7012, ))
    merged = Concatenate()([lstm_out, right_branch])
    output_layer = Dense(3, activation='softmax')(merged)

    model = Model(inputs=[left_branch, right_branch], outputs=output_layer)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
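One way to expose the learning rate is to pass an optimizer instance to compile instead of the string 'adam' (a minimal sketch; depending on the Keras version the argument is named lr or learning_rate):

    # Sketch: compile with an explicit Adam instance so the learning rate can be tuned.
    from keras.optimizers import Adam

    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0005),   # the Keras default is 0.001
                  metrics=['accuracy'])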




