Should we use only one-hot vectors for LSTM inputs/outputs?
Should we convert our inputs to one-hot vectors and expect one-hot vectors as output?

I mean, can we feed an LSTM with a vector like x = [12, -234, 54, 78, 12, 6] and use a label vector like y = [13, -230, 50, 80, 9, 7]? (That is, not using one-hot vectors at all.) Will such a network work properly, or is converting the inputs/outputs to one-hot vectors the better choice, and the essence of an LSTM?

If feeding an LSTM with one-hot vectors isn't a strict rule, and we feed the network with raw numeric vectors like the ones above, should we still apply the softmax() function to the outputs? Or is there a better option for this kind of problem (or perhaps no output activation at all)? If we must (or should) use softmax, how do we interpret its result?

If it is better to convert our inputs/outputs to one-hot vectors, can we use two- or three-hot vectors instead (I mean x = [1,0,0,1,0,0] or x = [0,1,1,1,0,0])? Does this work properly, or does it disrupt the LSTM's performance?
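For concreteness, here is a hypothetical sketch of the raw-numeric setup I mean (NumPy only; the shapes and sizes are made up for illustration):

```python
import numpy as np

# A raw numeric sequence and target: no one-hot encoding anywhere.
# LSTMs typically expect input shaped (samples, time_steps, features).
x = np.array([[12, -234, 54, 78, 12, 6]], dtype=np.float32)
y = np.array([[13, -230, 50, 80, 9, 7]], dtype=np.float32)

x = x.reshape(1, 6, 1)   # 1 sample, 6 time steps, 1 feature per step
print(x.shape, y.shape)  # (1, 6, 1) (1, 6)
```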
Tagged: lstm
asked 18 hours ago by user145959
1 Answer
1. This depends on what your data represents and what you want to predict. My understanding of one-hot encoding is that it should only be used for categorical features. For example, if you have a feature representing a category with K classes, you should one-hot encode it, and also the Y variable if that categorical variable is what you are trying to predict. In that case, have the final layer be a softmax that outputs a distribution of size K.
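For instance, a minimal sketch of that categorical setup (assuming TensorFlow/Keras; the class count, layer sizes, and toy data are purely illustrative):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

K_CLASSES = 5   # illustrative number of categories
TIME_STEPS = 6

model = tf.keras.Sequential([
    # each time step is a one-hot vector of length K_CLASSES
    layers.Input(shape=(TIME_STEPS, K_CLASSES)),
    layers.LSTM(32),
    # softmax turns the output into a probability distribution over K classes
    layers.Dense(K_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# toy data: integer class ids one-hot encoded with to_categorical
ids = np.random.randint(0, K_CLASSES, size=(100, TIME_STEPS))
x = tf.keras.utils.to_categorical(ids, num_classes=K_CLASSES)
y = tf.keras.utils.to_categorical(ids[:, -1], num_classes=K_CLASSES)
model.fit(x, y, epochs=1, verbose=0)
```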
2. This also depends heavily on what your data represents. If it is categorical, see above. If it is just a plain numeric value, you should not one-hot encode it. You could, I suppose, if the set of integers is finite and small, but there is no need to learn the extra weights. You should only use a softmax when you want to output a K-dimensional vector whose entries sum to one (perfect for representing a probability distribution over K classes). The final layer should output whatever it is you want to predict. If you want to predict something as simple as a numeric value at the next time step, just use a dense layer of size 1 with a suitable activation (ReLU if the target is non-negative; since your example targets include negative values, a plain linear output would fit better). More information about what exactly you are trying to predict is needed before recommending anything concrete here.
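For the plain numeric case, a hedged sketch along those lines (again assuming Keras; I use a linear output unit since the targets can be negative, and the toy data is invented):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

TIME_STEPS = 6

model = tf.keras.Sequential([
    layers.Input(shape=(TIME_STEPS, 1)),  # one numeric feature per step
    layers.LSTM(32),
    # a single linear unit predicts the next value; no softmax for regression
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# toy data: predict (a noisy copy of) the last value of each sequence
x = np.random.randn(100, TIME_STEPS, 1).astype("float32")
y = (x[:, -1, 0] + 0.1 * np.random.randn(100)).astype("float32").reshape(-1, 1)
model.fit(x, y, epochs=1, verbose=0)
```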
3. I'm not sure about this one. You could represent inputs and outputs this way, but you wouldn't use a softmax activation at the end, since more than one entry can be active at once. You would use a dense layer outputting a vector the size of your X variable, assuming that is the variable you are trying to predict. I'm not deeply versed in the various use cases for LSTMs, but from my experience I can't think of a reason to do this.
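If you did want to train on multi-hot targets, one common option (my suggestion, not something required by LSTMs) is an independent sigmoid per entry with binary cross-entropy, so several entries can be "hot" at once. A sketch, with sizes invented for illustration:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

VEC_SIZE = 6
TIME_STEPS = 4

model = tf.keras.Sequential([
    layers.Input(shape=(TIME_STEPS, VEC_SIZE)),
    layers.LSTM(32),
    # one independent sigmoid per entry: each output is its own yes/no
    layers.Dense(VEC_SIZE, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# toy multi-hot data: random 0/1 vectors in, random 0/1 vectors out
x = (np.random.rand(100, TIME_STEPS, VEC_SIZE) > 0.5).astype("float32")
y = (np.random.rand(100, VEC_SIZE) > 0.5).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```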
answered 8 hours ago by kylec123