Should we use only one-hot vectors for LSTM inputs/outputs?

1. Should we convert our inputs to one-hot vectors and expect one-hot vectors as output?
   I mean, can we feed an LSTM a vector like x = [12, -234, 54, 78, 12, 6] and have a label vector like y = [13, -230, 50, 80, 9, 7], without using one-hot vectors at all?
   Will such a network work properly, or is it better to convert the inputs/outputs to one-hot vectors, and is that the essence of an LSTM?

2. If feeding an LSTM one-hot vectors is not a strict requirement, and we want to feed the network raw vectors like those in the previous question, should we still apply a softmax() function to the outputs? Or are there better options for such a problem (or even no output activation at all)?
   If we must (or should) use softmax, how do we interpret its result?

3. If it is better to convert our inputs/outputs to one-hot vectors, can we use two-hot or three-hot vectors (I mean x = [1,0,0,1,0,0] or x = [0,1,1,1,0,0])? Does this work properly, or does it disrupt the LSTM's performance?

      lstm
asked 18 hours ago by user145959

          1 Answer

1. This depends on what your data represents and what you want to predict. My understanding is that one-hot encoding should only be used for categorical features. For example, if you have a feature representing a category with K classes, you should one-hot encode it, as well as the Y variable (if that categorical variable is what you are trying to predict), and have the final layer be a softmax that outputs a distribution over the K classes.
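
A minimal Keras sketch of that categorical setup (the class count K, sequence length T, and layer sizes below are illustrative assumptions, not values from the question):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense
    from tensorflow.keras.utils import to_categorical

    K, T = 5, 10                                      # classes and timesteps (assumed)
    x_ids = np.random.randint(0, K, size=(100, T))    # toy integer class ids per step
    y_ids = np.random.randint(0, K, size=(100,))      # toy class id to predict
    X = to_categorical(x_ids, num_classes=K)          # (100, T, K) one-hot inputs
    y = to_categorical(y_ids, num_classes=K)          # (100, K) one-hot targets

    model = Sequential([
        LSTM(32, input_shape=(T, K)),
        Dense(K, activation="softmax"),               # distribution over the K classes
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=2, verbose=0)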


2. This depends heavily on what your data represents. If it is categorical, see above. If it is simply numeric, you should not one-hot encode it. You could, I suppose, if the set of integers is finite and small, but there is no need to learn the extra weights. You should only use a softmax when you want to output a K-dimensional vector whose entries sum to one (perfect for representing a probability distribution over K classes). The final layer should output whatever you want to predict: if that is something as simple as a numeric variable at the next time step, just use a dense layer of size 1 with some activation function (probably ReLU). More information about what exactly you are trying to predict is needed before anything concrete can be recommended.
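
For the plain numeric case, the "dense layer of size 1" idea might look like this sketch (shapes are assumptions; note that the ReLU suggested above only fits non-negative targets, so a linear output is used here since the question's example values can be negative):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    T, F = 6, 1                          # timesteps and features per step (assumed)
    X = np.random.randn(100, T, F)       # raw numeric sequences, no one-hot encoding
    y = np.random.randn(100, 1)          # numeric next-step target

    model = Sequential([
        LSTM(32, input_shape=(T, F)),
        # linear output for an unbounded numeric target; switch to
        # activation="relu" only if the target is known to be non-negative
        Dense(1),
    ])
    model.compile(loss="mse", optimizer="adam")
    model.fit(X, y, epochs=2, verbose=0)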


3. I'm not sure about this. You could represent inputs and outputs this way, but you would not use a softmax activation at the end; you would use a dense layer that outputs a vector the size of your x variable, assuming that is the variable you are trying to predict. I'm not well versed in all the use cases for LSTMs, but from my experience I can't think of a reason to do this.
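
If you did want multi-hot inputs and outputs, one plausible setup is an element-wise sigmoid, treating each slot as an independent "hot or not" probability. This is just a rough sketch under that assumption; the point above only says softmax would be wrong here, not that sigmoid is the answer:

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    T, D = 6, 6                                               # timesteps, vector width (assumed)
    X = (np.random.rand(100, T, D) > 0.7).astype("float32")  # multi-hot input steps
    y = (np.random.rand(100, D) > 0.7).astype("float32")     # multi-hot target

    model = Sequential([
        LSTM(32, input_shape=(T, D)),
        # sigmoid instead of softmax: each of the D outputs is an independent
        # probability, so several entries can be "hot" at once
        Dense(D, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=2, verbose=0)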

answered 8 hours ago by kylec123