Should we use only one-hot vectors for LSTM inputs/outputs?

1. Should we convert our inputs to one-hot vectors and expect one-hot vectors as output?
   I mean, can we feed an LSTM a vector like x = [12, -234, 54, 78, 12, 6] and have a label vector like y = [13, -230, 50, 80, 9, 7], without using one-hot vectors at all?
   Will such a network work properly, or is it better to convert the inputs/outputs to one-hot vectors, and is that the essence of an LSTM?

2. If feeding an LSTM one-hot vectors is not a strict requirement, and we want to feed the network raw vectors like those in the previous question, should we still apply a softmax() function to the outputs? Or are there better options for such a problem (or even no output activation at all)?
   If we must (or should) use softmax, how do we interpret its result?

3. If it is better to convert our inputs/outputs to one-hot vectors, can we use two-hot or three-hot vectors (I mean x = [1,0,0,1,0,0] or x = [0,1,1,1,0,0])? Does this work properly, or does it disrupt the LSTM's performance?

      lstm
asked 18 hours ago by user145959

          1 Answer

1. This depends on what your data represents and what you want to predict. My understanding is that one-hot encoding should only be used for categorical features. For example, if you have a feature representing a category with K classes, you should one-hot encode it, as well as the Y variable (if that categorical variable is what you are trying to predict), and have the final layer be a softmax that outputs a distribution over the K classes.
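
A minimal Keras sketch of that categorical setup (the class count K, sequence length T, and layer sizes below are illustrative assumptions, not values from the question):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense
    from tensorflow.keras.utils import to_categorical

    K, T = 5, 10                                      # classes and timesteps (assumed)
    x_ids = np.random.randint(0, K, size=(100, T))    # toy integer class ids per step
    y_ids = np.random.randint(0, K, size=(100,))      # toy class id to predict
    X = to_categorical(x_ids, num_classes=K)          # (100, T, K) one-hot inputs
    y = to_categorical(y_ids, num_classes=K)          # (100, K) one-hot targets

    model = Sequential([
        LSTM(32, input_shape=(T, K)),
        Dense(K, activation="softmax"),               # distribution over the K classes
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=2, verbose=0)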


2. This depends heavily on what your data represents. If it is categorical, see above. If it is simply numeric, you should not one-hot encode it. You could, I suppose, if the set of integers is finite and small, but there is no need to learn the extra weights. You should only use a softmax when you want to output a K-dimensional vector whose entries sum to one (perfect for representing a probability distribution over K classes). The final layer should output whatever you want to predict: if that is something as simple as a numeric variable at the next time step, just use a dense layer of size 1 with some activation function (probably ReLU). More information about what exactly you are trying to predict is needed before anything concrete can be recommended.
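
For the plain numeric case, the "dense layer of size 1" idea might look like this sketch (shapes are assumptions; note that the ReLU suggested above only fits non-negative targets, so a linear output is used here since the question's example values can be negative):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    T, F = 6, 1                          # timesteps and features per step (assumed)
    X = np.random.randn(100, T, F)       # raw numeric sequences, no one-hot encoding
    y = np.random.randn(100, 1)          # numeric next-step target

    model = Sequential([
        LSTM(32, input_shape=(T, F)),
        # linear output for an unbounded numeric target; switch to
        # activation="relu" only if the target is known to be non-negative
        Dense(1),
    ])
    model.compile(loss="mse", optimizer="adam")
    model.fit(X, y, epochs=2, verbose=0)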


3. I'm not sure about this. You could represent inputs and outputs this way, but you would not use a softmax activation at the end; you would use a dense layer that outputs a vector the size of your x variable, assuming that is the variable you are trying to predict. I'm not well versed in all the use cases for LSTMs, but from my experience I can't think of a reason to do this.
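
If you did want multi-hot inputs and outputs, one plausible setup is an element-wise sigmoid, treating each slot as an independent "hot or not" probability. This is just a rough sketch under that assumption; the point above only says softmax would be wrong here, not that sigmoid is the answer:

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    T, D = 6, 6                                               # timesteps, vector width (assumed)
    X = (np.random.rand(100, T, D) > 0.7).astype("float32")  # multi-hot input steps
    y = (np.random.rand(100, D) > 0.7).astype("float32")     # multi-hot target

    model = Sequential([
        LSTM(32, input_shape=(T, D)),
        # sigmoid instead of softmax: each of the D outputs is an independent
        # probability, so several entries can be "hot" at once
        Dense(D, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=2, verbose=0)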

answered 8 hours ago by kylec123