Understanding the linear algebra of a forget gate
























This blog covers the basics of LSTMs.

A forget gate is defined as:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

At this point the linear algebra confuses me more than it should. The notation $W \cdot [h, x]$ is confusing in this context. I think a vector should go into the activation function, since the output $f$ is a vector, but the notation of the forget gate above implies that the input has $2$ columns, because $[h, x]$ would be an $n \times 2$ matrix.

For the sake of example, let's say

\begin{align}
W &= \begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix} \\
h &= \begin{bmatrix} -1 \\ 2 \end{bmatrix} \\
x &= \begin{bmatrix} 3 \\ 0 \end{bmatrix} \\
b &= \begin{bmatrix} 1 \\ -2 \end{bmatrix}
\end{align}

Can anyone give the final vector that goes into the sigmoid function?

I think the math is

$$\begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} -1 & 3 \\ 2 & 0 \end{bmatrix} + \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 4 & 6 \end{bmatrix} + \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \text{something wrong}$$

















Tags: neural-network, lstm, rnn














asked 17 hours ago by sam (edited 14 hours ago by Siong Thye Goh)






















          2 Answers












I interpret it as

$$f_t = \sigma\left(W_f \cdot \begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix} + b_f\right)$$

That is, $W_f$ has as many columns as $h_{t-1}$ and $x_t$ have entries combined, and as many rows as $b_f$. This makes the dimensions match and produces a vector output.
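One way to see why the stacked form gives a vector is to split $W_f$ column-wise into a block that multiplies $h_{t-1}$ and a block that multiplies $x_t$. The block names $W_h$ and $W_x$ are introduced here purely for illustration; they do not appear in the question or in the formula above.

$$
W_f \begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix}
= \begin{bmatrix} W_h & W_x \end{bmatrix} \begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix}
= W_h h_{t-1} + W_x x_t
$$

Under this reading, a single wide matrix applied to the stacked vector is the same as giving each vector its own weight matrix and adding the two results. If $h_{t-1}$ has $n$ entries and $x_t$ has $m$ entries, then $W_f$ is $n \times (n + m)$ and the sum is a vector of length $n$, matching $b_f$.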





























answered 14 hours ago by Siong Thye Goh (edited 14 hours ago)



































Note that $[h_{t-1}, x_t]$ is the concatenation of two vectors. In your example, it would be

$$[h_{t-1}, x_t] = [-1, 2, 3, 0]$$

and the dimensions of $W_f$ would then be $2 \times 4$, where $2$ is the dimension of the output of the LSTM cell, i.e. the activation $h_t$, which you defined to have dimension $2$.

Hence, $W_f \cdot [h_{t-1}, x_t]$ is the product of a $2 \times 4$ matrix and a vector of length $4$, which returns a vector of dimension $2$. The sigmoid function is then applied pointwise to each of the two elements of the result.

Hope it makes sense.
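The concatenation reading can be checked numerically with a minimal sketch in plain Python. It uses $h$, $x$ and $b$ from the question. The question's $2 \times 2$ matrix $W$ cannot multiply a vector of length $4$, so the $2 \times 4$ matrix `W_f` below is an assumption made up for this example: its left block is the question's $W$ and its right block is an arbitrary identity matrix. The helper names are likewise hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)) for a single number."""
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate_preactivation(W_f, h_prev, x_t, b_f):
    """Return W_f * concat(h_prev, x_t) + b_f as a plain Python list."""
    concat = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    return [
        sum(w * v for w, v in zip(row, concat)) + b
        for row, b in zip(W_f, b_f)
    ]


# Values taken from the question.
h_prev = [-1, 2]
x_t = [3, 0]
b_f = [1, -2]

# ASSUMPTION: hypothetical 2 x 4 weight matrix. The left 2 x 2 block is the
# W from the question; the right 2 x 2 block (identity) is invented here.
W_f = [
    [0, 1, 1, 0],
    [2, 3, 0, 1],
]

z = forget_gate_preactivation(W_f, h_prev, x_t, b_f)
f_t = [sigmoid(v) for v in z]  # sigmoid applied element by element

print(z)    # [6, 2]
print(f_t)  # two values strictly between 0 and 1
```

With these made-up weights, the vector that goes into the sigmoid is $[6, 2]$, and the gate output has one entry per row of `W_f`. A different choice of the right-hand block would change the numbers but not the shapes.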


























answered 13 hours ago by Escachator (edited 3 hours ago)





















