Gradient flow through concatenation operation












I need help understanding how gradients flow through a concatenation operation.

I'm implementing a network (mostly a CNN) in PyTorch that contains a concatenation operation. Two different images are each passed through the same CNN, the two outputs are concatenated and passed through a second CNN, and the whole network is trained end to end.

Since the first CNN is shared between both inputs to the concatenation, how should the gradients be distributed through the concatenation operation during backprop? I'm not an expert on backprop, and this is the first time I'm tinkering with a custom backward implementation, so any pointers would be helpful.

I can provide more details if needed.







deep-learning convnet backpropagation computer-vision

asked Dec 12 '17 at 18:18 by Monster
edited Dec 12 '17 at 22:01 by Stephen Rauch





          1 Answer
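For concatenation, the gradient values during backpropagation are split and routed back to their respective source layers: each slice of the incoming gradient goes to the input that produced the matching slice of the concatenated output. There is no direct interaction between the gradients of the two source layers.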
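As a rough illustration of that routing, here is a minimal pure-Python sketch of a concatenation forward and backward pass on flat feature lists. The names `concat_forward` and `concat_backward` are hypothetical and chosen only for this example; in PyTorch, autograd already does the equivalent slicing for `torch.cat`, so a hand-written backward is normally not needed.

```python
def concat_forward(a, b):
    """Concatenate two 1-D feature lists and remember where the split is."""
    return a + b, len(a)


def concat_backward(grad_out, split):
    """Route each slice of the upstream gradient to the input that produced it."""
    return grad_out[:split], grad_out[split:]


a = [1.0, 2.0]            # features from the first input
b = [3.0, 4.0, 5.0]       # features from the second input
out, split = concat_forward(a, b)

# dL/d(out): one (made-up) gradient value per concatenated element
grad_out = [0.1, 0.2, 0.3, 0.4, 0.5]
grad_a, grad_b = concat_backward(grad_out, split)

print(grad_a)  # gradient slice that flows back to a
print(grad_b)  # gradient slice that flows back to b
```

The backward pass only copies values; nothing is summed or mixed at the concatenation itself.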



The layer immediately after the concatenation does interact with both networks: some of its weight parameters multiply outputs from network A and others multiply outputs from network B. No single parameter multiplies outputs from both networks (unless you force them to be the same through weight sharing, which won't be the case if, for example, you are stacking features from both starting networks).
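To make the parameter split concrete, assume for illustration that the layer after the concatenation is fully connected with weight matrix $W$, that $a$ and $b$ are the outputs of networks A and B, and that $\delta$ is the gradient of the loss $L$ with respect to the layer's pre-activation $z$. These symbols are not from the original post. A convolutional layer behaves analogously along the channel dimension.

$$
z = W \begin{bmatrix} a \\ b \end{bmatrix}
  = \begin{bmatrix} W_A & W_B \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}
  = W_A a + W_B b
$$

$$
\frac{\partial L}{\partial a} = W_A^\top \delta, \qquad
\frac{\partial L}{\partial b} = W_B^\top \delta, \qquad
\frac{\partial L}{\partial W_A} = \delta\, a^\top, \qquad
\frac{\partial L}{\partial W_B} = \delta\, b^\top, \qquad
\delta = \frac{\partial L}{\partial z}
$$

The columns of $W$ fall into two disjoint blocks, so the gradient sent back to A depends only on $W_A$ and the gradient sent back to B depends only on $W_B$.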



The only issue you might have is identifying clearly which parameters link to each original network. That is an implementation detail, so you would need to share your code so far in order to debug it if it goes wrong.







answered Dec 13 '17 at 11:41 by Neil Slater












• Thanks for your answer! The problem I'm facing is that network A and network B are the same network (same weights, not just the same architecture). How should the gradients be distributed in this case? Should the network weights be updated twice, or just once?
  – Monster, Dec 13 '17 at 22:23

• They are updated once, using as the gradient the sum of the gradients computed through A and B.
  – ncasas, Jan 12 '18 at 13:51
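The "update once with the summed gradient" rule can be checked on a toy problem. The sketch below is an assumption-laden stand-in, not the asker's network: the shared first CNN is replaced by a single weight `w` followed by `tanh`, the second network by a linear head with made-up weights `v`, and all function names are hypothetical. The summed gradient is compared against a finite-difference estimate.

```python
import math


def branch(w, x):
    """Stand-in for the shared first network: one weight, one nonlinearity."""
    return math.tanh(w * x)


def loss(w, x1, x2, v):
    """Run both inputs through the shared branch, concatenate, apply a linear head."""
    features = [branch(w, x1), branch(w, x2)]  # the concatenation
    return sum(vi * fi for vi, fi in zip(v, features))


def branch_grad(w, x, upstream):
    """dL/dw contributed by one pass through the shared branch."""
    return upstream * (1.0 - math.tanh(w * x) ** 2) * x


w, x1, x2 = 0.5, 1.0, -2.0
v = [0.3, 0.7]

# The concat backward hands v[0] to the first pass and v[1] to the second.
grad_via_a = branch_grad(w, x1, v[0])
grad_via_b = branch_grad(w, x2, v[1])
grad_w = grad_via_a + grad_via_b  # one summed gradient for the shared weight

# Independent check: central finite difference of the full loss w.r.t. w
eps = 1e-6
numeric = (loss(w + eps, x1, x2, v) - loss(w - eps, x1, x2, v)) / (2 * eps)
print(grad_w, numeric)

lr = 0.1
w_new = w - lr * grad_w  # a single update step using the summed gradient
```

In PyTorch the same thing happens when one module instance is called on both images: `backward()` accumulates both contributions into each parameter's `.grad`, and the optimizer then applies one step.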

















