Gradient flow through concatenation operation

I need help in understanding the gradient flow through a concatenation operation.

I'm implementing a network (mostly a CNN) in PyTorch that includes a concatenation operation. Two different images are each passed through a CNN, the two responses are concatenated and passed through another CNN, and the whole network is trained end to end.

Since the first CNN is shared between both inputs to the concatenation, how should the gradients be distributed through the concatenation operation during backprop? I'm not an expert on backprop, and this is the first time I'm tinkering with a custom backward implementation, so any pointers would be helpful.

I can provide more details if you guys need it.

edited Dec 12 '17 at 22:01

Stephen Rauch♦


asked Dec 12 '17 at 18:18

Monster


deep-learning convnet backpropagation computer-vision


1 Answer

For concatenation, the gradient is split during backpropagation: each slice of the incoming gradient is routed back to the source layer that produced the corresponding slice of the concatenated output. The gradients flowing into the two source layers do not interact directly.

The layer immediately after the concatenated layer does interact with both networks: it will have some weight parameters that multiply outputs from network A and some that multiply outputs from network B. There will not be any parameters that multiply outputs from both networks (unless you are forcing them to be the same through weight sharing, but that won't be the case if, for example, you are stacking features from both starting networks).

The only issue you might have is clearly identifying which parameters link to each original network. That is an implementation detail, so you would need to share your code so far in order to debug that if it goes wrong.
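The splitting described above can be sketched without any framework. This is only a minimal illustration: the function names (`concat_forward`, `concat_backward`) and the toy feature vectors are assumptions made for the example, not part of the asker's code. In PyTorch itself, `torch.cat` is differentiable and autograd performs the equivalent slicing automatically.

```python
# Framework-free sketch of concatenation's forward and backward pass.
# All names and values here are hypothetical, chosen for illustration.

def concat_forward(feat_a, feat_b):
    """Concatenate two feature vectors and remember where A ends."""
    out = feat_a + feat_b      # list concatenation
    split = len(feat_a)        # boundary needed by the backward pass
    return out, split


def concat_backward(grad_out, split):
    """Route the upstream gradient back to each input by slicing it."""
    grad_a = grad_out[:split]  # gradient w.r.t. feat_a
    grad_b = grad_out[split:]  # gradient w.r.t. feat_b
    return grad_a, grad_b


feat_a = [1.0, 2.0]            # output of branch A
feat_b = [3.0, 4.0, 5.0]       # output of branch B
out, split = concat_forward(feat_a, feat_b)

# Assume the next layer is a dot product with weights w,
# so the gradient of the loss w.r.t. `out` is simply w.
w = [0.1, 0.2, 0.3, 0.4, 0.5]
grad_a, grad_b = concat_backward(w, split)

print(grad_a)  # [0.1, 0.2]
print(grad_b)  # [0.3, 0.4, 0.5]
```

Keeping track of `split` is the bookkeeping that identifies which part of the gradient belongs to which source network.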

answered Dec 13 '17 at 11:41

Neil Slater


Thanks for your answer! The problem I'm facing is that network A and network B are the same (same weights, not just the same architecture). How should the gradients be distributed in this case? Should the network weights be updated twice, or just once?
– Monster
Dec 13 '17 at 22:23

They are updated once, using the sum of the gradients computed through A and B as the gradient.
– ncasas
Jan 12 '18 at 13:51
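The summation rule for shared weights can be checked on a toy case. The sketch below assumes a deliberately tiny setup that is not from the question: the shared "network" is a single scalar weight `w`, the second stage is a fixed linear head `v`, and all numbers are arbitrary. The gradient accumulated from both branches is compared against a finite-difference estimate, and the weight is then updated once.

```python
# Toy check that a weight shared by two branches receives the SUM of the
# per-branch gradients. Every name and value here is hypothetical.

def shared_net(w, x):
    """Stand-in for the shared CNN: one scalar weight."""
    return w * x


def loss(w, x1, x2, v):
    # Concatenate the two branch outputs, then apply a linear head v.
    h = [shared_net(w, x1), shared_net(w, x2)]
    return v[0] * h[0] + v[1] * h[1]


def grad_w(x1, x2, v):
    # Backward through the head: dL/dh = v.
    # Backward through the concatenation: slice dL/dh per branch.
    grad_h1, grad_h2 = v[0], v[1]
    # Backward through each use of the shared weight: d(w * x)/dw = x.
    grad_branch1 = grad_h1 * x1
    grad_branch2 = grad_h2 * x2
    # Shared weight: accumulate both contributions.
    return grad_branch1 + grad_branch2


w, x1, x2, v = 0.5, 2.0, -3.0, [1.5, 0.25]

analytic = grad_w(x1, x2, v)

# Central finite difference as an independent check.
eps = 1e-6
numeric = (loss(w + eps, x1, x2, v) - loss(w - eps, x1, x2, v)) / (2 * eps)

print(analytic, numeric)

# One update with the summed gradient, not one update per branch.
lr = 0.1
w_new = w - lr * analytic
```

With autograd, this accumulation typically happens automatically when the same module is called on both inputs, so a custom backward for the concatenation only needs to return the correct slice for each input.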


