What are deconvolutional layers?












145












$begingroup$


I recently read Fully Convolutional Networks for Semantic Segmentation by Jonathan Long, Evan Shelhamer, Trevor Darrell. I don't understand what "deconvolutional layers" do / how they work.



The relevant part is




3.3. Upsampling is backwards strided convolution



Another way to connect coarse outputs to dense pixels
is interpolation. For instance, simple bilinear interpolation
computes each output $y_{ij}$ from the nearest four inputs by a
linear map that depends only on the relative positions of the
input and output cells.

In a sense, upsampling with factor $f$ is convolution with
a fractional input stride of $1/f$. So long as $f$ is integral, a
natural way to upsample is therefore backwards convolution
(sometimes called deconvolution) with an output stride of
$f$. Such an operation is trivial to implement, since it simply
reverses the forward and backward passes of convolution.

Thus upsampling is performed in-network for end-to-end
learning by backpropagation from the pixelwise loss.

Note that the deconvolution filter in such a layer need not
be fixed (e.g., to bilinear upsampling), but can be learned.
A stack of deconvolution layers and activation functions can
even learn a nonlinear upsampling.

In our experiments, we find that in-network upsampling
is fast and effective for learning dense prediction. Our best
segmentation architecture uses these layers to learn to upsample
for refined prediction in Section 4.2.




I don't think I really understood how convolutional layers are trained.



What I think I've understood is that convolutional layers with a kernel size $k$ learn filters of size $k \times k$. The output of a convolutional layer with kernel size $k$, stride $s \in \mathbb{N}$ and $n$ filters is of dimension $\frac{\text{Input dim}}{s^2} \cdot n$. However, I don't know how the learning of convolutional layers works. (I understand how simple MLPs learn with gradient descent, if that helps).



So if my understanding of convolutional layers is correct, I have no clue how this can be reversed.



Could anybody please help me to understand deconvolutional layers?





















$endgroup$








  • 1




    $begingroup$
    This video lecture explains deconvolution/upsampling: youtu.be/ByjaPdWXKJ4?t=16m59s
    $endgroup$
    – user199309
    Apr 22 '16 at 13:14






  • 3




    $begingroup$
    Hoping it could be useful to anyone, I made a notebook to explore how convolution and transposed convolution can be used in TensorFlow (0.11). Having some practical examples and figures may help a bit more in understanding how they work.
    $endgroup$
    – AkiRoss
    Nov 22 '16 at 14:56










  • $begingroup$
    For me, this page gave a better explanation. It also explains the difference between deconvolution and transposed convolution: towardsdatascience.com/…
    $endgroup$
    – T.Antoni
    Jun 28 '18 at 17:37










  • $begingroup$
    Isn't upsampling more like backwards pooling than backwards strided convolution, since it has no parameters?
    $endgroup$
    – Ken Fehling
    Jul 11 '18 at 17:29
















neural-network convnet














edited Jan 7 '16 at 10:47







Martin Thoma

















asked Jun 13 '15 at 9:56









Martin Thoma


















9 Answers


















168












$begingroup$

Deconvolution layer is a very unfortunate name and should rather be called a transposed convolutional layer.



Visually, for a transposed convolution with stride one and no padding, we just pad the original input (blue entries) with zeroes (white entries) (Figure 1).



Figure 1



In case of stride two and padding, the transposed convolution would look like this (Figure 2):



Figure 2



You can find more (great) visualisations of convolution arithmetic here.
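To make the padding picture concrete, here is a minimal NumPy sketch of the no-padding case: zeros are inserted between the input entries (for stride > 1), the border is padded with kernel size minus one zeros, and an ordinary sliding-window convolution is run over the result. The function names `correlate2d_valid` and `transposed_conv2d_by_padding` are my own, not from any library, and the sketch ignores channels, batches and the padded variant shown in Figure 2.

```python
import numpy as np

def correlate2d_valid(x, k):
    """Plain 'valid' sliding-window product-sum (what deep learning calls convolution)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def transposed_conv2d_by_padding(x, k, stride=1):
    """Transposed convolution as zero-insertion + border padding + direct convolution."""
    h, w = x.shape
    kh, kw = k.shape
    # Insert (stride - 1) zeros between neighbouring input entries.
    dilated = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    dilated[::stride, ::stride] = x
    # Pad the border with (kernel size - 1) zeros on every side.
    padded = np.pad(dilated, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    # Slide the kernel, rotated by 180 degrees, over the padded input.
    return correlate2d_valid(padded, k[::-1, ::-1])

x = np.array([[1.0, 2.0], [3.0, 4.0]])
k = np.ones((3, 3))
y1 = transposed_conv2d_by_padding(x, k, stride=1)  # 2x2 input -> 4x4 output
y2 = transposed_conv2d_by_padding(x, k, stride=2)  # 2x2 input -> 5x5 output
```

With stride 1 the 2x2 input grows to 4x4, and with stride 2 it grows to 5x5, so the operation upsamples even though it is computed as a regular convolution.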















$endgroup$









  • 13




    $begingroup$
    Just to make sure I understood it: "Deconvolution" is pretty much the same as convolution, but you add some padding? (Around the image / when s > 1 also around each pixel)?
    $endgroup$
    – Martin Thoma
    Jun 8 '16 at 5:00






  • 14




    $begingroup$
    Yes, a deconvolution layer also performs convolution! That is why transposed convolution fits so much better as a name, and why the term deconvolution is actually misleading.
    $endgroup$
    – David Dao
    Jun 30 '16 at 20:47






  • 9




    $begingroup$
    Why do you say "no padding" in Figure 1, if the input is actually zero-padded?
    $endgroup$
    – Stas S
    Jul 30 '16 at 13:06






  • 5




    $begingroup$
    By the way: It is called transposed convolution now in TensorFlow: tensorflow.org/versions/r0.10/api_docs/python/…
    $endgroup$
    – Martin Thoma
    Aug 8 '16 at 14:08






  • 5




    $begingroup$
    Thanks for this very intuitive answer, but I'm confused about why the second one is the 'stride two' case; it behaves exactly like the first one when the kernel moves.
    $endgroup$
    – Demonedge
    Aug 10 '16 at 14:01



















38












$begingroup$

I think one way to get a really basic level intuition behind convolution is that you are sliding K filters, which you can think of as K stencils, over the input image and produce K activations - each one representing a degree of match with a particular stencil. The inverse operation of that would be to take K activations and expand them into a preimage of the convolution operation. The intuitive explanation of the inverse operation is therefore, roughly, image reconstruction given the stencils (filters) and activations (the degree of the match for each stencil) and therefore at the basic intuitive level we want to blow up each activation by the stencil's mask and add them up.
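A small 1D sketch of that "blow up each activation by the stencil and add them up" idea, assuming a single filter, no padding and plain NumPy (the names `conv1d` and `conv1d_transpose` are mine, chosen for illustration):

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Forward pass: slide the stencil w over x and record the degree of match."""
    n_out = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(w)], w) for i in range(n_out)])

def conv1d_transpose(a, w, stride=1):
    """Opposite direction: stamp a scaled copy of the stencil per activation and sum."""
    out = np.zeros((len(a) - 1) * stride + len(w))
    for i, activation in enumerate(a):
        out[i * stride:i * stride + len(w)] += activation * w
    return out

w = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.0, -1.0, 2.0, 1.0])
a = conv1d(x, w, stride=2)                  # 5 inputs -> 2 activations: [-2, 6]
x_like = conv1d_transpose(a, w, stride=2)   # 2 activations -> 5 values: [-2, -4, 0, 12, 18]
```

Note that `x_like` has the shape of `x` but not its values: the operation restores the size and connectivity of the input, it does not invert the convolution.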



Another way to approach understanding deconv would be to examine the deconvolution layer implementation in Caffe, see the following relevant bits of code:



DeconvolutionLayer<Dtype>::Forward_gpu
ConvolutionLayer<Dtype>::Backward_gpu
CuDNNConvolutionLayer<Dtype>::Backward_gpu
BaseConvolutionLayer<Dtype>::backward_cpu_gemm


You can see that it's implemented in Caffe exactly as backprop for a regular forward convolutional layer (to me it was more obvious after I compared the implementation of backprop in the cuDNN conv layer with ConvolutionLayer::Backward_gpu implemented using GEMM). So if you work through how backpropagation is done for regular convolution, you will understand what happens on a mechanical computation level. The way this computation works matches the intuition described in the first paragraph of this answer.




However, I don't know how the learning of convolutional layers works. (I understand how simple MLPs learn with gradient descent, if that helps).




To answer your other question inside your first question, there are two main differences between MLP backpropagation (fully connected layer) and convolutional nets:



1) the influence of weights is localized, so first figure out how to do backprop for, say a 3x3 filter convolved with a small 3x3 area of an input image, mapping to a single point in the result image.



2) the weights of convolutional filters are shared for spatial invariance. What this means in practice is that in the forward pass the same 3x3 filter with the same weights is dragged through the entire image to yield the output image (for that particular filter). What this means for backprop is that the gradients for each point in the source image are summed over the entire range over which we dragged that filter during the forward pass. Note that there are also different gradients of the loss wrt x, w and bias, since dLoss/dx needs to be backpropagated and dLoss/dw is how we update the weights. w and bias are independent inputs in the computation DAG (there are no prior inputs), so their gradients are used for the update but do not need to be propagated any further.



(my notation here assumes that convolution is y = x*w+b where '*' is the convolution operation)
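The two points above can be sketched in a few lines of NumPy for the 1D case. This is only an illustration under my own assumptions: stride 1, no padding, the sliding product-sum without kernel flipping, and a made-up linear loss so that dLoss/dy is known. The finite-difference check is there to confirm the hand-written gradients.

```python
import numpy as np

def conv1d(x, w, b=0.0):
    """y[i] = sum_j x[i + j] * w[j] + b (stride 1, no padding, no kernel flip)."""
    n_out = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n_out)]) + b

def conv1d_backward(x, w, dy):
    """Gradients of a scalar loss, given dy = dLoss/dy."""
    dx = np.zeros_like(x)
    dw = np.zeros_like(w)
    for i, g in enumerate(dy):
        dx[i:i + len(w)] += g * w    # localized: each output only touches its own window
        dw += g * x[i:i + len(w)]    # shared weights: sum over every filter position
    db = dy.sum()
    return dx, dw, db

def numeric_grad(f, v, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    g = np.zeros_like(v)
    for i in range(len(v)):
        step = np.zeros_like(v)
        step[i] = eps
        g[i] = (f(v + step) - f(v - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x, w = rng.normal(size=6), rng.normal(size=3)
dy = rng.normal(size=4)              # pretend upstream gradient dLoss/dy

def loss(x_, w_):
    # Hypothetical loss chosen so that its gradient with respect to y is exactly dy.
    return np.dot(conv1d(x_, w_), dy)

dx, dw, db = conv1d_backward(x, w, dy)
dx_num = numeric_grad(lambda v: loss(v, w), x)
dw_num = numeric_grad(lambda v: loss(x, v), w)
```

The loop that builds `dx` stamps a scaled copy of the filter for every output position, which is the same computation a deconvolution layer runs in its forward pass.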
















$endgroup$









  • 7




    $begingroup$
    I think this is the best answer for this question.
    $endgroup$
    – kli_nlpr
    Dec 25 '16 at 15:30






  • 7




    $begingroup$
    I agree that this is the best answer. The top answer has pretty animations, but until I read this answer they just looked like regular convolutions with some arbitrary padding to me. Oh how people are swayed by eye candy.
    $endgroup$
    – Reii Nakano
    Jun 10 '17 at 6:23












  • $begingroup$
    Agree, the accepted answer didn't explain anything. This is much better.
    $endgroup$
    – BjornW
    Mar 27 '18 at 8:37



















27












$begingroup$

Step-by-step math explaining how a transposed convolution does 2x upsampling with a 3x3 filter and a stride of 2:



enter image description here



The simplest TensorFlow snippet to validate the math:



import tensorflow as tf
import numpy as np

def test_conv2d_transpose():
    # input batch shape = (1, 2, 2, 1) -> (batch_size, height, width, channels) - 2x2x1 image in batch of 1
    x = tf.constant(np.array([[
        [[1], [2]],
        [[3], [4]]
    ]]), tf.float32)

    # shape = (3, 3, 1, 1) -> (height, width, output_channels, input_channels) - 3x3x1 filter
    f = tf.constant(np.array([
        [[[1]], [[1]], [[1]]],
        [[[1]], [[1]], [[1]]],
        [[[1]], [[1]], [[1]]]
    ]), tf.float32)

    conv = tf.nn.conv2d_transpose(x, f, output_shape=(1, 4, 4, 1), strides=[1, 2, 2, 1], padding='SAME')

    # Written against the TensorFlow 1.x graph API (tf.Session).
    with tf.Session() as session:
        result = session.run(conv)

    assert (np.array([[
        [[1.0], [1.0], [3.0], [2.0]],
        [[1.0], [1.0], [3.0], [2.0]],
        [[4.0], [4.0], [10.0], [6.0]],
        [[3.0], [3.0], [7.0], [4.0]]]]) == result).all()

test_conv2d_transpose()
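For readers without TensorFlow at hand, the same numbers can be reproduced with a few lines of NumPy. This is a sketch under an assumption of mine: for this particular configuration (3x3 filter, stride 2, 4x4 output) I take 'SAME' to mean "compute the full 5x5 result and keep the top-left 4x4 part", which is the cropping that matches the expected values asserted in the snippet. The helper name `conv2d_transpose_cropped` is hypothetical.

```python
import numpy as np

def conv2d_transpose_cropped(x, f, stride, out_size):
    """Stamp a scaled copy of the filter per input pixel, then crop to out_size."""
    h, w = x.shape
    kh, kw = f.shape
    full = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            full[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * f
    # Assumption: keep the top-left block, dropping the last row and column.
    return full[:out_size, :out_size]

x = np.array([[1.0, 2.0], [3.0, 4.0]])
f = np.ones((3, 3))
result = conv2d_transpose_cropped(x, f, stride=2, out_size=4)
```

The value 10 in the middle comes from the one output pixel that all four stamped filters overlap (1 + 2 + 3 + 4).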
















$endgroup$













  • $begingroup$
    I think your calculation is wrong here. The intermediate output should be 3+ 2*2=7, then for a 3x3 kernel the final output should be 7-3+1 = 5x5
    $endgroup$
    – Alex
    Nov 14 '17 at 14:59










  • $begingroup$
    Sorry, @Alex, but I fail to understand why intermediate output is 7. Can you please elaborate?
    $endgroup$
    – andriys
    Nov 19 '17 at 9:49












  • $begingroup$
    @andriys In the image that you've shown, why is the final result cropped?
    $endgroup$
    – James Bond
    Jun 25 '18 at 13:29



















23












$begingroup$

The notes that accompany Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition, by Andrej Karpathy, do an excellent job of explaining convolutional neural networks.



Reading this paper should give you a rough idea about deconvolutional networks:




  • Deconvolutional Networks
    Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor and Rob Fergus
    Dept. of Computer Science, Courant Institute, New York University


These slides are great for Deconvolutional Networks.

















$endgroup$









  • 24




    $begingroup$
    Is it possible to summarise the content of any one of those links, in a short paragraph? The links might be useful for further research, but ideally a stack exchange answer should have enough text to address the basic question without needing to go off site.
    $endgroup$
    – Neil Slater
    Jun 20 '15 at 7:01












  • $begingroup$
    I am sorry but the content of these pages is too large to be summarized in a short paragraph.
    $endgroup$
    – Azrael
    Jun 20 '15 at 9:11






  • 12




    $begingroup$
    A full summary is not required, just a headline - e.g. "A deconvolutional neural network is similar to a CNN, but is trained so that features in any hidden layer can be used to reconstruct the previous layer (and by repetition across layers, eventually the input could be reconstructed from the output). This allows it to be trained unsupervised in order to learn generic high-level features in a problem domain - usually image processing" (note I am not even sure if that is correct, hence not writing my own answer).
    $endgroup$
    – Neil Slater
    Jun 20 '15 at 11:08








  • 5




    $begingroup$
    Although the links are good, a brief summary of the model in your own words would have been better.
    $endgroup$
    – SmallChess
    Dec 19 '15 at 13:34



















7












$begingroup$

Just found a great article from the Theano website on this topic [1]:




The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, [...] to project feature maps to a higher-dimensional space.
[...]
i.e., map from a 4-dimensional space to a 16-dimensional space, while keeping the connectivity pattern of the convolution.



Transposed convolutions – also called fractionally strided convolutions – work by swapping the forward and backward passes of a convolution. One way to put it is to note that the kernel defines a convolution, but whether it’s a direct convolution or a transposed convolution is determined by how the forward and backward passes are computed.



The transposed convolution operation can be thought of as the gradient of some convolution with respect to its input, which is usually how transposed convolutions are implemented in practice.



Finally note that it is always possible to implement a transposed convolution with a direct convolution. The disadvantage is that it usually involves adding many columns and rows of zeros to the input, resulting in a much less efficient implementation.




So in simplespeak, a "transposed convolution" is a mathematical operation using matrices (just like convolution), but it is more efficient than the normal convolution operation when you want to go back from the convolved values to the original (opposite direction). This is why implementations prefer it to convolution when computing the opposite direction (i.e. to avoid the many unnecessary multiplications by 0 caused by the sparse matrix that results from padding the input).



Image ---> convolution ---> Result



Result ---> transposed convolution ---> "originalish Image"



Sometimes you save some values along the convolution path and reuse that information when "going back":



Result ---> transposed convolution ---> Image



That's probably the reason why it's wrongly called a "deconvolution". However, it does have something to do with the matrix transpose of the convolution (C^T), hence the more appropriate name "transposed convolution".
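Here is a small NumPy sketch of that matrix view, assuming a 4x4 image, a 3x3 kernel, stride 1 and no padding (the helper `conv_matrix` is my own illustration, not how any framework implements it). The convolution becomes a sparse 4x16 matrix C, and its transpose maps 4 values back to 16 values:

```python
import numpy as np

def conv_matrix(kernel, in_size):
    """Matrix C such that C @ image.ravel() is a valid, stride-1 convolution."""
    k = kernel.shape[0]
    out_size = in_size - k + 1
    C = np.zeros((out_size * out_size, in_size * in_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = np.zeros((in_size, in_size))
            patch[i:i + k, j:j + k] = kernel   # where the kernel sits for output (i, j)
            C[i * out_size + j] = patch.ravel()
    return C

kernel = np.arange(1.0, 10.0).reshape(3, 3)
C = conv_matrix(kernel, in_size=4)    # shape (4, 16)
image = np.arange(16.0)
result = C @ image                    # 16-dimensional -> 4-dimensional
originalish = C.T @ result            # 4-dimensional -> 16-dimensional
```

`originalish` has the shape of the image but different values, which is why calling this a "deconvolution" is misleading: C^T keeps the connectivity pattern of C, it is not the inverse of C.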



So it makes a lot of sense when considering computing cost. You'd pay a lot more for Amazon GPUs if you didn't use the transposed convolution.



Read and watch the animations here carefully:
http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html#no-zero-padding-unit-strides-transposed



Some other relevant reading:




The transpose (or more generally, the Hermitian or conjugate transpose) of a filter is simply the matched filter[3]. This is found by time reversing the kernel and taking the conjugate of all the values[2].




I am also new to this and would be grateful for any feedback or corrections.



[1] http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html



[2] http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html#transposed-convolution-arithmetic



[3] https://en.wikipedia.org/wiki/Matched_filter

















$endgroup$









  • 1




    $begingroup$
    Nit picking, but the link should be: deeplearning.net/software/theano_versions/dev/tutorial/…
    $endgroup$
    – Herbert
    Jan 19 '17 at 13:05






  • 1




    $begingroup$
    I think this is the best answer!!!
    $endgroup$
    – kli_nlpr
    Mar 31 '17 at 7:58



















7












$begingroup$

We could use PCA for analogy.



When using conv, the forward pass is to extract the coefficients of principal components from the input image, and the backward pass (that updates the input) is to use (the gradient of) the coefficients to reconstruct a new input image, so that the new input image has PC coefficients that better match the desired coefficients.



When using deconv, the forward pass and the backward pass are reversed. The forward pass tries to reconstruct an image from PC coefficients, and the backward pass updates the PC coefficients given (the gradient of) the image.



The deconv forward pass does exactly the conv gradient computation given in this post:
http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/



That's why in the Caffe implementation of deconv (refer to Andrei Pokrovsky's answer), the deconv forward pass calls backward_cpu_gemm(), and the backward pass calls forward_cpu_gemm().

















$endgroup$





















    4












    $begingroup$

    In addition to David Dao's answer: It is also possible to think the other way around. Instead of focusing on which (low resolution) input pixels are used to produce a single output pixel, you can also focus on which individual input pixels contribute to which region of output pixels.



    This is done in this distill publication, including a series of very intuitive and interactive visualizations. One advantage of thinking in this direction is that explaining checkerboard artifacts becomes easy.
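A tiny sketch of that viewpoint in 1D, assuming NumPy and a helper name (`contribution_count`) that I made up: count how many input pixels write into each output pixel of a transposed convolution. When the kernel size is not divisible by the stride, the counts alternate, and that uneven overlap is the pattern that shows up as checkerboard artifacts.

```python
import numpy as np

def contribution_count(n_in, kernel_size, stride):
    """How many input pixels write into each output pixel (1D transposed convolution)."""
    counts = np.zeros((n_in - 1) * stride + kernel_size, dtype=int)
    for i in range(n_in):
        counts[i * stride:i * stride + kernel_size] += 1
    return counts

uneven = contribution_count(n_in=4, kernel_size=3, stride=2)  # [1 1 2 1 2 1 2 1 1]
even = contribution_count(n_in=4, kernel_size=4, stride=2)    # [1 1 2 2 2 2 2 2 1 1]
```

In the second case the interior counts are all equal, because the kernel size is a multiple of the stride.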

















    $endgroup$





















      0












      $begingroup$

      Convolutions from a DSP perspective



      I'm a bit late to this but still would like to share my perspective and insights. My background is theoretical physics and digital signal processing. In particular I studied wavelets and convolutions are almost in my backbone ;)



      The way people in the deep learning community talk about convolutions was also confusing to me. From my perspective what seems to be missing is a proper separation of concerns. I will explain the deep learning convolutions using some DSP tools.



      Disclaimer



      My explanations will be a bit hand-wavy and not mathematically rigorous in order to get the main points across.






      Definitions



      Let's define a few things first. I limit my discussion to one-dimensional (the extension to more dimensions is straightforward), infinite (so we don't need to mess with boundaries) sequences $x_n = \{x_n\}_{n=-\infty}^{\infty} = \{\dots, x_{-1}, x_{0}, x_{1}, \dots \}$.



      A pure (discrete) convolution between two sequences $y_n$ and $x_n$ is defined as



      $$ (y * x)_n = \sum_{k=-\infty}^{\infty} y_{n-k} x_k $$



      If we write this in terms of matrix-vector operations it looks like this (assuming a simple kernel $\mathbf{q} = (q_0,q_1,q_2)$ and vector $\mathbf{x} = (x_0, x_1, x_2, x_3)^T$):



      $$ \mathbf{q} * \mathbf{x} =
      \left( \begin{array}{cccc}
      q_1 & q_0 & 0 & 0 \\
      q_2 & q_1 & q_0 & 0 \\
      0 & q_2 & q_1 & q_0 \\
      0 & 0 & q_2 & q_1 \\
      \end{array} \right)
      \left( \begin{array}{cccc}
      x_0 \\ x_1 \\ x_2 \\ x_3
      \end{array} \right)
      $$



      Let's introduce the down- and up-sampling operators, $\downarrow$ and $\uparrow$, respectively. Downsampling by factor $k \in \mathbb{N}$ is removing all samples except every $k$-th one:



      $$ \downarrow_k\!x_n = x_{nk} $$



      And upsampling by factor $k$ is interleaving $k-1$ zeros between the samples:



      $$ \uparrow_k\!x_n = \left\{ \begin{array}{ll}
      x_{n/k} & n/k \in \mathbb{Z} \\
      0 & \text{otherwise}
      \end{array} \right.
      $$



      E.g. we have for $k=3$:



      $$ \downarrow_3\!\{ \dots, x_0, x_1, x_2, x_3, x_4, x_5, x_6, \dots \} = \{ \dots, x_0, x_3, x_6, \dots \} $$
      $$ \uparrow_3\!\{ \dots, x_0, x_1, x_2, \dots \} = \{ \dots, x_0, 0, 0, x_1, 0, 0, x_2, 0, 0, \dots \} $$



      or written in terms of matrix operations (here $k=2$):



      $$ \downarrow_2\!x =
      \left( \begin{array}{cc}
      x_0 \\ x_2
      \end{array} \right) =
      \left( \begin{array}{cccc}
      1 & 0 & 0 & 0 \\
      0 & 0 & 1 & 0 \\
      \end{array} \right)
      \left( \begin{array}{cccc}
      x_0 \\ x_1 \\ x_2 \\ x_3
      \end{array} \right)
      $$



      and



      $$ \uparrow_2\!x =
      \left( \begin{array}{cccc}
      x_0 \\ 0 \\ x_1 \\ 0
      \end{array} \right) =
      \left( \begin{array}{cc}
      1 & 0 \\
      0 & 0 \\
      0 & 1 \\
      0 & 0 \\
      \end{array} \right)
      \left( \begin{array}{cc}
      x_0 \\ x_1
      \end{array} \right)
      $$



      As one can already see, the down- and up-sample operators are mutually transposed, i.e. $\uparrow_k = \downarrow_k^T$.
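The transpose relation is easy to check numerically. A minimal NumPy sketch for finite vectors (the helper names `downsample_matrix` and `upsample_matrix` are mine):

```python
import numpy as np

def downsample_matrix(n, k):
    """Matrix that keeps every k-th sample of a length-n vector."""
    return np.eye(n)[::k]

def upsample_matrix(m, k):
    """Matrix that maps m samples to m * k samples by interleaving k - 1 zeros."""
    U = np.zeros((m * k, m))
    U[np.arange(m) * k, np.arange(m)] = 1.0
    return U

D = downsample_matrix(4, 2)   # shape (2, 4): rows (1 0 0 0) and (0 0 1 0)
U = upsample_matrix(2, 2)     # shape (4, 2)
x = np.array([10.0, 11.0, 12.0, 13.0])
down = D @ x                          # [10, 12]
up = U @ np.array([1.0, 2.0])         # [1, 0, 2, 0]
```

Here `U` is exactly `D.T`, matching the two matrices written out above for $k=2$.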






      Deep Learning Convolutions by Parts



      Let's look at the typical convolutions used in deep learning and how we write them. Given some kernel $\mathbf{q}$ and vector $\mathbf{x}$ we have the following:




      • a strided convolution with stride $k$ is $\downarrow_k\!(\mathbf{q} * \mathbf{x})$,

      • a dilated convolution with factor $k$ is $(\uparrow_k\!\mathbf{q}) * \mathbf{x}$,

      • a transposed convolution (or deconvolution) with stride $k$ is $\mathbf{q} * (\uparrow_k\!\mathbf{x})$.
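
The three definitions in the list can be written down almost literally. The following is a minimal sketch under some assumptions: a plain "full" convolution is used (no centering or cropping), `upsample` appends $k-1$ zeros after every sample, so some outputs carry a trailing zero, and all function names and numbers are invented for the example.

```python
def conv_full(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def downsample(x, k):
    return x[::k]


def upsample(x, k):
    out = []
    for v in x:
        out.append(v)
        out.extend([0] * (k - 1))
    return out


def strided_conv(q, x, k):
    """Convolve, then keep every k-th output."""
    return downsample(conv_full(q, x), k)


def dilated_conv(q, x, k):
    """Spread the kernel taps apart with zeros, then convolve."""
    return conv_full(upsample(q, k), x)


def transposed_conv(q, x, k):
    """Spread the input samples apart with zeros, then convolve."""
    return conv_full(q, upsample(x, k))


q = [1, 2, 3]
x = [1, 10, 100, 1000]
print(strided_conv(q, x, 2))
print(dilated_conv(q, x, 2))
print(transposed_conv(q, x, 2))
```

Note how only the position of the sampling operator changes between the three variants: after the convolution, on the kernel, or on the input.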


      Let's rearrange the transposed convolution a bit:
      $$
      \mathbf{q} * (\uparrow_k\!\mathbf{x}) \; = \;
      \mathbf{q} * (\downarrow_k^T\!\mathbf{x}) \; = \;
      (\downarrow_k\!(\mathbf{q}*)^T)^T\mathbf{x}
      $$



      In this notation $(\mathbf{q}*)$ must be read as an operator, i.e. it abstracts convolving something with kernel $\mathbf{q}$.
      Or written in matrix operations (example):



      $$
      \begin{align}
      \mathbf{q} * (\uparrow_k\!\mathbf{x}) & =
      \left( \begin{array}{cccc}
      q_1 & q_0 & 0 & 0 \\
      q_2 & q_1 & q_0 & 0 \\
      0 & q_2 & q_1 & q_0 \\
      0 & 0 & q_2 & q_1 \\
      \end{array} \right)
      \left( \begin{array}{cc}
      1 & 0 \\
      0 & 0 \\
      0 & 1 \\
      0 & 0 \\
      \end{array} \right)
      \left( \begin{array}{c}
      x_0\\
      x_1\\
      \end{array} \right)
      \\ & =
      \left( \begin{array}{cccc}
      q_1 & q_2 & 0 & 0 \\
      q_0 & q_1 & q_2 & 0 \\
      0 & q_0 & q_1 & q_2 \\
      0 & 0 & q_0 & q_1 \\
      \end{array} \right)^T
      \left( \begin{array}{cccc}
      1 & 0 & 0 & 0\\
      0 & 0 & 1 & 0\\
      \end{array} \right)^T
      \left( \begin{array}{c}
      x_0\\
      x_1\\
      \end{array} \right)
      \\ & =
      \left(
      \left( \begin{array}{cccc}
      1 & 0 & 0 & 0\\
      0 & 0 & 1 & 0\\
      \end{array} \right)
      \left( \begin{array}{cccc}
      q_1 & q_2 & 0 & 0 \\
      q_0 & q_1 & q_2 & 0 \\
      0 & q_0 & q_1 & q_2 \\
      0 & 0 & q_0 & q_1 \\
      \end{array} \right)
      \right)^T
      \left( \begin{array}{c}
      x_0\\
      x_1\\
      \end{array} \right)
      \\ & = (\downarrow_k\!(\mathbf{q}*)^T)^T\mathbf{x}
      \end{align}
      $$
      $$
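
The matrix identity in this example can be checked numerically. The sketch below uses made-up numbers and hypothetical helper names; it multiplies the matrices in both orders and confirms that "upsample, then convolve" equals the transpose of "apply the transposed convolution matrix, then downsample".

```python
def conv_matrix(q, n):
    """n x n banded matrix with entry (i, j) = q[i - j + 1] (0 outside the kernel)."""
    return [[q[i - j + 1] if 0 <= i - j + 1 < len(q) else 0 for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


q = [2, 3, 5]                          # illustrative kernel (q_0, q_1, q_2)
x = [1, 10]                            # illustrative input (x_0, x_1)
Q = conv_matrix(q, 4)                  # the 4 x 4 convolution matrix
U = [[1, 0], [0, 0], [0, 1], [0, 0]]   # upsampling by 2
D = transpose(U)                       # downsampling by 2

lhs = matvec(Q, matvec(U, x))                        # q * (up x)
rhs = matvec(transpose(matmul(D, transpose(Q))), x)  # (down (q*)^T)^T x
print(lhs, rhs)
```

Here `matmul(D, transpose(Q))` is a 2 x 4 matrix, i.e. a map from four samples to two, and its transpose maps two samples back to four.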



      As one can see, the deconvolution is the transpose of a strided convolution, hence the name.



      Connection to Nearest Neighbor Upsampling



      Another common approach found in convolutional networks is upsampling with some built-in form of interpolation. Let's take upsampling by factor 2 with a simple repeat interpolation.
      This can be written as $(1\;1) * (\uparrow_2\!\mathbf{x})$. If we also add a learnable kernel $\mathbf{q}$ to this we have $\mathbf{q} * (1\;1) * (\uparrow_2\!\mathbf{x})$. The convolutions can be combined, e.g. for $\mathbf{q}=(q_0\;q_1\;q_2)$, we have $$(1\;1) * \mathbf{q} = (q_0\;\;q_0\!+\!q_1\;\;q_1\!+\!q_2\;\;q_2),$$



      i.e. we can replace a repeat upsampler with factor 2 and a convolution with a kernel of size 3 by a transposed convolution with kernel size 4. This transposed convolution has the same "interpolation capacity" but would be able to learn better matching interpolations.
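
A small numeric check of this kernel merging, as a sketch only: the values are arbitrary, `conv_full` is a hypothetical helper implementing a plain full convolution, and boundary handling (cropping, centering) is ignored.

```python
def conv_full(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def upsample(x, k):
    out = []
    for v in x:
        out.append(v)
        out.extend([0] * (k - 1))
    return out


q = [2, 3, 5]        # illustrative size-3 kernel
x = [1, 10, 100]     # illustrative input

up = upsample(x, 2)                 # zeros interleaved
repeat = conv_full([1, 1], up)      # repeat (nearest neighbor) interpolation
two_step = conv_full(q, repeat)     # followed by the size-3 convolution

merged = conv_full([1, 1], q)       # single size-4 kernel
one_step = conv_full(merged, up)    # transposed convolution with that kernel
print(merged)
print(two_step == one_step)
```

Because convolution is associative, the two-step pipeline and the single transposed convolution produce identical outputs.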






      Conclusions and Final Remarks



      I hope I could clarify some common convolutions found in deep learning a bit by taking them apart into their fundamental operations.



      I didn't cover pooling here. But this is just a nonlinear downsampler and can be treated within this notation as well.














      – André Bergner






      $endgroup$





















        -1












        $begingroup$

        The following paper discusses deconvolutional layers, both from the architectural and from the training point of view: Deconvolutional networks















        $endgroup$









        • 1




          $begingroup$
          This does not add any value to this answer
          $endgroup$
          – Martin Thoma
          Jan 19 '17 at 12:40




































        9 Answers









        168





        $begingroup$

        "Deconvolution layer" is a very unfortunate name; the layer should rather be called a transposed convolutional layer.



        Visually, for a transposed convolution with stride one and no padding, we just pad the original input (blue entries) with zeroes (white entries) and slide the kernel over the padded input (Figure 1).



        Figure 1



        In case of stride two and padding, the transposed convolution would look like this (Figure 2):



        Figure 2
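
The zero-padding picture can be sketched in one dimension. The code below is an illustration under stated assumptions, not the implementation of any particular library: no output cropping, a single channel, and a hypothetical function name. Zeros are inserted between the input samples (for stride greater than one) and around the input, and an ordinary unit-stride sliding window with the flipped kernel is then applied.

```python
def transposed_conv1d_by_padding(x, w, stride=1):
    """Transposed convolution computed as an ordinary sliding window over a zero-stuffed input."""
    k = len(w)
    # 1) insert (stride - 1) zeros between neighbouring input samples
    stuffed = []
    for i, v in enumerate(x):
        stuffed.append(v)
        if i < len(x) - 1:
            stuffed.extend([0] * (stride - 1))
    # 2) pad k - 1 zeros on both sides
    padded = [0] * (k - 1) + stuffed + [0] * (k - 1)
    # 3) slide the flipped kernel over the padded input with unit stride
    flipped = w[::-1]
    return [sum(flipped[j] * padded[i + j] for j in range(k))
            for i in range(len(padded) - k + 1)]


x = [1, 10]       # illustrative input
w = [2, 3, 5]     # illustrative kernel
print(transposed_conv1d_by_padding(x, w, stride=1))
print(transposed_conv1d_by_padding(x, w, stride=2))
```

In both cases the window itself always moves by one step; the stride only controls how many zeros are inserted between the input samples, and the output length is `(len(x) - 1) * stride + len(w)`.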



        You can find more (great) visualisations of convolution arithmetic here.















        $endgroup$













        answered Jun 7 '16 at 20:15









        David Dao








        • 13




          $begingroup$
          Just to make sure I understood it: "Deconvolution" is pretty much the same as convolution, but you add some padding? (Around the image / when s > 1 also around each pixel)?
          $endgroup$
          – Martin Thoma
          Jun 8 '16 at 5:00






        • 14




          $begingroup$
          Yes, a deconvolution layer also performs a convolution! That is why transposed convolution fits so much better as a name, and the term deconvolution is actually misleading.
          $endgroup$
          – David Dao
          Jun 30 '16 at 20:47






        • 9




          $begingroup$
          Why do you say "no padding" in Figure 1, if actually input is zero-padded?
          $endgroup$
          – Stas S
          Jul 30 '16 at 13:06






        • 5




          $begingroup$
          By the way: It is called transposed convolution now in TensorFlow: tensorflow.org/versions/r0.10/api_docs/python/…
          $endgroup$
          – Martin Thoma
          Aug 8 '16 at 14:08






        • 5




          $begingroup$
          Thanks for this very intuitive answer, but I'm confused about why the second one is the 'stride two' case, it behaves exactly like the first one when kernel moves.
          $endgroup$
          – Demonedge
          Aug 10 '16 at 14:01

































        38





        $begingroup$

        I think one way to get a basic intuition for convolution is that you slide K filters, which you can think of as K stencils, over the input image and produce K activations, each one representing the degree of match with a particular stencil. The inverse operation would be to take the K activations and expand them into a preimage of the convolution operation. Intuitively, the inverse operation is therefore, roughly, image reconstruction given the stencils (filters) and the activations (the degree of match for each stencil): we blow up each activation by its stencil's mask and add the results up.
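
The "blow up each activation by the stencil and add them up" intuition can be sketched directly in one dimension. This is a minimal illustration with a hypothetical function name and made-up values, assuming a single filter and no output cropping.

```python
def transposed_conv1d_scatter(activations, stencil, stride=1):
    """Each activation pastes a scaled copy of the stencil into the output; overlaps are summed."""
    k = len(stencil)
    out = [0] * ((len(activations) - 1) * stride + k)
    for i, a in enumerate(activations):
        for j, s in enumerate(stencil):
            out[i * stride + j] += a * s   # scaled stencil placed at offset i * stride
    return out


print(transposed_conv1d_scatter([4], [2, 3, 5]))                # one activation: one scaled stencil
print(transposed_conv1d_scatter([1, 10], [2, 3, 5], stride=2))  # two stencils, overlapping in one cell
```

With stride 2 and a kernel of size 3, neighbouring stencil copies overlap in exactly one output cell, where their contributions are added.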



        Another way to approach understanding deconv would be to examine the deconvolution layer implementation in Caffe, see the following relevant bits of code:



        DeconvolutionLayer<Dtype>::Forward_gpu
        ConvolutionLayer<Dtype>::Backward_gpu
        CuDNNConvolutionLayer<Dtype>::Backward_gpu
        BaseConvolutionLayer<Dtype>::backward_cpu_gemm


        You can see that it's implemented in Caffe exactly as backprop for a regular forward convolutional layer (to me it became more obvious after I compared the implementation of backprop in the cuDNN conv layer with ConvolutionLayer::Backward_gpu, which is implemented using GEMM). So if you work through how backpropagation is done for a regular convolution, you will understand what happens at the mechanical computation level. The way this computation works matches the intuition described in the first paragraph of this answer.




        However, I don't know how the learning of convolutional layers works. (I understand how simple MLPs learn with gradient descent, if that helps).




        To answer your other question inside your first question, there are two main differences between MLP backpropagation (fully connected layer) and convolutional nets:



        1) the influence of weights is localized, so first figure out how to do backprop for, say, a 3x3 filter convolved with a small 3x3 area of an input image, mapping to a single point in the result image.



        2) the weights of convolutional filters are shared for spatial invariance. What this means in practice is that in the forward pass the same 3x3 filter with the same weights is dragged through the entire image to yield the output image (for that particular filter). What this means for backprop is that the backprop gradients for each point in the source image are summed over the entire range that we dragged that filter over during the forward pass. Note that there are also different gradients of the loss wrt x, w and bias: dLoss/dx needs to be backpropagated, and dLoss/dw is how we update the weights. w and bias are independent inputs in the computation DAG (there are no prior inputs), so there's no need to backpropagate any further through those.



        (my notation here assumes that convolution is y = x*w+b where '*' is the convolution operation)
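
To make the two points concrete, here is a hedged one-dimensional sketch of the forward and backward pass. It assumes a single filter, no padding, and the cross-correlation form that deep learning frameworks usually call convolution; the function names and numbers are invented for illustration.

```python
def conv_forward(x, w, b):
    """y[i] = sum_j w[j] * x[i + j] + b (valid cross-correlation)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]


def conv_backward(x, w, grad_y):
    """Gradients of the loss wrt x, w and b, given grad_y = dLoss/dy."""
    k = len(w)
    grad_x = [0] * len(x)
    grad_w = [0] * k
    for i, g in enumerate(grad_y):
        for j in range(k):
            grad_w[j] += g * x[i + j]   # shared weight: summed over every filter position
            grad_x[i + j] += g * w[j]   # scatter-add: a transposed convolution of grad_y with w
    grad_b = sum(grad_y)
    return grad_x, grad_w, grad_b


x = [1, 2, 3, 4, 5]
w = [1, 0, -1]
b = 2
grad_y = [1, 10, 100]   # pretend upstream gradient

y = conv_forward(x, w, b)
grad_x, grad_w, grad_b = conv_backward(x, w, grad_y)
print(y, grad_x, grad_w, grad_b)
```

The line that accumulates `grad_x` is the same scatter-add that a transposed convolution performs in its forward pass, which is the sense in which a deconvolution layer is the backward pass of a convolution.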
















        $endgroup$











        edited Jun 25 '18 at 21:16









        n1k31t4











        answered Jan 26 '16 at 21:24









        Andrei Pokrovsky








        • 7




          $begingroup$
          I think this is the best answer for this question.
          $endgroup$
          – kli_nlpr
          Dec 25 '16 at 15:30






        • 7




          $begingroup$
          I agree that this is the best answer. The top answer has pretty animations, but until I read this answer they just looked like regular convolutions with some arbitrary padding to me. Oh how people are swayed by eye candy.
          $endgroup$
          – Reii Nakano
          Jun 10 '17 at 6:23












        • $begingroup$
          Agree, the accepted answer didn't explain anything. This is much better.
          $endgroup$
          – BjornW
          Mar 27 '18 at 8:37

























        27












        $begingroup$

        Step-by-step math showing how a transposed convolution performs 2x upsampling with a 3x3 filter and a stride of 2:



        [Figure: step-by-step calculation of the transposed convolution]



        The simplest TensorFlow snippet to validate the math:



        import numpy as np
        import tensorflow as tf


        def test_conv2d_transpose():
            # Input batch, shape (1, 2, 2, 1) = (batch_size, height, width, channels):
            # a single 2x2 image with one channel.
            x = tf.constant(np.array([[
                [[1], [2]],
                [[3], [4]]
            ]]), tf.float32)

            # Filter, shape (3, 3, 1, 1). For conv2d_transpose the layout is
            # (height, width, output_channels, input_channels): a 3x3 filter of ones.
            f = tf.constant(np.array([
                [[[1]], [[1]], [[1]]],
                [[[1]], [[1]], [[1]]],
                [[[1]], [[1]], [[1]]]
            ]), tf.float32)

            conv = tf.nn.conv2d_transpose(x, f, output_shape=(1, 4, 4, 1),
                                          strides=[1, 2, 2, 1], padding='SAME')

            # TensorFlow 2.x executes eagerly, so the value is available directly.
            # (On TensorFlow 1.x, evaluate it with tf.Session().run(conv) instead.)
            result = conv.numpy()

            expected = np.array([[
                [[1.0], [1.0], [3.0], [2.0]],
                [[1.0], [1.0], [3.0], [2.0]],
                [[4.0], [4.0], [10.0], [6.0]],
                [[3.0], [3.0], [7.0], [4.0]]]])
            assert (expected == result).all()


        test_conv2d_transpose()
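The same numbers can be reproduced without TensorFlow. The sketch below is an illustration, not the library's implementation: each input value stamps a scaled copy of the kernel into the output at steps of the stride, and overlapping stamps are added. The helper name `conv2d_transpose_full` is hypothetical. The uncropped result is 5x5, since (2 - 1) * 2 + 3 = 5. The final crop to 4x4 (dropping the last row and column) is an assumption chosen because it reproduces the array asserted above for `padding='SAME'` with `output_shape=(1, 4, 4, 1)`.

```python
def conv2d_transpose_full(x, w, stride):
    """Scatter-add form of a transposed convolution (single channel, no cropping)."""
    n_h, n_w, k = len(x), len(x[0]), len(w)
    out_h = (n_h - 1) * stride + k
    out_w = (n_w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(n_h):
        for j in range(n_w):
            # Each input value stamps a scaled copy of the kernel into the output;
            # where stamps overlap, the contributions are summed.
            for a in range(k):
                for c in range(k):
                    out[i * stride + a][j * stride + c] += x[i][j] * w[a][c]
    return out


x = [[1.0, 2.0],
     [3.0, 4.0]]
w = [[1.0] * 3 for _ in range(3)]

full = conv2d_transpose_full(x, w, stride=2)   # 5x5 uncropped result
cropped = [row[:4] for row in full[:4]]        # assumed 'SAME'-style crop to 4x4
```

Seen this way, the 5x5 intermediate and the 4x4 final output differ only by the crop that the padding mode implies.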
















        $endgroup$













        • $begingroup$
          I think your calculation is wrong here. The intermediate output should be 3+ 2*2=7, then for a 3x3 kernel the final output should be 7-3+1 = 5x5
          $endgroup$
          – Alex
          Nov 14 '17 at 14:59










        • $begingroup$
          Sorry, @Alex, but I fail to understand why intermediate output is 7. Can you please elaborate?
          $endgroup$
          – andriys
          Nov 19 '17 at 9:49












        • $begingroup$
          @andriys In the image that you've shown, why is the final result cropped?
          $endgroup$
          – James Bond
          Jun 25 '18 at 13:29
























        answered Jul 4 '17 at 22:09 by andriys (edited Nov 19 '17 at 9:45)























        23












        $begingroup$

        The notes that accompany Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition, by Andrej Karpathy, do an excellent job of explaining convolutional neural networks.



        Reading this paper should give you a rough idea about:




        • Deconvolutional Networks
          Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor and Rob Fergus
          Dept. of Computer Science, Courant Institute, New York University


        These slides are great for Deconvolutional Networks.

















        $endgroup$









        • 24




          $begingroup$
          Is it possible to summarise the content of any one of those links, in a short paragraph? The links might be useful for further research, but ideally a stack exchange answer should have enough text to address the basic question without needing to go off site.
          $endgroup$
          – Neil Slater
          Jun 20 '15 at 7:01












        • $begingroup$
          I am sorry but the content of these pages is too large to be summarized in a short paragraph.
          $endgroup$
          – Azrael
          Jun 20 '15 at 9:11






        • 12




          $begingroup$
          A full summary is not required, just a headline - e.g. "A deconvolutional neural network is similar to a CNN, but is trained so that features in any hidden layer can be used to reconstruct the previous layer (and by repetition across layers, eventually the input could be reconstructed from the output). This allows it to be trained unsupervised in order to learn generic high-level features in a problem domain - usually image processing" (note I am not even sure if that is correct, hence not writing my own answer).
          $endgroup$
          – Neil Slater
          Jun 20 '15 at 11:08








        • 5




          $begingroup$
          Although the links are good, a brief summary of the model in your own words would have been better.
          $endgroup$
          – SmallChess
          Dec 19 '15 at 13:34
























        answered Jun 19 '15 at 10:17 by Azrael (edited Jun 21 '18 at 13:21 by Stephen Rauch)



















        7












        $begingroup$

        Just found a great article from the Theano website on this topic [1]:




        The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, [...] to project feature maps to a higher-dimensional space.
        [...]
        i.e., map from a 4-dimensional space to a 16-dimensional space, while keeping the connectivity pattern of the convolution.



        Transposed convolutions – also called fractionally strided convolutions – work by swapping the forward and backward passes of a convolution. One way to put it is to note that the kernel defines a convolution, but whether it’s a direct convolution or a transposed convolution is determined by how the forward and backward passes are computed.



        The transposed convolution operation can be thought of as the gradient of some convolution with respect to its input, which is usually how transposed convolutions are implemented in practice.



        Finally note that it is always possible to implement a transposed convolution with a direct convolution. The disadvantage is that it usually involves adding many columns and rows of zeros to the input, resulting in a much less efficient implementation.




        So in simple terms, a "transposed convolution" is a linear operation that, like convolution, can be written as a matrix multiplication, but it maps in the opposite direction: from the shape of a convolution's output back to the shape of its input. The same mapping can be computed with a normal convolution, but only by padding the input with many zeros. That is why implementations prefer the transposed form when going in the opposite direction: it avoids the many unnecessary multiplications by 0 that the padded, sparse input would cause.



        Image ---> convolution ---> Result



        Result ---> transposed convolution ---> "originalish Image"



        Sometimes you save some values along the convolution path and reuse that information when "going back":



        Result ---> transposed convolution ---> Image



        That's probably the reason why it's wrongly called a "deconvolution". It does not invert the convolution; it multiplies by the transpose C^T of the matrix C that represents the convolution, hence the more appropriate name "transposed convolution".
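To make the C and C^T picture concrete, here is a small plain-Python sketch. It assumes a 4x4 input, a 3x3 kernel, stride 1, no padding, and no kernel flipping, so the convolution matrix C has shape 4 x 16 and C^T maps a 4-dimensional vector to a 16-dimensional one, as in the quoted example. The helper names `conv_matrix`, `matvec` and `transpose` are hypothetical.

```python
def conv_matrix(w, in_size):
    """Matrix C such that C times the flattened input equals the flattened
    stride-1, unpadded convolution of a square input with kernel w."""
    k = len(w)
    out_size = in_size - k + 1
    C = [[0.0] * (in_size * in_size) for _ in range(out_size * out_size)]
    for i in range(out_size):
        for j in range(out_size):
            for a in range(k):
                for c in range(k):
                    C[i * out_size + j][(i + a) * in_size + (j + c)] = w[a][c]
    return C


def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Matrix transpose for lists of lists."""
    return [list(col) for col in zip(*M)]


w = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
C = conv_matrix(w, in_size=4)            # 4 rows, 16 columns

image = [float(v) for v in range(16)]    # flattened 4x4 input
result = matvec(C, image)                # 4 values: the 2x2 convolution output
back = matvec(transpose(C), result)      # 16 values: image-shaped, but not the image
```

The output `back` has the shape of the original image but different values, which is why "originalish" is the right word and "deconvolution" is a misnomer.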



        So it makes a lot of sense when considering computing cost. You'd pay a lot more for Amazon GPUs if you didn't use the transposed convolution.



        Read and watch the animations here carefully:
        http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html#no-zero-padding-unit-strides-transposed



        Some other relevant reading:




        The transpose (or more generally, the Hermitian or conjugate transpose) of a filter is simply the matched filter[3]. This is found by time reversing the kernel and taking the conjugate of all the values[2].




        I am also new to this and would be grateful for any feedback or corrections.



        [1] http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html



        [2] http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html#transposed-convolution-arithmetic



        [3] https://en.wikipedia.org/wiki/Matched_filter

















        $endgroup$









        • 1




          $begingroup$
          Nit picking, but the link should be: deeplearning.net/software/theano_versions/dev/tutorial/…
          $endgroup$
          – Herbert
          Jan 19 '17 at 13:05






        • 1




          $begingroup$
          I think this is the best answer!!!
          $endgroup$
          – kli_nlpr
          Mar 31 '17 at 7:58





















        $begingroup$

        Just found a great article from the theaon website on this topic [1]:




        The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, [...] to project feature maps to a higher-dimensional space.
        [...]
        i.e., map from a 4-dimensional space to a 16-dimensional space, while keeping the connectivity pattern of the convolution.



        Transposed convolutions – also called fractionally strided convolutions – work by swapping the forward and backward passes of a convolution. One way to put it is to note that the kernel defines a convolution, but whether it’s a direct convolution or a transposed convolution is determined by how the forward and backward passes are computed.



        The transposed convolution operation can be thought of as the gradient of some convolution with respect to its input, which is usually how transposed convolutions are implemented in practice.



        Finally note that it is always possible to implement a transposed convolution with a direct convolution. The disadvantage is that it usually involves adding many columns and rows of zeros to the input, resulting in a much less efficient implementation.




        So in simple terms, a "transposed convolution" is a mathematical operation on matrices (just like convolution), but it is more efficient than an ordinary convolution when you want to go back from the convolved values to something with the original shape (the opposite direction). This is why implementations prefer it over a direct convolution when computing the opposite direction: it avoids the many unnecessary multiplications by zero that come from zero-padding the input.
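To make the zero-padding remark concrete, here is a minimal NumPy sketch (my own illustration, not code from the Theano tutorial). It assumes a 1-D, unit-stride, no-padding convolution in the cross-correlation form used by deep learning libraries; the helper `corr_matrix` and all values are made up for the example. It computes the transposed convolution once as a multiplication by $C^T$ and once as a direct convolution over a zero-padded input with the flipped kernel.

```python
import numpy as np

def corr_matrix(w, n):
    """Matrix C of a 'valid', unit-stride 1-D cross-correlation with kernel w on a length-n input."""
    k = len(w)
    C = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        C[i, i:i + k] = w
    return C

w = np.array([1.0, 2.0, 3.0])   # illustrative kernel
y = np.array([4.0, 5.0])        # a "result" that came from a length-4 input
C = corr_matrix(w, n=4)         # shape (2, 4): maps 4 values to 2

# Transposed convolution: multiply by C^T, mapping 2 values back to 4.
via_transpose = C.T @ y

# The same numbers from a direct convolution: pad y with k-1 zeros on each side,
# then slide the flipped kernel over the padded signal.
k = len(w)
y_padded = np.pad(y, k - 1)     # [0, 0, 4, 5, 0, 0]
via_direct = np.array([y_padded[i:i + k] @ w[::-1] for i in range(4)])

print(via_transpose)  # [ 4. 13. 22. 15.]
print(via_direct)     # [ 4. 13. 22. 15.]
```

Most of the multiplications in the direct version hit padded zeros, which is the inefficiency the quote refers to.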



        Image ---> convolution ---> Result



        Result ---> transposed convolution ---> "originalish Image"



        Sometimes you save some values along the convolution path and reuse that information when "going back":



        Result ---> transposed convolution ---> Image



        That's probably the reason why it's wrongly called a "deconvolution". However, it does have something to do with the matrix transpose of the convolution (C^T), hence the more appropriate name "transposed convolution".
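As a rough sketch of the $C^T$ picture (an illustration under my own assumptions, not the tutorial's code): unroll a 3x3 kernel applied to a 4x4 image, with no padding and unit stride, into a matrix $C$. This $C$ maps 16 values to 4, and its transpose maps 4 values back to 16, as in the quoted example. The helper name `conv_matrix_2d` and the kernel values are hypothetical.

```python
import numpy as np

def conv_matrix_2d(kernel, in_size):
    """Unrolled matrix C of a unit-stride, no-padding 2-D convolution (cross-correlation form)."""
    k = kernel.shape[0]
    out_size = in_size - k + 1
    C = np.zeros((out_size * out_size, in_size * in_size))
    for r in range(out_size):
        for c in range(out_size):
            patch = np.zeros((in_size, in_size))
            patch[r:r + k, c:c + k] = kernel   # where the kernel sits for output (r, c)
            C[r * out_size + c] = patch.ravel()
    return C

kernel = np.arange(1.0, 10.0).reshape(3, 3)    # illustrative 3x3 kernel
C = conv_matrix_2d(kernel, in_size=4)           # shape (4, 16)

image = np.random.default_rng(0).normal(size=(4, 4))
result = (C @ image.ravel()).reshape(2, 2)           # Image -> convolution -> Result
originalish = (C.T @ result.ravel()).reshape(4, 4)   # Result -> transposed convolution -> 4x4

print(C.shape, result.shape, originalish.shape)
```

Note that `originalish` only has the shape of the image and the same connectivity pattern. It is not the original image, which is why "deconvolution" is a misleading name.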



        So it makes a lot of sense when considering computing cost. You'd pay a lot more for Amazon GPUs if you didn't use the transposed convolution.



        Read and watch the animations here carefully:
        http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html#no-zero-padding-unit-strides-transposed



        Some other relevant reading:




        The transpose (or more generally, the Hermitian or conjugate transpose) of a filter is simply the matched filter[3]. This is found by time reversing the kernel and taking the conjugate of all the values[2].




        I am also new to this and would be grateful for any feedback or corrections.



        [1] http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html



        [2] http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html#transposed-convolution-arithmetic



        [3] https://en.wikipedia.org/wiki/Matched_filter

















        $endgroup$











        edited Jan 25 '17 at 17:05

























        answered Jan 15 '17 at 21:58









        Andrei



















        7












        $begingroup$

        We could use PCA as an analogy.

        When using conv, the forward pass extracts the coefficients of the principal components from the input image, and the backward pass (which updates the input) uses (the gradient of) the coefficients to reconstruct a new input image, so that the new input image has PC coefficients that better match the desired coefficients.



        When using deconv, the forward pass and the backward pass are reversed. The forward pass tries to reconstruct an image from PC coefficients, and the backward pass updates the PC coefficients given (the gradient of) the image.



        The deconv forward pass does exactly the conv gradient computation given in this post:
        http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/



        That's why in the Caffe implementation of deconv (refer to Andrei Pokrovsky's answer), the deconv forward pass calls backward_cpu_gemm(), and the backward pass calls forward_cpu_gemm().
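The claim that the deconv forward pass is the conv gradient with respect to its input can be checked numerically. Below is a small NumPy sketch under my own assumptions (1-D, unit stride, no padding). The functions `conv_forward` and `deconv_forward` are hypothetical stand-ins, not Caffe's API.

```python
import numpy as np

def conv_forward(x, w):
    """'Valid', unit-stride 1-D convolution layer (cross-correlation form)."""
    k = len(w)
    return np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])

def deconv_forward(g, w, n):
    """Transposed convolution: scatter each g[i] * w into a length-n output."""
    out = np.zeros(n)
    for i, gi in enumerate(g):
        out[i:i + len(w)] += gi * w
    return out

rng = np.random.default_rng(0)
x, w = rng.normal(size=6), rng.normal(size=3)
g = rng.normal(size=4)   # upstream gradient dL/dy for the toy loss L = g . y

# Central-difference gradient of L = g . conv_forward(x, w) with respect to x.
eps = 1e-6
numeric = np.array([
    (g @ conv_forward(x + eps * e, w) - g @ conv_forward(x - eps * e, w)) / (2 * eps)
    for e in np.eye(len(x))
])

# The deconv forward pass applied to g gives the same vector.
analytic = deconv_forward(g, w, n=len(x))
print(np.allclose(numeric, analytic, atol=1e-5))
```

So running the deconv layer forward on `g` is the same computation as backpropagating `g` through the conv layer to its input.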

















        $endgroup$


























            edited Oct 5 '17 at 8:10

























            answered Oct 5 '17 at 7:58









            Shaohua Li























                4












                $begingroup$

                In addition to David Dao's answer: It is also possible to think the other way around. Instead of focusing on which (low resolution) input pixels are used to produce a single output pixel, you can also focus on which individual input pixels contribute to which region of output pixels.



                This is done in this distill publication, including a series of very intuitive and interactive visualizations. One advantage of thinking in this direction is that explaining checkerboard artifacts becomes easy.
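The "which input pixel paints which output region" view is easy to sketch in code. The following is my own 1-D illustration (not taken from the Distill article): it counts how many input pixels write into each output position of a transposed convolution. The function name `contribution_counts` is made up for this example.

```python
import numpy as np

def contribution_counts(n_in, kernel_size, stride):
    """Count how many input pixels write into each output pixel of a 1-D transposed convolution."""
    n_out = (n_in - 1) * stride + kernel_size
    counts = np.zeros(n_out, dtype=int)
    for i in range(n_in):
        # Input pixel i "paints" the output region [i*stride, i*stride + kernel_size).
        counts[i * stride:i * stride + kernel_size] += 1
    return counts

uneven = contribution_counts(n_in=4, kernel_size=3, stride=2)
even = contribution_counts(n_in=4, kernel_size=4, stride=2)

print(uneven)  # [1 1 2 1 2 1 2 1 1]
print(even)    # [1 1 2 2 2 2 2 2 1 1]
```

With kernel size 3 and stride 2 the overlap alternates between 1 and 2 in the interior, which is the uneven-overlap pattern usually associated with checkerboard artifacts. With a kernel size divisible by the stride, the interior overlap is uniform.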

















                $endgroup$


























                    edited Sep 26 '17 at 13:29









                    ncasas











                    answered Sep 26 '17 at 10:38









                    Martin R.























                        0












                        $begingroup$

                        Convolutions from a DSP perspective



                        I'm a bit late to this but would still like to share my perspective and insights. My background is theoretical physics and digital signal processing. In particular, I studied wavelets, and convolutions are almost in my backbone ;)



                        The way people in the deep learning community talk about convolutions was also confusing to me. From my perspective what seems to be missing is a proper separation of concerns. I will explain the deep learning convolutions using some DSP tools.



                        Disclaimer



                        My explanations will be a bit hand-wavy and not mathematically rigorous in order to get the main points across.






                        Definitions



                        Let's define a few things first. I limit my discussion to one-dimensional (the extension to more dimensions is straightforward), infinite (so we don't need to mess with boundaries) sequences $x_n = \{x_n\}_{n=-\infty}^{\infty} = \{\dots, x_{-1}, x_{0}, x_{1}, \dots \}$.



                        A pure (discrete) convolution between two sequences $y_n$ and $x_n$ is defined as



                        $$ (y * x)_n = \sum_{k=-\infty}^{\infty} y_{n-k} x_k $$

                        If we write this in terms of matrix vector operations it looks like this (assuming a simple kernel $\mathbf{q} = (q_0,q_1,q_2)$ and vector $\mathbf{x} = (x_0, x_1, x_2, x_3)^T$):

                        $$ \mathbf{q} * \mathbf{x} =
                        \left( \begin{array}{cccc}
                        q_1 & q_0 & 0 & 0 \\
                        q_2 & q_1 & q_0 & 0 \\
                        0 & q_2 & q_1 & q_0 \\
                        0 & 0 & q_2 & q_1 \\
                        \end{array} \right)
                        \left( \begin{array}{c}
                        x_0 \\ x_1 \\ x_2 \\ x_3
                        \end{array} \right)
                        $$
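As a quick sanity check of the matrix form, here is a small NumPy sketch with made-up values for $\mathbf{q}$ and $\mathbf{x}$. It builds the 4x4 matrix above and compares it with `np.convolve(..., mode="same")`, which (as far as I can tell) uses the same centring on $q_1$ for a length-3 kernel.

```python
import numpy as np

q = np.array([1.0, 2.0, 3.0])        # (q_0, q_1, q_2), illustrative values
x = np.array([1.0, 0.0, -1.0, 2.0])  # (x_0, x_1, x_2, x_3)

# Entry (n, k) holds q_{n-k+1}, i.e. the kernel is centred on q_1.
Q = np.zeros((4, 4))
for n in range(4):
    for k in range(4):
        j = n - k + 1
        if 0 <= j < len(q):
            Q[n, k] = q[j]

y = Q @ x
print(Q)
print(y)                               # [2. 2. 0. 1.]
print(np.convolve(x, q, mode="same"))  # [2. 2. 0. 1.]
```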



                        Let's introduce the down- and up-sampling operators, $\downarrow$ and $\uparrow$, respectively. Downsampling by factor $k \in \mathbb{N}$ is removing all samples except every k-th one:

                        $$ \downarrow_k\!x_n = x_{nk} $$

                        And upsampling by factor $k$ is interleaving $k-1$ zeros between the samples:

                        $$ \uparrow_k\!x_n = \left\{ \begin{array}{ll}
                        x_{n/k} & n/k \in \mathbb{Z} \\
                        0 & \text{otherwise}
                        \end{array} \right.
                        $$

                        E.g. we have for $k=3$:

                        $$ \downarrow_3\!\{ \dots, x_0, x_1, x_2, x_3, x_4, x_5, x_6, \dots \} = \{ \dots, x_0, x_3, x_6, \dots \} $$
                        $$ \uparrow_3\!\{ \dots, x_0, x_1, x_2, \dots \} = \{ \dots, x_0, 0, 0, x_1, 0, 0, x_2, 0, 0, \dots \} $$



                        or written in terms of matrix operations (here $k=2$):



                        $$ \downarrow_2\!x =
                        \left( \begin{array}{c}
                        x_0 \\ x_2
                        \end{array} \right) =
                        \left( \begin{array}{cccc}
                        1 & 0 & 0 & 0 \\
                        0 & 0 & 1 & 0 \\
                        \end{array} \right)
                        \left( \begin{array}{c}
                        x_0 \\ x_1 \\ x_2 \\ x_3
                        \end{array} \right)
                        $$

                        and

                        $$ \uparrow_2\!x =
                        \left( \begin{array}{c}
                        x_0 \\ 0 \\ x_1 \\ 0
                        \end{array} \right) =
                        \left( \begin{array}{cc}
                        1 & 0 \\
                        0 & 0 \\
                        0 & 1 \\
                        0 & 0 \\
                        \end{array} \right)
                        \left( \begin{array}{c}
                        x_0 \\ x_1
                        \end{array} \right)
                        $$

                        As one can already see, the down- and up-sample operators are mutually transposed, i.e. $\uparrow_k = \downarrow_k^T$.
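The transpose relation between the two operators can be verified directly on the finite $k=2$ example. This is a small sketch; the helper `downsample_matrix` and the sample values are mine.

```python
import numpy as np

def downsample_matrix(n, k):
    """Matrix of the downsampling operator: keeps samples 0, k, 2k, ... of a length-n vector."""
    kept = np.arange(0, n, k)
    D = np.zeros((len(kept), n))
    D[np.arange(len(kept)), kept] = 1.0
    return D

D = downsample_matrix(n=4, k=2)   # shape (2, 4)
U = D.T                           # upsampling as the transpose, shape (4, 2)

x4 = np.array([10.0, 11.0, 12.0, 13.0])
x2 = np.array([7.0, 8.0])
down = D @ x4
up = U @ x2

print(down)  # [10. 12.]
print(up)    # [7. 0. 8. 0.]
```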






                        Deep Learning Convolutions by Parts



                        Let's look at the typical convolutions used in deep learning and how we write them. Given some kernel $\mathbf{q}$ and vector $\mathbf{x}$ we have the following:

                        • a strided convolution with stride $k$ is $\downarrow_k\!(\mathbf{q} * \mathbf{x})$,

                        • a dilated convolution with factor $k$ is $(\uparrow_k\!\mathbf{q}) * \mathbf{x}$,

                        • a transposed convolution (or deconvolution) with stride $k$ is $ \mathbf{q} * (\uparrow_k\!\mathbf{x})$
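The three cases in the list above can be sketched with two tiny helpers and a plain convolution. This is a finite-length illustration under my own assumptions: I use `np.convolve` in its default "full" mode, so the boundary handling differs from what a deep learning framework would do, and the values are made up.

```python
import numpy as np

def upsample(x, k):
    """Insert k-1 zeros after every sample (finite-length version of the up-arrow operator)."""
    out = np.zeros(len(x) * k)
    out[::k] = x
    return out

def downsample(x, k):
    """Keep every k-th sample (the down-arrow operator)."""
    return x[::k]

q = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.0, -1.0, 2.0])
k = 2

strided = downsample(np.convolve(q, x), k)    # down_k (q * x)
dilated = np.convolve(upsample(q, k), x)      # (up_k q) * x
transposed = np.convolve(q, upsample(x, k))   # q * (up_k x)

print(strided)     # [1. 2. 1.]
print(dilated)     # [ 1.  0.  1.  2.  1.  4. -3.  6.  0.]
print(transposed)  # [ 1.  2.  3.  0. -1. -2. -1.  4.  6.  0.]
```

Only the placement of the resampling operator changes between the three variants; the convolution itself stays the same.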


                        Let's rearrange the transposed convolution a bit:
                        $$
                        \mathbf{q} * (\uparrow_k\!\mathbf{x}) \; = \;
                        \mathbf{q} * (\downarrow_k^T\!\mathbf{x}) \; = \;
                        (\downarrow_k\!(\mathbf{q}*)^T)^T\mathbf{x}
                        $$

                        In this notation $(\mathbf{q}*)$ must be read as an operator, i.e. it abstracts convolving something with kernel $\mathbf{q}$.
                        Or written in matrix operations (example):

                        $$
                        \begin{align}
                        \mathbf{q} * (\uparrow_k\!\mathbf{x}) & =
                        \left( \begin{array}{cccc}
                        q_1 & q_0 & 0 & 0 \\
                        q_2 & q_1 & q_0 & 0 \\
                        0 & q_2 & q_1 & q_0 \\
                        0 & 0 & q_2 & q_1 \\
                        \end{array} \right)
                        \left( \begin{array}{cc}
                        1 & 0 \\
                        0 & 0 \\
                        0 & 1 \\
                        0 & 0 \\
                        \end{array} \right)
                        \left( \begin{array}{c}
                        x_0\\
                        x_1\\
                        \end{array} \right)
                        \\ & =
                        \left( \begin{array}{cccc}
                        q_1 & q_2 & 0 & 0 \\
                        q_0 & q_1 & q_2 & 0 \\
                        0 & q_0 & q_1 & q_2 \\
                        0 & 0 & q_0 & q_1 \\
                        \end{array} \right)^T
                        \left( \begin{array}{cccc}
                        1 & 0 & 0 & 0\\
                        0 & 0 & 1 & 0\\
                        \end{array} \right)^T
                        \left( \begin{array}{c}
                        x_0\\
                        x_1\\
                        \end{array} \right)
                        \\ & =
                        \left(
                        \left( \begin{array}{cccc}
                        1 & 0 & 0 & 0\\
                        0 & 0 & 1 & 0\\
                        \end{array} \right)
                        \left( \begin{array}{cccc}
                        q_1 & q_2 & 0 & 0 \\
                        q_0 & q_1 & q_2 & 0 \\
                        0 & q_0 & q_1 & q_2 \\
                        0 & 0 & q_0 & q_1 \\
                        \end{array} \right)
                        \right)^T
                        \left( \begin{array}{c}
                        x_0\\
                        x_1\\
                        \end{array} \right)
                        \\ & = (\downarrow_k\!(\mathbf{q}*)^T)^T\mathbf{x}
                        \end{align}
                        $$



                        As one can see the deconvolution is the transposed operation, thus, the name.
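The matrix identity can also be checked numerically. Here is a minimal sketch, assuming the same 4x4 "centred on $q_1$" convolution matrix as in the example, $k=2$, and made-up values; the helper `same_conv_matrix` is hypothetical.

```python
import numpy as np

def same_conv_matrix(q, n):
    """n x n matrix of convolution with a length-3 kernel centred on q[1] (zero boundaries)."""
    Q = np.zeros((n, n))
    for row in range(n):
        for col in range(n):
            j = row - col + 1
            if 0 <= j < len(q):
                Q[row, col] = q[j]
    return Q

q = np.array([1.0, 2.0, 3.0])
x = np.array([5.0, 7.0])

D = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])      # downsampling by 2
Q = same_conv_matrix(q, 4)                # convolution with q
Q_flipped = same_conv_matrix(q[::-1], 4)  # convolution with the reversed kernel, equals Q.T

strided = D @ Q_flipped                   # a strided convolution (4 -> 2 samples)
lhs = Q @ (D.T @ x)                       # q * (up_2 x): upsample, then convolve
rhs = strided.T @ x                       # transpose of the strided convolution

print(lhs)  # [10. 22. 14. 21.]
print(rhs)  # [10. 22. 14. 21.]
```

In other words, the transposed convolution with kernel $\mathbf{q}$ is the matrix transpose of a strided convolution with the reversed kernel.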



                        Connection to Nearest Neighbor Upsampling



                        Another common approach found in convolutional networks is upsampling with some built-in form of interpolation. Let's take upsampling by factor 2 with a simple repeat interpolation.
                        This can be written as $\uparrow_2\!(1\;1) * \mathbf{x}$. If we also add a learnable kernel $\mathbf{q}$ to this we have $\uparrow_2\!(1\;1) * \mathbf{q} * \mathbf{x}$. The convolutions can be combined, e.g. for $\mathbf{q}=(q_0\;q_1\;q_2)$, we have $$(1\;1) * \mathbf{q} = (q_0\;\;q_0\!\!+\!q_1\;\;q_1\!\!+\!q_2\;\;q_2),$$



                        i.e. we can replace a repeat upsampler with factor 2 and a convolution with a kernel of size 3 by a transposed convolution with kernel size 4. This transposed convolution has the same "interpolation capacity" but would be able to learn better matching interpolations.
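A short NumPy sketch of this equivalence, with made-up values. It reads the repeat upsampler as zero-insertion followed by convolution with $(1\;1)$, which is my interpretation of the notation, and it uses "full" convolutions, so the two routes differ by one trailing zero.

```python
import numpy as np

q = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, -1.0, 2.0])

combined = np.convolve([1.0, 1.0], q)   # (q_0, q_0+q_1, q_1+q_2, q_2)

# Route 1: repeat (nearest-neighbour) upsampling by 2, then convolve with q.
repeated = np.repeat(x, 2)              # [1, 1, -1, -1, 2, 2]
route1 = np.convolve(q, repeated)

# Route 2: zero-insertion upsampling, then one convolution with the size-4 kernel,
# i.e. a transposed convolution with stride 2 and kernel size 4.
zero_stuffed = np.zeros(2 * len(x))
zero_stuffed[::2] = x                   # [1, 0, -1, 0, 2, 0]
route2 = np.convolve(combined, zero_stuffed)

print(combined)                          # [1. 3. 5. 3.]
print(np.allclose(route1, route2[:-1]))  # True
```

A learned size-4 kernel is not restricted to the form `combined`, which is where the extra flexibility of the transposed convolution comes from.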






                        Conclusions and Final Remarks



                        I hope I could clarify some common convolutions found in deep learning a bit by taking them apart in the fundamental operations.



                        I didn't cover pooling here. But this is just a nonlinear downsampler and can be treated within this notation as well.












                        $endgroup$






























                          $begingroup$

                          Convolutions from a DSP perspective



                          I'm a bit late to this but still would like to share my perspective and insights. My background is theoretical physics and digital signal processing. In particular I studied wavelets and convolutions are almost in my backbone ;)



                          The way people in the deep learning community talk about convolutions was also confusing to me. From my perspective what seems to be missing is a proper separation of concerns. I will explain the deep learning convolutions using some DSP tools.



                          Disclaimer



                          My explanations will be a bit hand-wavy and not mathematical rigorous in order to get the main points across.






                          Definitions



                          Let's define a few things first. I limit my discussion to one dimensional (the extension to more dimension is straight forward) infinite (so we don't need to mess with boundaries) sequences $x_n = {x_n}_{n=-infty}^{infty} = {dots, x_{-1}, x_{0}, x_{1}, dots }$.



                          A pure (discrete) convolution between two sequences $y_n$ and $x_n$ is defined as



                          $$ (y * x)_n = sum_{k=-infty}^{infty} y_{n-k} x_k $$



                          If we write this in terms of matrix vector operations it looks like this (assuming a simple kernel $mathbf{q} = (q_0,q_1,q_2)$ and vector $mathbf{x} = (x_0, x_1, x_2, x_3)^T$):



                          $$ mathbf{q} * mathbf{x} =
                          left( begin{array}{cccc}
                          q_1 & q_0 & 0 & 0 \
                          q_2 & q_1 & q_0 & 0 \
                          0 & q_2 & q_1 & q_0 \
                          0 & 0 & q_2 & q_1 \
                          end{array} right)
                          left( begin{array}{cccc}
                          x_0 \ x_1 \ x_2 \ x_3
                          end{array} right)
                          $$



                          Let's introduce the down- and up-sampling operators, $downarrow$ and $uparrow$, respectively. Downsampling by factor $k in mathbb{N}$ is removing all samples except every k-th one:



                          $$ downarrow_k!x_n = x_{nk} $$



                          And upsampling by factor $k$ is interleaving $k-1$ zeros between the samples:



                          $$ uparrow_k!x_n = left { begin{array}{ll}
                          x_{n/k} & n/k in mathbb{Z} \
                          0 & text{otherwise}
                          end{array} right.
                          $$



                          E.g. we have for $k=3$:



                          $$ downarrow_3!{ dots, x_0, x_1, x_2, x_3, x_4, x_5, x_6, dots } = { dots, x_0, x_3, x_6, dots } $$
                          $$ uparrow_3!{ dots, x_0, x_1, x_2, dots } = { dots x_0, 0, 0, x_1, 0, 0, x_2, 0, 0, dots } $$



                          or written in terms of matrix operations (here $k=2$):



                          $$ downarrow_2!x =
                          left( begin{array}{cc}
                          x_0 \ x_2
                          end{array} right) =
                          left( begin{array}{cccc}
                          1 & 0 & 0 & 0 \
                          0 & 0 & 1 & 0 \
                          end{array} right)
                          left( begin{array}{cccc}
                          x_0 \ x_1 \ x_2 \ x_3
                          end{array} right)
                          $$



                          and



                          $$ uparrow_2!x =
                          left( begin{array}{cccc}
                          x_0 \ 0 \ x_1 \ 0
                          end{array} right) =
                          left( begin{array}{cc}
                          1 & 0 \
                          0 & 0 \
                          0 & 1 \
                          0 & 0 \
                          end{array} right)
                          left( begin{array}{cc}
                          x_0 \ x_1
                          end{array} right)
                          $$



                          As one can already see, the down- and up-sample operators are mutually transposed, i.e. $uparrow_k = downarrow_k^T$.






                          Deep Learning Convolutions by Parts



                          Let's look at the typical convolutions used in deep learning and how we write them. Given some kernel $mathbf{q}$ and vector $mathbf{x}$ we have the following:




                          • a strided convolution with stride $k$ is $downarrow_k!(mathbf{q} * mathbf{x})$,

                          • a dilated convolution with factor $k$ is $(uparrow_k!mathbf{q}) * mathbf{x}$,

                          • a transposed convolution (or deconvolution) with stride $k$ is $ mathbf{q} * (uparrow_k!mathbf{x})$


                          Let's rearrange the transposed convolution a bit:
                          $$
                          mathbf{q} * (uparrow_k!mathbf{x}) ; = ;
                          mathbf{q} * (downarrow_k^T!mathbf{x}) ; = ;
                          (uparrow_k!(mathbf{q}*)^T)^Tmathbf{x}
                          $$



                          In this notation $(mathbf{q}*)$ must be read as an operator, i.e. it abstracts convolving something with kernel $mathbf{q}$.
                          Or written in matrix operations (example):



                          $$
                          \begin{align}
                          \mathbf{q} * (\uparrow_k\!\mathbf{x}) & =
                          \left( \begin{array}{cccc}
                          q_1 & q_0 & 0 & 0 \\
                          q_2 & q_1 & q_0 & 0 \\
                          0 & q_2 & q_1 & q_0 \\
                          0 & 0 & q_2 & q_1 \\
                          \end{array} \right)
                          \left( \begin{array}{cc}
                          1 & 0 \\
                          0 & 0 \\
                          0 & 1 \\
                          0 & 0 \\
                          \end{array} \right)
                          \left( \begin{array}{c}
                          x_0\\
                          x_1\\
                          \end{array} \right)
                          \\ & =
                          \left( \begin{array}{cccc}
                          q_1 & q_2 & 0 & 0 \\
                          q_0 & q_1 & q_2 & 0 \\
                          0 & q_0 & q_1 & q_2 \\
                          0 & 0 & q_0 & q_1 \\
                          \end{array} \right)^T
                          \left( \begin{array}{cccc}
                          1 & 0 & 0 & 0\\
                          0 & 0 & 1 & 0\\
                          \end{array} \right)^T
                          \left( \begin{array}{c}
                          x_0\\
                          x_1\\
                          \end{array} \right)
                          \\ & =
                          \left(
                          \left( \begin{array}{cccc}
                          1 & 0 & 0 & 0\\
                          0 & 0 & 1 & 0\\
                          \end{array} \right)
                          \left( \begin{array}{cccc}
                          q_1 & q_2 & 0 & 0 \\
                          q_0 & q_1 & q_2 & 0 \\
                          0 & q_0 & q_1 & q_2 \\
                          0 & 0 & q_0 & q_1 \\
                          \end{array} \right)
                          \right)^T
                          \left( \begin{array}{c}
                          x_0\\
                          x_1\\
                          \end{array} \right)
                          \\ & = (\downarrow_k\!(\mathbf{q}*)^T)^T\mathbf{x}
                          \end{align}
                          $$



                          As one can see, the transposed convolution is the transpose of a strided convolution (downsampling applied after a convolution, here with the kernel reversed), hence the name.
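The matrix identity can be checked numerically. The following is a minimal NumPy sketch, not part of the original derivation: the kernel and input values are arbitrary illustrative choices, and the names `Q`, `D`, `U`, `lhs` and `rhs` are introduced here only for the check. It uses the same 4x4 convolution matrix and the factor-2 sampling matrices as the example.

```python
import numpy as np

# Arbitrary illustrative values (assumptions, not from the answer).
q0, q1, q2 = 0.5, -1.0, 2.0
x = np.array([3.0, 7.0])

# Convolution matrix for q, laid out as in the example.
Q = np.array([
    [q1, q0, 0.0, 0.0],
    [q2, q1, q0, 0.0],
    [0.0, q2, q1, q0],
    [0.0, 0.0, q2, q1],
])

# Downsampling by 2 keeps samples 0 and 2; upsampling is its transpose.
D = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
U = D.T

lhs = Q @ (U @ x)        # q * (up_2 x): zero-stuff, then convolve
rhs = (D @ Q.T).T @ x    # transpose of "apply Q^T, then downsample by 2"

# Cross-check against NumPy's full convolution, cropped to the 4 central samples.
cropped = np.convolve(U @ x, [q0, q1, q2])[1:5]

assert np.allclose(lhs, rhs)
assert np.allclose(lhs, cropped)
```

Here `D @ Q.T` is a stride-2 convolution written as a matrix, so the last assertion path confirms that zero-stuffing followed by a convolution is exactly the transpose of that operator.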



                          Connection to Nearest Neighbor Upsampling



                          Another common approach found in convolutional networks is upsampling with some built-in form of interpolation. Let's take upsampling by factor 2 with a simple repeat interpolation.
                          This can be written as $(1\;1) * (\uparrow_2\!\mathbf{x})$. If we also add a learnable kernel $\mathbf{q}$ to this, we have $\mathbf{q} * (1\;1) * (\uparrow_2\!\mathbf{x})$. The convolutions can be combined, e.g. for $\mathbf{q}=(q_0\;q_1\;q_2)$, we have $$(1\;1) * \mathbf{q} = (q_0\;\;q_0\!\!+\!q_1\;\;q_1\!\!+\!q_2\;\;q_2),$$



                          i.e. we can replace a repeat upsampler with factor 2 followed by a convolution with a kernel of size 3 by a single transposed convolution with kernel size 4. This transposed convolution has the same "interpolation capacity" but would be able to learn better matching interpolations.
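The equivalence can also be verified on a short finite sequence. This is a minimal NumPy sketch under stated assumptions: the values of `q` and `x` are arbitrary, boundaries are handled with a full (zero-padded) convolution, and the names `path_a`, `path_b`, `merged` and `stuffed` are illustrative only.

```python
import numpy as np

# Arbitrary illustrative values (assumptions, not from the answer).
q = np.array([0.5, -1.0, 2.0])
x = np.array([3.0, 7.0, -2.0])

# Path A: repeat (nearest neighbor) upsampling by 2, then convolve with q.
repeated = np.repeat(x, 2)             # [x0, x0, x1, x1, x2, x2]
path_a = np.convolve(repeated, q)      # full convolution, length 8

# Path B: zero-stuff x, then convolve once with the merged size-4 kernel.
merged = np.convolve([1.0, 1.0], q)    # [q0, q0+q1, q1+q2, q2]
stuffed = np.zeros(2 * len(x))
stuffed[::2] = x                       # [x0, 0, x1, 0, x2, 0]
path_b = np.convolve(stuffed, merged)  # full convolution, length 9

# Path B carries one extra trailing sample, which comes from the final
# stuffed zero and is itself zero; the remaining samples agree.
assert np.allclose(path_a, path_b[:-1])
assert np.isclose(path_b[-1], 0.0)
```

In the repeat-upsampling path the size-4 kernel is constrained to the form $(q_0,\; q_0+q_1,\; q_1+q_2,\; q_2)$, whereas a transposed convolution can learn all four taps freely.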






                          Conclusions and Final Remarks



                          I hope I could clarify some of the common convolutions found in deep learning by taking them apart into their fundamental operations.



                          I didn't cover pooling here. But this is just a nonlinear downsampler and can be treated within this notation as well.












                          $endgroup$

























                            answered 5 hours ago









                            André Bergner







































                                $begingroup$

                                The following paper discusses deconvolutional layers from both the architectural and the training point of view: Deconvolutional Networks.















                                $endgroup$









                                • 1




                                  $begingroup$
                                  This does not add any value to this answer
                                  $endgroup$
                                  – Martin Thoma
                                  Jan 19 '17 at 12:40


























                                answered Jan 19 '17 at 11:18









                                Avhirup

























