DNN practice: errors and strange behavior
























$begingroup$


For practice, I've built a neural net for regression with stochastic (per-example) updates; the code is shared below. It has trouble fitting the test data when more than one hidden layer is used.



The test outputs are a sine function and a linear function of the inputs, with no noise.



In short, I have two questions:




  1. With nue = .01 and HiddenLayers = [25] (one hidden layer with 25 nodes), the loss drops very sharply around n = 100,000 after spending a long time going nowhere. I can't think of why it would behave that way rather than trending down more steadily.

  2. When I add a second hidden layer (HiddenLayers = [5 25], for example), the NN predicts a constant value for all inputs.
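
One check that might help narrow down the second question is to count how many ReLU units are active in each hidden layer. The snippet below is only a minimal sketch: the inputs Xs and the weight matrices W1 and W2 are made-up values standing in for real data and for modelNNweights, and activeFrac is a hypothetical name. It assumes the same layout as the code in this post (row-vector inputs, bias weights in the first row of each weight matrix).

```matlab
% Fraction of ReLU units that are active (output > 0) in each hidden layer.
% All numbers here are made up for illustration; W{l} is laid out as
% (inputs+1) x outputs, with row 1 holding the bias weights.
Xs = [0.1 0.2 0.3; 0.9 0.5 0.4; 0.6 0.8 0.2];    % one example per row
W1 = [0.5 -2.0; 0.3 -0.1; 0.2 -0.4; 0.1 -0.3];   % (3 inputs + bias) x 2 hidden units
W2 = [0.1 0.2; 1.0 -1.0; 0.5 0.5];               % (2 hidden + bias) x 2 outputs
W  = {W1, W2};

A = [ones(size(Xs,1),1) Xs];           % prepend the bias column
activeFrac = zeros(1, numel(W)-1);
for l = 1:numel(W)-1
    H = max(0, A*W{l});                % ReLU activations, one row per example
    activeFrac(l) = mean(H(:) > 0);    % share of (example, unit) pairs that are > 0
    A = [ones(size(H,1),1) H];         % bias column for the next layer
end
yhat = A*W{end};                       % linear output layer
disp(activeFrac)
```

If some hidden layer reports a fraction of 0 for every input, the layers after it only see the bias term, so the prediction cannot depend on X. Whether that is what happens in my network is something I have not verified.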


I hope the machinery is correct. It does seem to give reasonable results with a single hidden layer, but this could all be the result of a coding error.



Notes:
The hidden layers use a ReLU activation function, while the final layer has no activation function (i.e. activationf(x) = x).
The loss is (yModel-y)^2/2, summed over the outputs.
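
Written out for a single training example, the loss and gradients that the code in this post is meant to implement are sketched below. The symbols are shorthand introduced here, not names from the code: $a^{(l)}$ is the row vector of layer-$l$ outputs with the bias 1 prepended, $u^{(l)}$ is the pre-activation of layer $l$, $\tilde{W}^{(l+1)}$ is $W^{(l+1)}$ without its bias row, and $\odot$ is the elementwise product.

$$
\begin{aligned}
L &= \tfrac{1}{2} \sum_k (\hat{y}_k - y_k)^2, \\
\delta^{(\text{out})} &= \hat{y} - y, \\
\delta^{(l)} &= \left(\delta^{(l+1)}\, \tilde{W}^{(l+1)\top}\right) \odot \mathbf{1}\!\left[u^{(l)} > 0\right], \\
\frac{\partial L}{\partial W^{(l)}} &= a^{(l-1)\top}\, \delta^{(l)}.
\end{aligned}
$$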



The entire MATLAB code is below:



rng('default')
nue = .01;       % learning rate
batchsize = 1;   % 1 = one weight update per training example

X = rand(200100,3);
y = [sin(X*[1;0;0]*7) X*[3;2;1]]; % out1 = sin(linear combination of inputs), out2 = linear combination of inputs
numTestDays = 100;
trainidx = 1:size(X,1)-numTestDays;            % training rows
testidx  = size(X,1)-numTestDays+1:size(X,1);  % last numTestDays rows, held out of training

% define hidden layer structure
HiddenLayers = [25]; % for example, [5 10 20] would denote 3 hidden layers, with 5, 10 and 20 neurons respectively

% run NN machinery
[modelNNweights, losslog] = trainStochasticNN(X(trainidx,:), y(trainidx,:), HiddenLayers, nue, batchsize);

% predict y for out-of-sample data
[netValues, yhat] = projectforward(X(testidx,:), modelNNweights);

% plot output
figure;
subplot(1,2,1)
scatter(X(testidx,:)*[1;0;0]*7, y(testidx,1))
hold all
scatter(X(testidx,:)*[1;0;0]*7, yhat(:,1))
title('y1 and y1 NN model')

subplot(1,2,2)
scatter(X(testidx,:)*[3;2;1], y(testidx,2))
hold all
scatter(X(testidx,:)*[3;2;1], yhat(:,2))
title('y2 and y2 NN model')

figure; plot(losslog(5:end))
xlabel('n'); ylabel('loss'); title('loss of training example n')

%************* functions below **********************

function [finalweights, losslog] = trainStochasticNN(X, y, HiddenLayers, nue, batchsize)
    numdata = size(X,1); % num data points
    dimIn = size(X,2); % dim of data
    dimOut = size(y,2); % num outputs we are modeling
    numLayers = length(HiddenLayers)+1; % hidden layers + 1 output layer
    layerNumNeurons = [HiddenLayers, dimOut]; % hidden layers + output layer

    % create and initialize weights (uniform on [0,1])
    weights = cell(1,numLayers);
    rng('default');
    for ln = 1:numLayers
        if ln == 1
            weights{ln} = rand(dimIn+1, layerNumNeurons(ln)); % +1 for bias term
        else
            weights{ln} = rand(layerNumNeurons(ln-1)+1, layerNumNeurons(ln)); % +1 for bias term
        end
    end

    k = 0; losslog = [];
    for n = batchsize:batchsize:numdata
        theseidx = n-batchsize+1:n; % rows of the current batch
        [netValues, yhat] = projectforward(X(theseidx,:), weights);
        [loss, ydelta] = calculateLoss(yhat, y(theseidx,:));
        dLdW = calculatePartials(netValues, weights, ydelta);
        weights = updateweights(dLdW, weights, nue);
        k = k+1; losslog(k) = mean(loss); % mean loss over the batch
    end
    finalweights = weights;
end

function [netValues, yhat] = projectforward(X, weights)
    netValues = cell(size(X,1), length(weights)+1); % +1 since netValues{n,1} contains the data inputs
    yhat = nan(size(X,1), size(weights{end},2));
    for n = 1:size(X,1)
        for ln = 1:length(weights)+1 % for layer number, data-input layer to output layer
            if ln == 1
                netValues{n, ln} = [1 X(n,:)]; % add bias to inputs; these are the values in layer 1, normally denoted layer 0
            elseif ln < length(weights)+1
                tempvals = netValues{n, ln-1}*weights{ln-1};
                %netValues{n, ln} = [1 1./(1+exp(-tempvals))]; % activation is logistic
                netValues{n, ln} = [1 max(0, tempvals)]; % activation is relu(x)
            elseif ln == length(weights)+1
                netValues{n, ln} = netValues{n, ln-1}*weights{ln-1}; % last layer: activationf(x) = x
            end
        end
        yhat(n,:) = netValues{n,end};
    end
end

function [loss, ydelta] = calculateLoss(yhat, y)
    ydelta = yhat-y; % also dLoss/dyhat
    loss = sum(ydelta.^2, 2)/2; % one loss value per example
end

function dLdW = calculatePartials(netValues, weights, ydelta)
    numexamples = size(netValues,1);
    dLdW = cell(numexamples,length(weights)); % dLoss/dWeights
    dLdV = cell(numexamples,length(weights)); % dLoss/dNodeOutput
    dVdU = cell(numexamples,length(weights)); % dNodeOutput/dNodeInput (derivative of activation function)
    dUdW = cell(numexamples,length(weights)); % dNodeInput/dWeights
    delta = cell(numexamples,length(weights)); % dVdU .* dLdV
    for n = 1:numexamples
        for ln = length(weights):-1:1 % backpropagate from the output layer to the first layer
            if ln == length(weights)
                dUdW{n,ln} = netValues{n,ln}';
                dVdU{n,ln} = ones(size(netValues{n,ln+1})); % d/dx f(x), where f(x)=x in the output layer
                dLdV{n,ln} = ydelta(n,:);
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln}.*delta{n,ln}; % column .* row expands to an outer product; uses L = (yhat-y)^2/2 and linear activation
                % [ size(dLdV{n,ln}) size(dVdU{n,ln}) size(dUdW{n,ln}) size(delta{n,ln}) size( dLdW{n,ln})]
            else
                %logisticvalue = 1./(1+exp(-netValues{n,ln+1}(2:end))); % logistic activation function
                reluvalue = max(0,netValues{n,ln+1}(2:end)); % start from index 2 because index 1 is the bias one level up, which has no effect downstream
                dUdW{n,ln} = netValues{n,ln}';
                %dVdU{n,ln} = logisticvalue.*(1-logisticvalue); % logistic derivative
                dVdU{n,ln} = sign(reluvalue); % relu derivative: 1 where the unit is active, 0 otherwise
                dLdV{n,ln} = (weights{ln+1}(2:end,:) * delta{n,ln+1}')'; % start from index 2 because index 1 holds the weight for the bias one level up
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln} .* delta{n,ln};
                % [ size(dUdW{n,ln}) size(dVdU{n,ln}) size(dLdV{n,ln}) size(weights{ln+1}(2:end,:)) size(delta{n,ln+1}) size( dLdW{n,ln})]
            end
        end
    end
end

function newweights = updateweights(dLdW, weights, nue)
    newweights = cell(size(weights));
    numexamples = size(dLdW,1);
    for ln = 1:length(weights)
        for n = 1:numexamples
            if n == 1
                meandLdW = dLdW{n,ln}/numexamples;
            else
                meandLdW = meandLdW + dLdW{n,ln}/numexamples; % average dLdW over all training examples in this batch
            end
        end
        newweights{ln} = weights{ln} - meandLdW*nue; % gradient descent step
    end
end
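
Since a coding error in the backward pass is one possibility, a finite-difference gradient check is one way it could be ruled in or out. The script below is a self-contained sketch on a tiny one-hidden-layer ReLU network, not a test of the functions above. It assumes the same conventions as this post (row-vector inputs with a leading 1 for the bias, L = sum((yhat-y).^2)/2), and every value and name in it (x, y, W1, W2, lossfn, h) is made up for illustration.

```matlab
% Finite-difference check of backprop for a tiny 1-hidden-layer ReLU net.
% Deterministic made-up values are used so the result does not depend on rng.
x  = [0.2 0.5 0.9];                 % one hypothetical input example
y  = [0.3 1.7];                     % its hypothetical targets
W1 = reshape(sin(1:20), 4, 5);      % (3 inputs + bias) x 5 hidden units
W2 = reshape(cos(1:12), 6, 2);      % (5 hidden + bias) x 2 outputs

% forward pass
a0   = [1 x];
u1   = a0*W1;                       % hidden pre-activations
a1   = [1 max(0,u1)];               % ReLU, bias prepended
yhat = a1*W2;                       % linear output layer

% analytic gradients (backprop)
delta2 = yhat - y;                              % dL/dyhat
gW2    = a1' * delta2;                          % same size as W2
delta1 = (delta2 * W2(2:end,:)') .* (u1 > 0);   % skip the bias row of W2
gW1    = a0' * delta1;                          % same size as W1

% numerical gradients by central differences
lossfn = @(V1,V2) sum(([1 max(0,[1 x]*V1)]*V2 - y).^2)/2;
h = 1e-6;
nW1 = zeros(size(W1));
nW2 = zeros(size(W2));
for i = 1:numel(W1)
    E = zeros(size(W1)); E(i) = h;
    nW1(i) = (lossfn(W1+E,W2) - lossfn(W1-E,W2))/(2*h);
end
for i = 1:numel(W2)
    E = zeros(size(W2)); E(i) = h;
    nW2(i) = (lossfn(W1,W2+E) - lossfn(W1,W2-E))/(2*h);
end

errW1 = max(abs(gW1(:) - nW1(:)));
errW2 = max(abs(gW2(:) - nW2(:)));
fprintf('max abs gradient error: W1 %.2e, W2 %.2e\n', errW1, errW2);
```

The same comparison could be run against calculatePartials by perturbing one entry of weights{ln} at a time and recomputing the loss with projectforward and calculateLoss. Errors far above roundoff would point to a bug in the backward pass rather than to the training setup.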
















$endgroup$

















      neural-network






      asked 22 hours ago by DKreitzman, edited 22 hours ago
