DNN practice: errors and strange behavior
I've built a neural net for regression, with stochastic updates, for practice (shared below). It's having trouble modeling test data if more than one hidden layer is used.
The test outputs are a sine function and a linear function of the inputs, with no noise.
In short, I have two questions:
- With nue = .01 and HiddenLayers = [25] (one hidden layer with 25 nodes), the loss drops very sharply around n = 100,000 after spending a long time going nowhere. I can't think of why it would behave that way rather than trending down more consistently.
- When I add a second hidden layer (HiddenLayers = [5 25], for example), the NN predicts a constant value for all inputs.
I hope the machinery is correct; it does seem to give reasonable results with a single hidden layer, but this could all be the result of a coding error.
Notes:
The hidden layers use a ReLU activation function, while the final layer has no activation function (i.e., activationf(x) = x).
The loss is (yModel - y)^2 / 2, summed over the outputs (as in calculateLoss below; restated in the sketch after these notes).
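For reference, here is a minimal restatement of those two pieces as anonymous functions (nothing new, just what the code below implements; the names are ad hoc):

relu  = @(x) max(0, x);                         % hidden-layer activation
drelu = @(x) double(x > 0);                     % its (sub)derivative, as used in backprop
lossf = @(yhat, y) sum((yhat - y).^2, 2) / 2;   % per-example loss (calculateLoss)
dloss = @(yhat, y) yhat - y;                    % its gradient w.r.t. yhat (ydelta)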
The entire MATLAB code is below:
rng('default')
nue = .01;
batchsize = 1;
X = rand(200100,3);
y = [sin(X*[1;0;0]*7) X*[3;2;1]]; %out1 = sin(linear combination of inputs), out2 = linear combination of inputs
numTestDays=100; %not used below; the 100-row test split is hard-coded
%define hidden layer structure
HiddenLayers = [25]; %for example, [5 10 20] would denote 3 hidden layers, with 5, 10 and 20 neurons respectively
%run NN machinery
[modelNNweights losslog] = trainStochasticNN(X(1:end-100,:), y(1:end-100,:), HiddenLayers, nue, batchsize);
% predict y for out-of-sample data (note: end-100:end selects 101 rows; the first of them was also in the training set)
[netValues yhat] = projectforward(X(end-100:end,:), modelNNweights);
%plot output
figure;
subplot(1,2,1)
scatter(X(end-100:end,:)*[1;0;0]*7, [y(end-100:end,1)])
hold all
scatter(X(end-100:end,:)*[1;0;0]*7,yhat(:,1))
title('y1 and y1 NN model')
subplot(1,2,2)
scatter(X(end-100:end,:)*[3;2;1], [y(end-100:end,2)])
hold all
scatter(X(end-100:end,:)*[3;2;1],yhat(:,2))
title('y2 and y2 NN model')
figure; plot(losslog(5:end))
xlabel('n'); ylabel('loss'); title('loss of training example n')
%************* functions below **********************
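% trainStochasticNN: one pass of (mini-)batch gradient descent over the
% training rows; returns the final weights and the mean loss of each batch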
function [finalweights losslog]= trainStochasticNN(X,y, HiddenLayers, nue, batchsize)
numdata = size(X,1); %num data points
dimIn = size(X,2); %dim of data
dimOut = size(y,2); %num outputs we are modeling
numLayers = length(HiddenLayers)+1; %hidden layers + 1 output layer
layerNumNeurons = [HiddenLayers, dimOut]; %hidden layers + output layer
%create and initialize weights
weights = cell(1,numLayers);
rng('default');
for ln = 1:numLayers
if ln == 1
weights{ln} = rand(dimIn+1, layerNumNeurons(ln)); %+1 for bias term
else
weights{ln} = rand(layerNumNeurons(ln-1)+1,layerNumNeurons(ln)); %+1 for bias term
end
end
k=0; losslog = [];
for n = batchsize:batchsize:numdata
theseidx = n-batchsize+1:n;
[netValues yhat] = projectforward(X(theseidx,:), weights);
[loss ydelta] = calculateLoss(yhat, y(theseidx,:));
dLdW = calculatePartials(netValues, weights, ydelta);
weights = updateweights(dLdW, weights, nue);
k=k+1; losslog(k)=mean(loss);
end
finalweights=weights;
end
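% projectforward: forward pass; netValues{n,ln} holds the activations of
% layer ln-1 for example n (prefixed with 1 for the bias, except at the
% output layer), and yhat is the output-layer prediction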
function [netValues yhat] = projectforward(X, weights)
netValues = cell(size(X,1), length(weights)+1,1); %+1 since netValues(:,1) contains the data inputs
yhat = nan(size(X,1), size(weights{end},2));
for n = 1:size(X,1)
for ln = 1:length(weights)+1 %for layernumber, datainput-layer to output-layer
if ln ==1
netValues{n, ln} = [1 X(n,:)]; % add bias to inputs; these are the input-layer values (stored at index 1, though normally denoted layer 0)
elseif ln < length(weights)+1
tempvals = netValues{n, ln-1}*weights{ln-1};
%netValues{n, ln} = [1 1./(1+exp(-tempvals))]; %activation is logistical
netValues{n, ln} = [1 max(0, tempvals)]; % activation is relu(x)
elseif ln == length(weights)+1
netValues{n, ln} = netValues{n, ln-1}*weights{ln-1}; %last layer activationf(x) = x
end
end
yhat(n,:) = netValues{n,end};
end
end
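% calculateLoss: per-example squared-error loss, L = sum((yhat-y).^2)/2,
% plus the residual yhat - y used as the output-layer error signal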
function [loss ydelta]= calculateLoss(yhat, y)
ydelta = yhat-y;
loss = sum(ydelta.^2, 2)/2;
end
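% calculatePartials: backpropagation; working backwards from the output
% layer, compute delta = dL/d(pre-activation) for each layer and
% dLdW{n,ln} = dL/d(weights{ln}) for each example n in the batch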
function dLdW = calculatePartials(netValues, weights, ydelta)
numexamples=size(netValues,1);
dLdW = cell(numexamples,length(weights)); %dLoss/dWeights
dLdV = cell(numexamples,length(weights)); %dLoss/dNodeOutput
dVdU = cell(numexamples,length(weights)); %dNodeOutput/dNodeInput (derivative of activation function)
dUdW = cell(numexamples,length(weights)); %dNodeInput/dWeights
delta = cell(numexamples,length(weights));%dVdU .* dLdV
for n = 1:numexamples
for ln = length(weights):-1:1
if ln == length(weights)
dUdW{n,ln} = netValues{n,ln}';
dVdU{n,ln} = ones(size(netValues{n,ln+1})); %d/dx f(x), where f(x)=x in the output layer
dLdV{n,ln} = ydelta(n,:);
delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
dLdW{n,ln} = dUdW{n,ln}.*delta{n,ln} ; %using L = (yhat-y)^2/2 and linear activation function
% [ size(dLdV{n,ln}) size(dVdU{n,ln}) size(dUdW{n,ln}) size(delta{n,ln}) size( dLdW{n,ln})]
else
%logisticvalue = 1./(1+exp(-netValues{n,ln+1}(2:end))); %logistic activation function
reluvalue = max(0,netValues{n,ln+1}(2:end)); %start from index2 because index1 is the bias one level up which has no effect downstream
dUdW{n,ln} = netValues{n,ln}';
%dVdU{n,ln} = logisticvalue.*(1-logisticvalue); %logistic derivative
dVdU{n,ln} = sign(reluvalue); %relu derivative
dLdV{n,ln} = (weights{ln+1}(2:end,:) * delta{n,ln+1}')'; %start from index 2 because index 1 holds the weight for the bias one level up
delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
dLdW{n,ln} = dUdW{n,ln} .* delta{n,ln} ;
% [ size(dUdW{n,ln}) size(dVdU{n,ln}) size(dLdV{n,ln}) size(weights{ln+1}(2:end,:)) size(delta{n,ln+1}) size( dLdW{n,ln})]
end
end
end
end
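% updateweights: average dL/dW over the batch and take one gradient
% descent step of size nue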
function newweights = updateweights(dWdL, weights, nue)
newweights = cell(size(weights));
for ln = 1:length(weights)
for n = 1:size(dWdL,1)
if n==1
meandWdL = dWdL{n,ln}/size(dWdL,1);
else
meandWdL = meandWdL + dWdL{n,ln}/size(dWdL,1); %average dWdL over all training examples in this batch
end
end
newweights{ln} = weights{ln} - meandWdL*nue;
end
end
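To check whether the backprop machinery itself is correct, one option is a finite-difference gradient check against calculatePartials. The following is a minimal sketch, not part of the original script: it builds a tiny two-hidden-layer network for a single example, and it would need to run as a separate script with the functions above on the path (or be placed above the function definitions). The gc-prefixed variable names are ad hoc.

rng('default');
gcX = rand(1,3);                              % one example
gcY = [sin(gcX*[1;0;0]*7) gcX*[3;2;1]];       % its targets
gcLayers = [4 3 size(gcY,2)];                 % two small hidden layers + output layer
gcW = cell(1, numel(gcLayers));
for ln = 1:numel(gcW)
    if ln == 1
        gcW{ln} = rand(size(gcX,2)+1, gcLayers(ln)); %+1 for bias term
    else
        gcW{ln} = rand(gcLayers(ln-1)+1, gcLayers(ln));
    end
end
% analytic gradient from the backprop code above
[gcNet, gcYhat] = projectforward(gcX, gcW);
[~, gcDelta] = calculateLoss(gcYhat, gcY);
gcdLdW = calculatePartials(gcNet, gcW, gcDelta);
% numerical gradient for the first layer, entry by entry
h = 1e-6;
numGrad = zeros(size(gcW{1}));
for i = 1:numel(gcW{1})
    wP = gcW; wP{1}(i) = wP{1}(i) + h;
    wM = gcW; wM{1}(i) = wM{1}(i) - h;
    [~, yP] = projectforward(gcX, wP);
    [~, yM] = projectforward(gcX, wM);
    numGrad(i) = (calculateLoss(yP, gcY) - calculateLoss(yM, gcY)) / (2*h);
end
% should be ~1e-8 or smaller (away from ReLU kinks); a large value points to a bug
max(abs(numGrad(:) - gcdLdW{1,1}(:)))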
neural-network
asked 22 hours ago by DKreitzman (new contributor); edited 22 hours ago