DNN practice: errors and strange behavior

I've built a neural network for regression with stochastic gradient updates, as practice (code below). It has trouble modeling the test data when more than one hidden layer is used.

The test outputs are a sine function and a linear function of the inputs, with no noise: y1 = sin(7*x1) and y2 = 3*x1 + 2*x2 + x3.

In short, I have two questions:

1. With nue = .01 and HiddenLayers = [25] (one hidden layer with 25 nodes), the loss drops very sharply around n = 100,000 after a long stretch of making no progress. I can't think of why it would behave that way rather than trending down more steadily.

2. When I add a second hidden layer (for example HiddenLayers = [5 25]), the network predicts the same constant value for all inputs.

I hope the machinery is correct. It does seem to give reasonable results with a single hidden layer, but all of this could still be the result of a coding error.

Notes:
- The hidden layers use a ReLU activation function, while the output layer has no activation function (i.e., activationf(x) = x).
- The loss for each example is (yModel-y)^2/2, summed over the two outputs (see calculateLoss).
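To make these conventions concrete, here is a small sketch (illustrative values only) of the ReLU, its derivative, and the identity output activation. The name `drelu` is a hypothetical helper introduced here; it agrees with the `sign(max(0, z))` expression used in calculatePartials:

```matlab
% Illustrative sketch of the activation conventions (not part of the original code).
relu  = @(z) max(0, z);        % hidden-layer activation
drelu = @(z) double(z > 0);    % hypothetical helper: ReLU derivative, taken as 0 at z = 0
ident = @(z) z;                % output-layer activation, activationf(x) = x

z = [-2 0 3];
a = relu(z);                   % [0 0 3]
g = drelu(z);                  % [0 0 1]
gsign = sign(max(0, z));       % the form used in calculatePartials: also [0 0 1]
out = ident(z);                % the output layer passes values through unchanged
```

Taking the derivative as 0 at exactly z = 0 is a convention; the ReLU is not differentiable there, and either choice is common.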

The entire MATLAB code is below:

```matlab
rng('default')

nue = .01;            % learning rate
batchsize = 1;

X = rand(200100,3);
y = [sin(X*[1;0;0]*7) X*[3;2;1]]; %out1 = sin(linear combination of inputs), out2 = linear combination of inputs
numTestDays = 100;    % number of held-out rows at the end of X

%define hidden layer structure
HiddenLayers = [25]; %for example, [5 10 20] would denote 3 hidden layers, with 5, 10 and 20 neurons respectively

%run NN machinery (train on everything except the last numTestDays rows)
[modelNNweights, losslog] = trainStochasticNN(X(1:end-numTestDays,:), y(1:end-numTestDays,:), HiddenLayers, nue, batchsize);

% predict y for out-of-sample data (only rows not used in training)
testidx = size(X,1)-numTestDays+1 : size(X,1);
[netValues, yhat] = projectforward(X(testidx,:), modelNNweights);

%plot output
figure;
subplot(1,2,1)
scatter(X(testidx,:)*[1;0;0]*7, y(testidx,1))
hold on
scatter(X(testidx,:)*[1;0;0]*7, yhat(:,1))
title('y1 and y1 NN model')

subplot(1,2,2)
scatter(X(testidx,:)*[3;2;1], y(testidx,2))
hold on
scatter(X(testidx,:)*[3;2;1], yhat(:,2))
title('y2 and y2 NN model')

figure; plot(losslog(5:end))
xlabel('n'); ylabel('loss'); title('loss of training example n')
```
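Because batchsize = 1, each entry of losslog is the loss of a single example, so the loss plot is very noisy and the overall trend can be hard to judge. One option is to plot a trailing moving average instead. A minimal sketch, where the window length w and the short stand-in losslog vector are assumptions for illustration:

```matlab
% Trailing moving average of a per-example loss log (stand-in values).
losslog = [4 2 6 0 2 4];       % in the script this comes from trainStochasticNN
w = 3;                         % window length; for the real losslog something like 1000 (arbitrary choice)
smoothed = filter(ones(1, w) / w, 1, losslog);  % mean of the last w entries
smoothed = smoothed(w:end);    % drop the first w-1 entries, where the window is padded with zeros
% smoothed is [4 8/3 8/3 2]
% In the script: figure; plot(w:numel(losslog), smoothed)
```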



```matlab
%************* functions below **********************

function [finalweights, losslog] = trainStochasticNN(X, y, HiddenLayers, nue, batchsize)
    numdata = size(X,1); %num data points
    dimIn = size(X,2);   %dim of data
    dimOut = size(y,2);  %num outputs we are modeling
    numLayers = length(HiddenLayers)+1; %hidden layers + 1 output layer
    layerNumNeurons = [HiddenLayers, dimOut]; %hidden layers + output layer

    %create and initialize weights, uniform on (0,1); row 1 of each matrix holds the bias weights
    weights = cell(1,numLayers);
    rng('default');
    for ln = 1:numLayers
        if ln == 1
            weights{ln} = rand(dimIn+1, layerNumNeurons(ln)); %+1 for bias term
        else
            weights{ln} = rand(layerNumNeurons(ln-1)+1, layerNumNeurons(ln)); %+1 for bias term
        end
    end

    k = 0;
    losslog = zeros(1, floor(numdata/batchsize)); % mean loss of each batch
    for n = batchsize:batchsize:numdata
        theseidx = n-batchsize+1:n;
        [netValues, yhat] = projectforward(X(theseidx,:), weights);
        [loss, ydelta]    = calculateLoss(yhat, y(theseidx,:));
        dLdW              = calculatePartials(netValues, weights, ydelta);
        weights           = updateweights(dLdW, weights, nue);
        k = k+1; losslog(k) = mean(loss);
    end
    finalweights = weights;
end
```
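The batch loop in trainStochasticNN steps n through batchsize:batchsize:numdata, so when numdata is not a multiple of batchsize the last few examples are never visited. With batchsize = 1, as in the question, every example is used. A small sketch with illustrative sizes:

```matlab
% Which examples does the batch loop in trainStochasticNN visit? (illustrative sizes)
numdata = 10;
batchsize = 4;
visited = [];
for n = batchsize:batchsize:numdata
    theseidx = n-batchsize+1:n;      % same indexing as trainStochasticNN
    visited = [visited theseidx];    %#ok<AGROW>
end
% visited is 1:8; examples 9 and 10 are skipped
```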



```matlab
function [netValues, yhat] = projectforward(X, weights)
    netValues = cell(size(X,1), length(weights)+1); %+1 since netValues(:,1) contains data inputs
    yhat = nan(size(X,1), size(weights{end},2));
    for n = 1:size(X,1)
        for ln = 1:length(weights)+1 %for layer number, data-input layer to output layer
            if ln == 1
                netValues{n, ln} = [1 X(n,:)]; % add bias to inputs; these are the values in layer 1, normally denoted layer 0
            elseif ln < length(weights)+1
                tempvals = netValues{n, ln-1}*weights{ln-1};
                %netValues{n, ln} = [1 1./(1+exp(-tempvals))]; %activation is logistic
                netValues{n, ln} = [1 max(0, tempvals)]; % activation is relu(x)
            else
                netValues{n, ln} = netValues{n, ln-1}*weights{ln-1}; %last layer activationf(x) = x
            end
        end
        yhat(n,:) = netValues{n,end};
    end
end
```
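projectforward processes one example at a time. The same computation can be written for a whole batch by prepending a column of ones for the bias. A sketch with randomly generated stand-in weights, shaped as trainStochasticNN would create them for HiddenLayers = [5 4] and two outputs (these sizes are assumptions for illustration):

```matlab
% Vectorized forward pass over all rows of X (stand-in sizes and values).
X = rand(6, 3);                                  % 6 examples, 3 inputs
weights = {rand(4, 5), rand(6, 4), rand(5, 2)};  % (fan_in+1) x fan_out, bias in row 1

A = X;
for ln = 1:numel(weights)
    Z = [ones(size(A, 1), 1) A] * weights{ln};   % prepend the bias column
    if ln < numel(weights)
        A = max(0, Z);                           % ReLU in the hidden layers
    else
        A = Z;                                   % identity in the output layer
    end
end
yhat = A;                                        % one row per example

% Per-example version (as in projectforward) for the first row, for comparison
a = X(1, :);
for ln = 1:numel(weights)
    a = [1 a] * weights{ln};
    if ln < numel(weights), a = max(0, a); end
end
```

Both forms apply the same operations, so yhat(1,:) and a agree up to rounding; the vectorized form avoids the per-example cell array when the intermediate values are not needed.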



```matlab
function [loss, ydelta] = calculateLoss(yhat, y)
    ydelta = yhat-y;               % dLoss/dyhat for L = sum((yhat-y).^2)/2
    loss = sum(ydelta.^2, 2)/2;    % one loss value per example (row)
end
```
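A worked example of calculateLoss on a batch of two examples (values are illustrative). Because of the factor of 1/2, the derivative of each row's loss with respect to yhat is just ydelta, which is what calculatePartials uses as dLdV in the output layer:

```matlab
% calculateLoss on a batch of two examples with two outputs each (illustrative values)
yhat = [1.5 4.0; 0.0 -1.0];
y    = [1.0 3.0; 0.0  1.0];
ydelta = yhat - y;              % [0.5 1.0; 0 -2.0], also dLoss/dyhat
loss = sum(ydelta.^2, 2) / 2;   % [(0.25 + 1)/2; (0 + 4)/2] = [0.625; 2]
```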



```matlab
function dLdW = calculatePartials(netValues, weights, ydelta)
    numexamples = size(netValues,1);
    dLdW  = cell(numexamples,length(weights)); %dLoss/dWeights
    dLdV  = cell(numexamples,length(weights)); %dLoss/dNodeOutput
    dVdU  = cell(numexamples,length(weights)); %dNodeOutput/dNodeInput (derivative of activation function)
    dUdW  = cell(numexamples,length(weights)); %dNodeInput/dWeights
    delta = cell(numexamples,length(weights)); %dVdU .* dLdV
    for n = 1:numexamples
        for ln = length(weights):-1:1
            if ln == length(weights)
                dUdW{n,ln} = netValues{n,ln}';
                dVdU{n,ln} = ones(size(netValues{n,ln+1})); %d/dx f(x), where f(x)=x in the output layer
                dLdV{n,ln} = ydelta(n,:);
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln}.*delta{n,ln}; %outer product (column .* row), using L = (yhat-y)^2/2 and linear activation
                %   [ size(dLdV{n,ln}) size(dVdU{n,ln}) size(dUdW{n,ln}) size(delta{n,ln}) size( dLdW{n,ln})]
            else
                %logisticvalue = netValues{n,ln+1}(2:end); %netValues already holds the logistic outputs, so do not apply the logistic again
                reluvalue = max(0,netValues{n,ln+1}(2:end)); %start from index 2 because index 1 is the bias one level up, which has no effect downstream
                dUdW{n,ln} = netValues{n,ln}';
                %dVdU{n,ln} = logisticvalue.*(1-logisticvalue); %logistic derivative
                dVdU{n,ln} = sign(reluvalue); %relu derivative
                dLdV{n,ln} = (weights{ln+1}(2:end,:) * delta{n,ln+1}')'; %start from index 2 because index 1 holds the weight for the bias one level up
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln} .* delta{n,ln};
                %   [ size(dUdW{n,ln}) size(dVdU{n,ln}) size(dLdV{n,ln}) size(weights{ln+1}(2:end,:)) size(delta{n,ln+1}) size( dLdW{n,ln})]
            end
        end
    end
end
```
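One way to test whether backpropagation code like calculatePartials has a coding error is a finite-difference gradient check: perturb each weight by a small h, recompute the loss, and compare (L(W+h) - L(W-h))/(2h) with the backpropagated gradient. The sketch below re-implements the same backward equations for a single example with two hidden layers; the network sizes, the inputs, and the anonymous forward function `fwd` are assumptions for illustration, not part of the original code:

```matlab
% Finite-difference gradient check for one example (illustrative sizes).
rng(0);                                      % reproducible random weights
x = [0.2 0.7 0.4];                           % one input row (3 inputs)
y = [0.5 -1.0];                              % its target (2 outputs)
W = {randn(4, 5), randn(6, 4), randn(5, 2)}; % (fan_in+1) x fan_out, bias in row 1
nL = numel(W);

% Hypothetical fixed-depth forward pass and loss, used for the numerical check
relu = @(z) max(0, z);
fwd = @(W) [1 relu([1 relu([1 x] * W{1})] * W{2})] * W{3};
lossfn = @(W) sum((fwd(W) - y).^2) / 2;

% Forward pass storing bias-augmented inputs v and pre-activations u
v = cell(1, nL); u = cell(1, nL);
a = x;
for ln = 1:nL
    v{ln} = [1 a];
    u{ln} = v{ln} * W{ln};
    if ln < nL, a = relu(u{ln}); else, a = u{ln}; end
end
yhat = a;

% Backward pass with the same equations as calculatePartials
grad = cell(1, nL);
delta = yhat - y;                            % output layer, activationf(x) = x
for ln = nL:-1:1
    grad{ln} = v{ln}' .* delta;              % outer product via implicit expansion
    if ln > 1
        % drop the bias row of W{ln}, then multiply by the ReLU derivative
        delta = (W{ln}(2:end, :) * delta')' .* double(u{ln-1} > 0);
    end
end

% Central differences for every weight
h = 1e-6;
maxerr = 0;
for ln = 1:nL
    for k = 1:numel(W{ln})
        Wp = W; Wp{ln}(k) = Wp{ln}(k) + h;
        Wm = W; Wm{ln}(k) = Wm{ln}(k) - h;
        numgrad = (lossfn(Wp) - lossfn(Wm)) / (2 * h);
        maxerr = max(maxerr, abs(numgrad - grad{ln}(k)));
    end
end
fprintf('max |numeric - backprop| = %g\n', maxerr);
```

A tiny maxerr suggests the backward equations are right; a large one points to a bug in the backward pass. The same comparison can be run against the question's own calculatePartials output by building a loss function from projectforward and calculateLoss.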



```matlab
function newweights = updateweights(dLdW, weights, nue)
    newweights = cell(size(weights));
    numexamples = size(dLdW,1);
    for ln = 1:length(weights)
        meandLdW = zeros(size(weights{ln}));
        for n = 1:numexamples
            meandLdW = meandLdW + dLdW{n,ln}/numexamples; %average dLdW over all training examples in this batch
        end
        newweights{ln} = weights{ln} - meandLdW*nue; % gradient-descent step
    end
end
```
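updateweights averages the per-example gradients of a batch and takes one gradient-descent step. The same update can be written with `cat` and `mean`; a sketch with illustrative values:

```matlab
% Averaged gradient step, loop form vs. vectorized form (illustrative values)
nue = 0.01;
W = [1 2; 3 4];
dLdW = {[1 0; 0 1]; [3 2; 2 3]};          % one cell per example in the batch

% Loop, as in updateweights
acc = zeros(size(W));
for n = 1:numel(dLdW)
    acc = acc + dLdW{n} / numel(dLdW);
end
Wloop = W - acc * nue;

% Equivalent vectorized form: stack along the 3rd dimension and average
Wvec = W - nue * mean(cat(3, dLdW{:}), 3);
% Both give [0.98 1.99; 2.99 3.98]
```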

asked 22 hours ago by DKreitzman


Tags: neural-network
