SGD vs. SGD in mini-batches












I recently finished implementing a mini-batch algorithm for a library I'm building in Java (an artificial neural network library). I then trained the network on the XOR problem with mini-batch sizes of 2 and 3; in both cases I got worse accuracy than with a batch size of 1 (which is just plain SGD). I understand that I may need to train for more epochs, but I'm also not noticing any speed-up in runtime, which from what I've read should happen. Why is this?

Here is my code (Java):



    public void SGD(double[][] inputs, double[][] expected_outputs, int mini_batch_size, int epochs, boolean verbose) {
        // Set verbose
        setVerbose(verbose);

        // Create training set
        TrainingSet trainingSet = new TrainingSet(inputs, expected_outputs);

        // Loop through epochs
        for (int i = 0; i < epochs; i++) {
            // Print progress
            print("\rTrained: " + i + "/" + epochs);

            // Shuffle training set
            trainingSet.shuffle();

            // Create the mini-batches
            TrainingSet.Data[][] mini_batches = createMiniBatches(trainingSet, mini_batch_size);

            // Loop through mini-batches
            for (int j = 0; j < mini_batches.length; j++) {
                update_mini_batch(mini_batches[j]);
            }
        }

        // Print progress
        print("\rTrained: " + epochs + "/" + epochs);
        print("\nDone!");
    }

    private Pair backprop(double[] inputs, double[] target_outputs) {
        // Create expected-output matrix
        Matrix EO = Matrix.fromArray(new double[][]{target_outputs});

        // Forward-propagate inputs
        feedForward(inputs);

        // Get the errors, which are also the bias deltas
        Matrix[] Errors = calculateError(EO);

        // Weight delta matrices
        Matrix[] dCdW = new Matrix[Errors.length];

        // Calculate the deltas:
        // first layer's delta (I holds the input activations)
        dCdW[0] = Matrix.dot(Matrix.transpose(I), Errors[0]);

        // rest of the network (H holds the hidden-layer activations)
        for (int i = 1; i < Errors.length; i++) {
            dCdW[i] = Matrix.dot(Matrix.transpose(H[i - 1]), Errors[i]);
        }

        return new Pair(dCdW, Errors);
    }

    private void update_mini_batch(TrainingSet.Data[] mini_batch) {
        // Get first deltas
        Pair deltas = backprop(mini_batch[0].input, mini_batch[0].output);

        // Loop through the mini-batch and sum the deltas
        for (int i = 1; i < mini_batch.length; i++) {
            deltas.add(backprop(mini_batch[i].input, mini_batch[i].output));
        }

        // Multiply the deltas by the learning rate and divide by the
        // mini-batch size to get the mean of the deltas
        deltas.multiply(learningRate / mini_batch.length);

        // Update weights and biases
        for (int i = 0; i < W.length; i++) {
            W[i].subtract(deltas.dCdW[i]);
            B[i].subtract(deltas.dCdB[i]);
        }
    }
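For reference, `createMiniBatches` is called above but not shown. A minimal sketch of what such a helper might look like, assuming hypothetical `TrainingSet.size()` and `TrainingSet.get(int)` accessors (the posted code doesn't include that class):

    // Hypothetical sketch: partition an already-shuffled training set into
    // consecutive slices of mini_batch_size examples each.
    private TrainingSet.Data[][] createMiniBatches(TrainingSet set, int mini_batch_size) {
        int n = set.size();
        int num_batches = (n + mini_batch_size - 1) / mini_batch_size; // ceiling division
        TrainingSet.Data[][] batches = new TrainingSet.Data[num_batches][];

        for (int b = 0; b < num_batches; b++) {
            int start = b * mini_batch_size;
            int len = Math.min(mini_batch_size, n - start); // last batch may be smaller
            batches[b] = new TrainingSet.Data[len];
            for (int k = 0; k < len; k++) {
                batches[b][k] = set.get(start + k);         // copies references, not data
            }
        }
        return batches;
    }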









neural-network java mini-batch-gradient-descent






asked 2 days ago by Itay Bachar (edited 2 days ago)

  • Did you try it for different sizes of training sets? If the training set is too small you may not notice the difference. – Javi, 2 days ago










  • Well, the training set itself is 60k for the MNIST dataset, and I did mini-batches of 300. – Itay Bachar, 2 days ago


















1 Answer


















My understanding is that mini-batches are not really for speeding up the calculations, but for making it feasible to train on large datasets at all.

If you have 1,000,000 examples, it would be tricky for a computer to compute forward and backward passes over the whole set at once, but passing batches of 5,000 elements is manageable.

For your case, I recommend two things (see also the sketch after this list):

  1. Try different batch sizes.

  2. Make sure you shuffle your batches! That will certainly help a bit.
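A note on the missing speed-up specifically: as posted, `update_mini_batch` still calls `backprop` once per example and only averages the results, so each epoch performs exactly as many forward and backward passes as plain SGD, and identical runtime is what you'd expect. Frameworks see a speed-up because they stack the whole batch into one matrix and hand a single large product to an optimized BLAS or GPU kernel. A rough self-contained sketch of that batched formulation, using plain arrays rather than the library's `Matrix` class:

    // Batched forward pass for one layer: X is (batch x in), W is (in x out).
    // One (batch x in) * (in x out) product replaces batch-many
    // (1 x in) * (in x out) products; optimized matrix kernels exploit
    // exactly this structure.
    static double[][] forwardBatch(double[][] X, double[][] W) {
        int batch = X.length, in = W.length, out = W[0].length;
        double[][] A = new double[batch][out];
        for (int b = 0; b < batch; b++) {
            for (int o = 0; o < out; o++) {
                double z = 0.0;
                for (int i = 0; i < in; i++) {
                    z += X[b][i] * W[i][o];
                }
                A[b][o] = 1.0 / (1.0 + Math.exp(-z)); // sigmoid, element-wise
            }
        }
        return A;
    }

With hand-rolled scalar loops like these (or a per-example `Matrix.dot`), the arithmetic count is the same either way, so little runtime gain should be expected; the batching mainly changes the gradient quality, not the speed.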






– answered 2 days ago by Juan Antonio Gomez Moriano













  • Alright, the XOR problem has only 4 possible inputs, so that might be why batches are slower in this case. I also have shuffling implemented already. Lastly, when I try to train on a large dataset like MNIST, the network doesn't learn at all in my experience with this library. I posted another question about it: datascience.stackexchange.com/questions/46651/… – Itay Bachar, yesterday












