SGD vs SGD in mini-batches












So I recently finished a mini-batch algorithm for an artificial neural network library I'm building in Java. I then trained my network on the XOR problem with mini-batch sizes of 2 and 3, and in both cases I got worse accuracy than with a batch size of 1 (which is basically plain SGD). I understand that I need to train for more epochs, but I'm also not seeing any speed-up in runtime, which from what I've read should happen. Why is this?



Here is my code (Java):



```java
public void SGD(double[][] inputs, double[][] expected_outputs, int mini_batch_size, int epochs, boolean verbose) {
    // Set verbose
    setVerbose(verbose);

    // Create training set
    TrainingSet trainingSet = new TrainingSet(inputs, expected_outputs);

    // Loop through epochs
    for (int i = 0; i < epochs; i++) {
        // Print progress (\r returns to the start of the line)
        print("\rTrained: " + i + "/" + epochs);

        // Shuffle training set
        trainingSet.shuffle();

        // Create the mini-batches
        TrainingSet.Data[][] mini_batches = createMiniBatches(trainingSet, mini_batch_size);

        // Loop through mini-batches
        for (int j = 0; j < mini_batches.length; j++) {
            update_mini_batch(mini_batches[j]);
        }
    }

    // Print progress
    print("\rTrained: " + epochs + "/" + epochs);
    print("\nDone!");
}

private Pair backprop(double[] inputs, double[] target_outputs) {
    // Create the expected-output matrix from the target vector
    Matrix EO = Matrix.fromArray(new double[][]{target_outputs});

    // Forward-propagate the inputs
    feedForward(inputs);

    // Get the errors, which are also the bias deltas
    Matrix[] Errors = calculateError(EO);

    // Weight delta matrices, one per layer
    Matrix[] dCdW = new Matrix[Errors.length];

    // Calculate the deltas
    // First layer: uses the input I
    dCdW[0] = Matrix.dot(Matrix.transpose(I), Errors[0]);

    // Rest of the network: uses the previous hidden layer's activations
    for (int i = 1; i < Errors.length; i++) {
        dCdW[i] = Matrix.dot(Matrix.transpose(H[i - 1]), Errors[i]);
    }

    return new Pair(dCdW, Errors);
}

private void update_mini_batch(TrainingSet.Data[] mini_batch) {
    // Get the deltas for the first example
    Pair deltas = backprop(mini_batch[0].input, mini_batch[0].output);

    // Loop through the rest of the mini-batch and sum the deltas
    for (int i = 1; i < mini_batch.length; i++) {
        deltas.add(backprop(mini_batch[i].input, mini_batch[i].output));
    }

    // Multiply the summed deltas by the learning rate and divide by
    // the mini-batch size to get the mean delta
    deltas.multiply(learningRate / mini_batch.length);

    // Update weights and biases
    for (int i = 0; i < W.length; i++) {
        W[i].subtract(deltas.dCdW[i]);
        B[i].subtract(deltas.dCdB[i]);
    }
}
```
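In the code above, every example in an epoch still goes through its own `backprop` call inside `update_mini_batch`; the batch size only changes how often `W` and `B` are updated. The sketch below is a minimal, hypothetical illustration (the `EpochCost` class and `countWork` method are not part of the library) that mirrors that loop structure and counts both quantities per epoch:

```java
public class EpochCost {
    // Mirrors the loop structure of SGD/update_mini_batch above:
    // batches of batchSize consecutive examples, one backprop call per
    // example, one weight update per batch. Returns {backpropCalls, weightUpdates}.
    static int[] countWork(int numExamples, int batchSize) {
        int backpropCalls = 0;
        int weightUpdates = 0;
        for (int start = 0; start < numExamples; start += batchSize) {
            int end = Math.min(start + batchSize, numExamples); // last batch may be partial
            backpropCalls += end - start; // one backprop per example in the batch
            weightUpdates++;              // one update of W and B per batch
        }
        return new int[]{backpropCalls, weightUpdates};
    }

    public static void main(String[] args) {
        int[][] cases = {{4, 1}, {4, 2}, {4, 3}, {60000, 300}};
        for (int[] c : cases) {
            int[] work = countWork(c[0], c[1]);
            System.out.printf("N=%d, batch=%d: %d backprop calls, %d weight updates%n",
                    c[0], c[1], work[0], work[1]);
        }
    }
}
```

For 60,000 MNIST examples with batches of 300 this gives 200 weight updates per epoch instead of 60,000, but still 60,000 backprop passes. Since those per-example passes are likely the bulk of the work, a loop like this one would not be expected to run much faster; any larger gain would have to come from processing a whole batch at once (for example, one matrix-matrix product instead of many matrix-vector products), which this loop does not do.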









neural-network java mini-batch-gradient-descent






asked 2 days ago by Itay Bachar (edited 2 days ago)




  • Did you try it with different training-set sizes? If the training set is too small, you may not notice the difference.
    – Javi
    2 days ago










  • Well, the training set itself is 60k examples (the MNIST dataset), and I used mini-batches of 300.
    – Itay Bachar
    2 days ago


















1 Answer

My understanding is that mini-batches are not really meant to speed up the calculations; their main purpose is to make training on large datasets feasible at all.



If you have 1,000,000 examples, computing the forward and backward passes over all of them at once can be impractical (for example, the intermediate results may not fit in memory), whereas passing batches of 5,000 examples is much more feasible.
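To put rough numbers on the memory argument, here is a back-of-the-envelope sketch. It assumes, purely hypothetically, a single hidden layer of 100 units whose activations are kept as 8-byte `double`s for the backward pass; the `ActivationMemory` class is illustrative and not from the question:

```java
public class ActivationMemory {
    // Bytes needed to hold one layer's activations for a batch of examples,
    // assuming 8-byte doubles. The layer width is a hypothetical value.
    static long activationBytes(long batchSize, long layerWidth) {
        return batchSize * layerWidth * Double.BYTES;
    }

    public static void main(String[] args) {
        long width = 100; // hypothetical hidden-layer width
        long full = activationBytes(1_000_000, width);
        long mini = activationBytes(5_000, width);
        System.out.println("Full batch: " + full / 1_000_000 + " MB");
        System.out.println("Mini-batch: " + mini / 1_000_000 + " MB");
        System.out.println("Batches per epoch: " + 1_000_000 / 5_000);
    }
}
```

Under these assumptions, one layer's activations take about 800 MB for the full dataset versus about 4 MB for a batch of 5,000, and an epoch consists of 200 such batches. Real networks have several layers and store more intermediate values, so the gap grows accordingly.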



For your case, I recommend two things:




  1. Try different batch sizes.

  2. Make sure you shuffle your training data before forming the batches in each epoch! That will help a bit.
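Both suggestions come down to reshuffling the training examples every epoch and then cutting them into batches of the chosen size. A minimal sketch of that step over example indices follows; the `MiniBatches` class and `shuffledBatches` method are hypothetical and not part of the asker's library:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class MiniBatches {
    // Shuffle the indices 0..n-1, then split them into consecutive batches of
    // size batchSize; the last batch holds the remainder when n % batchSize != 0.
    static List<List<Integer>> shuffledBatches(int n, int batchSize, Random rng) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, rng);

        List<List<Integer>> batches = new ArrayList<>();
        for (int start = 0; start < n; start += batchSize) {
            int end = Math.min(start + batchSize, n);
            // Copy the sublist so each batch does not depend on the backing list.
            batches.add(new ArrayList<>(indices.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // XOR has 4 examples; with batch size 3 each epoch yields batches of 3 and 1.
        for (int epoch = 0; epoch < 2; epoch++) {
            System.out.println("Epoch " + epoch + ": " + shuffledBatches(4, 3, rng));
        }
    }
}
```

Reshuffling before splitting (rather than only reordering fixed batches) means each batch contains a different mix of examples from one epoch to the next. With only 4 XOR examples and a batch size of 3, note that every epoch ends with a batch of a single example.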






answered 2 days ago by Juan Antonio Gomez Moriano












  • Alright. I mean, the XOR problem has only 4 possible inputs, so that might be why mini-batches are slower in this case. I also have shuffling implemented already. Lastly, when I try to train on a large dataset like MNIST, the network doesn't learn at all in my experience with this library. I posted another question, in case you think you could help me: datascience.stackexchange.com/questions/46651/…
    – Itay Bachar
    yesterday



















