SGD vs SGD in mini batches
$begingroup$
So I recently finished a mini-batch algorithm for a library I'm building in Java (an artificial neural network library). I then trained the network on the XOR problem with mini-batch sizes of 2 and 3, and in both cases I got worse accuracy than with a batch size of 1 (which is basically plain SGD). I understand that I may need to train for more epochs, but I'm also not noticing any speed-up in runtime, which from what I've read should happen. Why is this?
Here is my code (Java):
public void SGD(double[][] inputs, double[][] expected_outputs, int mini_batch_size, int epochs, boolean verbose){
    //Set verbose
    setVerbose(verbose);
    //Create training set
    TrainingSet trainingSet = new TrainingSet(inputs, expected_outputs);
    //Loop through epochs
    for(int i = 0; i < epochs; i++){
        //Print progress
        print("\rTrained: " + i + "/" + epochs);
        //Shuffle training set
        trainingSet.shuffle();
        //Create the mini batches
        TrainingSet.Data[][] mini_batches = createMiniBatches(trainingSet, mini_batch_size);
        //Loop through mini batches
        for(int j = 0; j < mini_batches.length; j++){
            update_mini_batch(mini_batches[j]);
        }
    }
    //Print progress
    print("\rTrained: " + epochs + "/" + epochs);
    print("\nDone!");
}
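(createMiniBatches is not shown above; presumably it just splits the already-shuffled training set into consecutive chunks of at most mini_batch_size examples, roughly along these lines. This is a generic, hypothetical sketch, not the library's actual helper, written against a plain array rather than the library's TrainingSet type:)

static <T> T[][] createMiniBatches(T[] data, int miniBatchSize) {
    //Number of batches, rounding up so the last, smaller batch is kept
    int count = (data.length + miniBatchSize - 1) / miniBatchSize;
    @SuppressWarnings("unchecked")
    T[][] batches = (T[][]) java.lang.reflect.Array.newInstance(data.getClass(), count);
    for (int i = 0; i < count; i++) {
        int from = i * miniBatchSize;
        int to = Math.min(from + miniBatchSize, data.length);
        batches[i] = java.util.Arrays.copyOfRange(data, from, to);
    }
    return batches;
}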
private Pair backprop(double[] inputs, double[] target_outputs){
    //Create expected-output matrix
    Matrix EO = Matrix.fromArray(new double[][]{target_outputs});
    //Forward propagate the inputs
    feedForward(inputs);
    //Get the errors, which are also the bias deltas
    Matrix[] Errors = calculateError(EO);
    //Weight delta matrices
    Matrix[] dCdW = new Matrix[Errors.length];
    //Calculate the deltas
    //First layer's delta (I holds the input activations)
    dCdW[0] = Matrix.dot(Matrix.transpose(I), Errors[0]);
    //Rest of the network (H[i-1] holds the previous layer's activations)
    for (int i = 1; i < Errors.length; i++) {
        dCdW[i] = Matrix.dot(Matrix.transpose(H[i - 1]), Errors[i]);
    }
    return new Pair(dCdW, Errors);
}
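For reference, the weight gradient being formed here is the usual outer product of the previous layer's activations with the current layer's error term (assuming, as the code suggests, that I and H[i-1] store activations as row vectors and Errors[i] is the error $\delta^{(l)}$ of layer $l$):

$$\frac{\partial C}{\partial W^{(l)}} = \left(a^{(l-1)}\right)^{\mathsf T}\,\delta^{(l)}, \qquad \frac{\partial C}{\partial b^{(l)}} = \delta^{(l)}$$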
private void update_mini_batch(TrainingSet.Data[] mini_batch){
    //Gradients for the first example
    Pair deltas = backprop(mini_batch[0].input, mini_batch[0].output);
    //Loop through the rest of the mini batch and sum the gradients
    for(int i = 1; i < mini_batch.length; i++){
        deltas.add(backprop(mini_batch[i].input, mini_batch[i].output));
    }
    //Multiply the summed deltas by the learning rate
    //and divide by the mini batch size so the step
    //is taken along the mean gradient
    deltas.multiply(learningRate / mini_batch.length);
    //Update weights and biases
    for(int i = 0; i < W.length; i++){
        W[i].subtract(deltas.dCdW[i]);
        B[i].subtract(deltas.dCdB[i]);
    }
}
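In other words, each call to update_mini_batch takes one step along the mean gradient of the batch, i.e. the standard averaged mini-batch update (with $\eta$ the learning rate and $m$ the mini-batch size):

$$W \leftarrow W - \frac{\eta}{m}\sum_{k=1}^{m}\frac{\partial C_k}{\partial W}, \qquad b \leftarrow b - \frac{\eta}{m}\sum_{k=1}^{m}\frac{\partial C_k}{\partial b}$$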
neural-network java mini-batch-gradient-descent
$endgroup$
$begingroup$
Did you try it with different training set sizes? If the training set is too small you may not notice the difference.
$endgroup$
– Javi
2 days ago
$begingroup$
Well, the training set itself is 60k examples for the MNIST dataset, and I used mini-batches of 300.
$endgroup$
– Itay Bachar
2 days ago
1 Answer
$begingroup$
My understanding is that mini-batches are not primarily about speeding up the calculation, but about making training on large datasets feasible in the first place.
If you have 1,000,000 examples, computing the forward and backward passes over the whole set at once is impractical, but processing batches of, say, 5,000 examples at a time is perfectly feasible.
For your case, I recommend two things (and see the timing sketch below for the speed-up question):
- Try different batch sizes.
- Make sure you shuffle your data before building the batches! That will certainly help a bit.
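On the missing speed-up specifically: as long as each example is pushed through backprop one at a time (as update_mini_batch above does), every epoch performs the same number of backprop calls regardless of how the examples are grouped; only the number of weight updates changes. A runtime gain usually appears only when the whole batch is stacked into one matrix so the forward and backward passes become a few matrix operations per batch. Here is a minimal, self-contained timing sketch illustrating the first point (fakeBackprop and the class name are made up for illustration; this is not the asker's library):

public class BatchTimingSketch {
    //Stand-in for a single-example backprop pass: just burns a fixed amount of work
    static double fakeBackprop(double x) {
        double acc = 0;
        for (int i = 0; i < 10_000; i++) acc += Math.sin(x + i);
        return acc;
    }

    public static void main(String[] args) {
        int examples = 60_000;                 //e.g. an MNIST-sized training set
        int[] batchSizes = {1, 32, 300, 5_000};
        for (int m : batchSizes) {
            long backpropCalls = 0, updates = 0;
            double sink = 0;                   //keeps the JIT from discarding the fake work
            long start = System.nanoTime();
            for (int i = 0; i < examples; i += m) {
                int end = Math.min(i + m, examples);
                //Per-example loop, exactly as in update_mini_batch
                for (int j = i; j < end; j++) {
                    sink += fakeBackprop(j);
                    backpropCalls++;
                }
                updates++;                     //one averaged weight update per batch
            }
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("batch=%5d  backprop calls=%d  updates=%6d  time=%.0f ms  (sink=%.1f)%n",
                    m, backpropCalls, updates, ms, sink);
        }
    }
}

For every batch size the backprop-call count, and hence the time, is essentially identical, which matches what you observed.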
$endgroup$
$begingroup$
Alright. The XOR problem has only 4 possible inputs, so that might be why batches are slower in this case. I also have shuffling implemented already. Lastly, when I try to train on a large dataset like MNIST, the network doesn't learn at all in my experience with this library. I posted another question about it, if you think you could help me: datascience.stackexchange.com/questions/46651/…
$endgroup$
– Itay Bachar
yesterday