In which epoch should i stop the training to avoid overfitting

I'm working on an age estimation project trying to classify a given face in a predefined age range. For that purpose I'm training a deep NN using the keras library. The accuracy for the training and the validation sets is shown in the graph below:

As you can see the validation accuracy keeps rising with smaller steps than the training accuracy. Should i stop training at the epoch 280 in which the training and the validation accuracy have the same value or should i proceed the training process as long as the validation accuracy is rising, even thought the training accuracy value is also getting at overfitted values (eg. 93%).

edited Jun 5 '18 at 10:44

David Masip

2,6211529

asked May 29 '18 at 9:33

Yiannis Ath

3816

add a comment |

edited Jun 5 '18 at 10:44

David Masip

2,6211529

asked May 29 '18 at 9:33

Yiannis Ath

3816

add a comment |

edited Jun 5 '18 at 10:44

David Masip

2,6211529

asked May 29 '18 at 9:33

Yiannis Ath

3816

machine-learning neural-network deep-learning accuracy overfitting

edited Jun 5 '18 at 10:44

David Masip

2,6211529

asked May 29 '18 at 9:33

Yiannis Ath

3816

edited Jun 5 '18 at 10:44

David Masip

2,6211529

asked May 29 '18 at 9:33

Yiannis Ath

3816

edited Jun 5 '18 at 10:44

David Masip

2,6211529

edited Jun 5 '18 at 10:44

David Masip

2,6211529

edited Jun 5 '18 at 10:44

David Masip

2,6211529

asked May 29 '18 at 9:33

Yiannis Ath

3816

asked May 29 '18 at 9:33

Yiannis Ath

3816

asked May 29 '18 at 9:33

Yiannis Ath

3816

add a comment |

5 Answers
5

active

oldest

votes

As long as your validation accuracy increases, you should keep training. I would stop when the test accuracy starts decreasing (this is known as early stopping). The general advise is always to keep the model that performs the best in your validation set.

Although it is right that your model overfits a little since epoch 280, it is not necessarily a bad thing provided that your validation accuracy is high. In general, most machine learning models will have higher training accuracy compared to validation accuracy, but this doesn't have to be bad.

In addition to the accuracy, Another thing to take into account is the loss in the test set. This tutorial explains the overfitting and underfitting very well https://www.tensorflow.org/tutorials/keras/overfit_and_underfit

edited 14 hours ago

jperezmartin

answered May 29 '18 at 12:23

David Masip

2,6211529

add a comment |

"Early Stopping" is the concept which needs to be used here.
As mentioned in wikipedia about early stopping,

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner's performance on data outside of the training set. Past that point, however, improving the learner's fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation.

At epoch > 280 in your graph, validation accuracy becomes lesser than training accuracy and hence it becomes a case of overfitting. In order to avoid overfitting here, training further is not recommended. However you may choose to train the model beyond the epoch where training and validation accuracy matches if the resulting validation accuracy is sufficient for the particular problem you are working on.

answered May 30 '18 at 6:37

Divyanshu Shekhar

184113

add a comment |

Keep training until your validation accuracy saturates (or starts dropping). Since the accuracy increases slowly, try to increase your learning rate parameter etato force the network to converge faster to the optimum weights. Be aware though, if you increase it too much though, it will become unstable.

answered May 29 '18 at 12:28

pcko1

1,681418

add a comment |

You should also look for training error Vs testing error than training accuracy and testing accuracy.

answered Oct 6 '18 at 11:01

Ashok Kumar Jayaraman

1212

1

$begingroup$
isn't it the same?
$endgroup$
– Francesco Pegoraro
Oct 6 '18 at 11:24

add a comment |

If you're using keras or tensorflow.keras, this parameter is known as patience in the EarlyStopping callback.

It equals the number of epochs with no validation accuracy improvement to trigger the end of the training phase. I usually set it to 2 or 3, 1 is usually too sensitive to noise.

answered 8 hours ago

Learning is a mess

234211

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f32306%2fin-which-epoch-should-i-stop-the-training-to-avoid-overfitting%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

edited 14 hours ago

jperezmartin

answered May 29 '18 at 12:23

David Masip

2,6211529

add a comment |

edited 14 hours ago

jperezmartin

answered May 29 '18 at 12:23

David Masip

2,6211529

add a comment |

edited 14 hours ago

jperezmartin

answered May 29 '18 at 12:23

David Masip

2,6211529

edited 14 hours ago

jperezmartin

answered May 29 '18 at 12:23

David Masip

2,6211529

edited 14 hours ago

jperezmartin

edited 14 hours ago

jperezmartin

edited 14 hours ago

jperezmartin

answered May 29 '18 at 12:23

David Masip

2,6211529

answered May 29 '18 at 12:23

David Masip

2,6211529

answered May 29 '18 at 12:23

David Masip

2,6211529

add a comment |

"Early Stopping" is the concept which needs to be used here.
As mentioned in wikipedia about early stopping,

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner's performance on data outside of the training set. Past that point, however, improving the learner's fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation.

answered May 30 '18 at 6:37

Divyanshu Shekhar

184113

add a comment |

"Early Stopping" is the concept which needs to be used here.
As mentioned in wikipedia about early stopping,

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner's performance on data outside of the training set. Past that point, however, improving the learner's fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation.

answered May 30 '18 at 6:37

Divyanshu Shekhar

184113

add a comment |

"Early Stopping" is the concept which needs to be used here.
As mentioned in wikipedia about early stopping,

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner's performance on data outside of the training set. Past that point, however, improving the learner's fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation.

answered May 30 '18 at 6:37

Divyanshu Shekhar

184113

"Early Stopping" is the concept which needs to be used here.
As mentioned in wikipedia about early stopping,

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner's performance on data outside of the training set. Past that point, however, improving the learner's fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation.

answered May 30 '18 at 6:37

Divyanshu Shekhar

184113

answered May 30 '18 at 6:37

Divyanshu Shekhar

184113

answered May 30 '18 at 6:37

Divyanshu Shekhar

184113

answered May 30 '18 at 6:37

Divyanshu Shekhar

184113

add a comment |

answered May 29 '18 at 12:28

pcko1

1,681418

add a comment |

answered May 29 '18 at 12:28

pcko1

1,681418

add a comment |

answered May 29 '18 at 12:28

pcko1

1,681418

answered May 29 '18 at 12:28

pcko1

1,681418

answered May 29 '18 at 12:28

pcko1

1,681418

answered May 29 '18 at 12:28

pcko1

1,681418

answered May 29 '18 at 12:28

pcko1

1,681418

add a comment |

You should also look for training error Vs testing error than training accuracy and testing accuracy.

answered Oct 6 '18 at 11:01

Ashok Kumar Jayaraman

1212

1

$begingroup$
isn't it the same?
$endgroup$
– Francesco Pegoraro
Oct 6 '18 at 11:24

add a comment |

You should also look for training error Vs testing error than training accuracy and testing accuracy.

answered Oct 6 '18 at 11:01

Ashok Kumar Jayaraman

1212

1

$begingroup$
isn't it the same?
$endgroup$
– Francesco Pegoraro
Oct 6 '18 at 11:24

add a comment |

You should also look for training error Vs testing error than training accuracy and testing accuracy.

answered Oct 6 '18 at 11:01

Ashok Kumar Jayaraman

1212

You should also look for training error Vs testing error than training accuracy and testing accuracy.

answered Oct 6 '18 at 11:01

Ashok Kumar Jayaraman

1212

answered Oct 6 '18 at 11:01

Ashok Kumar Jayaraman

1212

answered Oct 6 '18 at 11:01

Ashok Kumar Jayaraman

1212

answered Oct 6 '18 at 11:01

Ashok Kumar Jayaraman

1212

1

$begingroup$
isn't it the same?
$endgroup$
– Francesco Pegoraro
Oct 6 '18 at 11:24

add a comment |

1

$begingroup$
isn't it the same?
$endgroup$
– Francesco Pegoraro
Oct 6 '18 at 11:24

isn't it the same?

– Francesco Pegoraro
Oct 6 '18 at 11:24

add a comment |

If you're using keras or tensorflow.keras, this parameter is known as patience in the EarlyStopping callback.

It equals the number of epochs with no validation accuracy improvement to trigger the end of the training phase. I usually set it to 2 or 3, 1 is usually too sensitive to noise.

answered 8 hours ago

Learning is a mess

234211

add a comment |

If you're using keras or tensorflow.keras, this parameter is known as patience in the EarlyStopping callback.

It equals the number of epochs with no validation accuracy improvement to trigger the end of the training phase. I usually set it to 2 or 3, 1 is usually too sensitive to noise.

answered 8 hours ago

Learning is a mess

234211

add a comment |

If you're using keras or tensorflow.keras, this parameter is known as patience in the EarlyStopping callback.

It equals the number of epochs with no validation accuracy improvement to trigger the end of the training phase. I usually set it to 2 or 3, 1 is usually too sensitive to noise.

answered 8 hours ago

Learning is a mess

234211

If you're using keras or tensorflow.keras, this parameter is known as patience in the EarlyStopping callback.

It equals the number of epochs with no validation accuracy improvement to trigger the end of the training phase. I usually set it to 2 or 3, 1 is usually too sensitive to noise.

answered 8 hours ago

Learning is a mess

234211

answered 8 hours ago

Learning is a mess

234211

answered 8 hours ago

Learning is a mess

234211

answered 8 hours ago

Learning is a mess

234211

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Htydjtk