Given a dataset, we usually split it into training, validation, and test sets. We feed the training set to the model (a neural network in this post) to optimize its parameters, evaluate the current model on the validation set, and keep optimizing until the performance on the validation set meets some criterion (i.e. is good enough). After training is done, we apply the model to the test set to measure its real ability on unseen data (the test set is never used while we optimize the model).
However, neural network training confuses me here. In a feed-forward neural network optimized with back-propagation, we feed in the training set, compute the loss, get the gradients of the parameters, and record the accuracy on the training set; we run many epochs (iterations) until the stopping criterion on the training set is met. Here is my question: what about the validation set?
Here is my summary, with a figure:
1. The performance on the training set increases (until it converges) as we run more epochs.
2. We stop optimizing when the accuracy on the validation set begins to decrease (over-fitting).
3. We can also monitor the performance on the training set for early stopping, which gives up training when we think further training is not worth the computational cost.
4. In short, accuracy on the training set is used to stop unnecessary computation; accuracy on the validation set is used to avoid over-fitting. Both are used to stop training the model, but for different purposes.
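The stopping logic summarized above can be sketched in code. This is a minimal illustration, not a real framework API: `train_one_epoch` and `evaluate` are hypothetical stand-ins that fake the typical curves (training accuracy keeps rising; validation accuracy peaks, then falls as the model over-fits), and the `patience` mechanism is one common way to decide that validation accuracy has "begun to decrease".

```python
# Hypothetical stand-in: training accuracy rises toward 1.0 with more epochs.
def train_one_epoch(model, epoch):
    return min(1.0, 0.5 + 0.05 * epoch)

# Hypothetical stand-in: validation accuracy peaks at epoch 10, then
# declines as the model starts to over-fit the training set.
def evaluate(model, epoch):
    return 0.8 - 0.01 * abs(epoch - 10)

def fit(model, max_epochs=50, patience=3):
    best_val, best_epoch, wait = 0.0, 0, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, epoch)          # optimize on the training set
        val_acc = evaluate(model, epoch)       # monitor on the validation set
        if val_acc > best_val:
            best_val, best_epoch, wait = val_acc, epoch, 0  # still improving
        else:
            wait += 1                          # validation accuracy stalled/fell
            if wait >= patience:               # stop: further epochs over-fit
                break
    return best_epoch, best_val

best_epoch, best_val = fit(model=None)
```

With these fake curves, training stops a few epochs after the validation peak at epoch 10, and `best_epoch`/`best_val` record the model you would actually keep. A training-set criterion (point 3 above) would be an extra check inside the loop, e.g. breaking when the training accuracy stops improving by more than some tolerance.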
A good thread about the training, validation, and test sets:
http://stackoverflow.com/questions/2976452/whats-is-the-difference-between-train-validation-and-test-set-in-neural-networ