Wednesday, February 10, 2016

[ML] Confusion about the training set and validation set in neural network training

This post is my attempt to clarify the roles of the training set and the validation set when training (optimizing) a neural network.

Given a dataset, we usually split it into training, validation, and test sets. We feed the training set to the model (a neural network in this post) to optimize its parameters, then apply the current model to the validation set, and keep optimizing until the performance on the validation set meets some criterion (i.e., it is good enough). After training is done, we apply the model to the test set to measure its real ability on unseen data (the test set is never used while we optimize the model).
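The three-way split described above can be sketched as follows. This is only a minimal illustration: the helper name `split_dataset`, the 60/20/20 proportions, and the toy arrays are my own assumptions, not something prescribed by any library.

```python
import numpy as np

def split_dataset(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut the data into training, validation, and test parts.

    The fractions are hypothetical defaults chosen for this sketch.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random order of the row indices
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]       # everything left over is for training
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Toy data: 50 samples with 2 features each; y doubles as a sample id.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)
train, val, test = split_dataset(X, y)
print(len(train[0]), len(val[0]), len(test[0]))
```

The three parts are disjoint, so the test set stays untouched until the very end.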

Moreover, when the performance on the validation set begins to decrease while training continues, that indicates overfitting: the optimized model works well only on the training data, and probably not on other data. We should stop training at that point to avoid overfitting.

However, neural network training confused me. In a feed-forward neural network optimized with backpropagation, we feed in the training set, compute the loss, and get the gradients with respect to the parameters along with the accuracy on the training set; we run many epochs (passes over the training set) until a stopping criterion on the training set is met. Here is my question: where does the validation set come in?
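One way to picture where validation fits into the loop is the sketch below. It is not a real neural network: I assume a single-layer logistic model trained by plain gradient descent as a stand-in, and the synthetic data, learning rate, and epoch count are all made up for illustration. The point is only that gradients come from the training set, while the validation set is merely measured after each epoch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (an assumption for illustration only).
X = rng.normal(size=(300, 5))
true_w = rng.normal(size=5)
y = (X @ true_w + 0.5 * rng.normal(size=300) > 0).astype(float)

X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

def predict_proba(X, w, b):
    """Sigmoid output of a single-layer model (stand-in for the network)."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def accuracy(X, y, w, b):
    return float(np.mean((predict_proba(X, w, b) > 0.5) == y))

w, b = np.zeros(5), 0.0
lr = 0.1                      # hypothetical learning rate
train_acc, val_acc = [], []

for epoch in range(50):
    # Gradient step: computed from the training set only.
    p = predict_proba(X_train, w, b)
    grad_w = X_train.T @ (p - y_train) / len(y_train)
    grad_b = float(np.mean(p - y_train))
    w -= lr * grad_w
    b -= lr * grad_b

    # Evaluation: the validation set is measured, never used for gradients.
    train_acc.append(accuracy(X_train, y_train, w, b))
    val_acc.append(accuracy(X_val, y_val, w, b))

print(f"final train acc={train_acc[-1]:.2f}, val acc={val_acc[-1]:.2f}")
```

The two histories, `train_acc` and `val_acc`, are what a stopping rule would look at.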

Here is my summary, with a figure:

1. The performance on the training set increases (until it converges) as we run more epochs.
2. We stop optimizing when the accuracy on the validation set begins to decrease (overfitting).
3. We could also monitor the performance on the training set for early stopping, giving up training once it stops improving (because further training is not worth the computational cost).
4. In short, accuracy on the training set is used to stop unnecessary computation; accuracy on the validation set is used to avoid overfitting. Both are used to stop training the model, but for different purposes.
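The two stopping purposes in the summary can be put side by side in a small helper. This is a sketch under my own assumptions: the function name `stop_reason` and the `patience` and `min_delta` parameters are hypothetical choices, and real frameworks implement early stopping in their own ways.

```python
def stop_reason(train_acc, val_acc, patience=3, min_delta=1e-3):
    """Return why training should stop, or None to keep going.

    train_acc, val_acc: per-epoch accuracy histories (lists of floats).
    patience, min_delta: hypothetical knobs chosen for this sketch.
    """
    if len(val_acc) > patience:
        best_before = max(val_acc[:-patience])
        # Validation accuracy has stayed below its earlier best
        # for the last `patience` epochs: likely overfitting.
        if max(val_acc[-patience:]) < best_before:
            return "overfitting"
    if len(train_acc) > patience:
        gain = max(train_acc[-patience:]) - max(train_acc[:-patience])
        # Training accuracy no longer improves: more epochs waste compute.
        if gain < min_delta:
            return "converged"
    return None

# Validation peaks at epoch 4 and then declines while training keeps rising.
print(stop_reason([0.6, 0.7, 0.8, 0.85, 0.9, 0.93, 0.95],
                  [0.6, 0.7, 0.75, 0.78, 0.77, 0.76, 0.75]))  # overfitting

# Both curves are flat: nothing left to gain from more epochs.
print(stop_reason([0.6, 0.8, 0.9, 0.9, 0.9, 0.9],
                  [0.6, 0.7, 0.8, 0.8, 0.8, 0.8]))            # converged

# Both curves are still improving.
print(stop_reason([0.5, 0.6, 0.7, 0.8],
                  [0.5, 0.6, 0.65, 0.7]))                     # None
```

Checking the validation rule first is a design choice: if both conditions hold, overfitting is the more important reason to report.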

A good thread about the training and validation sets.
http://stackoverflow.com/questions/2976452/whats-is-the-difference-between-train-validation-and-test-set-in-neural-networ