In machine learning, it is standard practice to split your dataset into two parts:
- a training set that you can use to train your model and find optimal parameters
- a test set that you can use to test your trained model and see how well it generalises.
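It is important that the test data is never used during the training phase. Evaluating on “unseen” data is what allows us to measure how well our model generalises. The split does not prevent overfitting by itself, but it exposes it: a model that scores well on the training set and poorly on the test set has overfit.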
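To make the idea concrete, here is a minimal sketch of such a split using only the Python standard library. The function name `split_dataset`, the 80/20 ratio, and the fixed seed are assumptions chosen for illustration, not something prescribed above; in practice you would likely reach for a library helper instead.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of `samples` and return (train, test) lists.

    Hypothetical helper for illustration; the 0.2 default is an assumption.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a reproducible split

    n_test = max(1, round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]               # held out, never shown to the model in training
    train = shuffled[n_test:]              # used to fit the model's parameters
    return train, test


data = list(range(10))
train, test = split_dataset(data)
print(len(train), len(test))
```

Shuffling before slicing matters when the data is ordered (for example, sorted by label or by date of collection), since a plain head/tail slice would then give training and test sets with different distributions. For time series a random shuffle is usually the wrong choice, so treat this as a sketch for independent samples only.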