# k-Nearest Neighbours

The k-Nearest Neighbours is based on a simple idea: similar points tend to have similar outcomes.

Therefore the idea is to memorise all the points in the dataset. The prediction for a new entry is made by finding the closest point in the dataset. Then the prediction for the new entry is simply the same outcome as the value associated to its closest point.

If 2 points are close enough so should be their outcomes.

The name k-NN comes from the fact that you can look for the k closest points and compute (e.g. average) the outcome of the new point from the outcomes of the k-nearest points.

This algorithm is quite simple however as the dataset grows the computation power increases and becomes particularly difficult to handle for large dataset.

The naive and brute force approach is to compute the distance between the new entry and every points in the dataset and keep only the k closest points.

Another approach is to use the points in the dataset to split the space into different regions. Then for each new entry we need to determine in which region it belongs to and from there we can find the k nearest points.

In this approach the points of the dataset are stored in a kD-tree (k-dimensional tree) and each node in the tree splits the space in 2.