Machine learning applications are becoming more widespread every day, in many domains. One of today’s most powerful techniques is the neural network, employed in many applications such as image recognition, speech analysis and translation, self-driving cars, etc. In fact, such learning algorithms have been known for decades, but only recently have they become mainstream, supported by […]

# Monthly archives: April 2016

## k-means clustering

k-means is a clustering algorithm which divides the space into k different clusters. Each cluster is represented by its centre of mass (i.e. barycentre), and data points are assigned to the cluster with the nearest barycentre. The learning algorithm starts by choosing k random points; each of these is the centre of mass of a […]
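The assign-then-recompute loop described above can be sketched in a few lines of NumPy (a minimal illustration, not the post’s own code — the data and the fixed iteration count are assumptions):

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Minimal k-means: returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    # start from k random points of the dataset
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to the cluster with the nearest barycentre
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centre to the barycentre (mean) of its cluster
        for j in range(k):
            if (labels == j).any():
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels

# two well-separated blobs: k-means should recover them
X = np.vstack([np.zeros((10, 2)), np.ones((10, 2)) * 5])
centres, labels = kmeans(X, k=2)
```

In practice you would stop when the assignments no longer change rather than run a fixed number of iterations.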

## Confusion matrix

When you train several models on a dataset, you need a way to compare their performance and choose the one that best suits your needs. As we will see, there are different ways to compare the results and then pick the best one. Let’s start with what scores we can get out of the training […]
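For a binary classifier, the usual starting point is the confusion matrix and the scores derived from it. A minimal sketch (the labels here are made up for illustration):

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for binary labels: rows = actual, cols = predicted."""
    m = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

m = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = m[0, 0], m[0, 1], m[1, 0], m[1, 1]
accuracy  = (tp + tn) / m.sum()   # fraction of correct predictions
precision = tp / (tp + fp)        # of the predicted positives, how many are real
recall    = tp / (tp + fn)        # of the real positives, how many were found
```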

## k-Nearest Neighbours

k-Nearest Neighbours is based on a simple idea: similar points tend to have similar outcomes. The algorithm therefore memorises all the points in the dataset. The prediction for a new entry is made by finding the closest point in the dataset; the prediction for the new entry is then simply the same outcome as the […]
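For the k = 1 case described in the excerpt, the whole prediction step is a nearest-point lookup. A minimal sketch (the toy training points are an assumption):

```python
import numpy as np

def predict_1nn(X_train, y_train, x):
    """Predict by copying the outcome of the single closest training point."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every memorised point
    return y_train[dists.argmin()]

# two memorised groups, labelled 0 and 1
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])

pred = predict_1nn(X_train, y_train, np.array([5.5, 4.8]))
```

For k > 1 you would take the k smallest distances and vote (or average) over their outcomes.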

## Frequentists vs Bayesians

## How to split a dataset

In machine learning it is pretty obvious to me that you need to split your dataset into 2 parts: a training set that you can use to train your model and find optimal parameters, and a test set that you can use to test your trained model and see how well it generalises. It is important […]
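The split itself is just a shuffle followed by a cut. A minimal sketch (the 25% test ratio and the toy data are assumptions):

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.25, seed=0):
    """Shuffle the dataset, then hold out a fraction of it for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle so the split is unbiased
    n_test = int(len(X) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_tr, X_te, y_tr, y_te = train_test_split(X, y)
```

Shuffling before cutting matters: if the dataset is ordered (e.g. by class), taking the last rows as the test set would give a very unrepresentative split.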

## Weight decay regularisation

Most machine learning techniques follow a similar strategy:

1. Get the best possible model on the training dataset.
2. Generalise by testing the model on the test dataset.

The test dataset consists of data that are never used during training, and it allows you to test how the algorithm will perform over “not seen before” data.
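Weight decay implements this trade-off by adding an L2 penalty on the weights to the training cost, so the model that looks best on the training data is not simply the one with the largest weights. A minimal sketch (the squared-error loss, the penalty strength `lam` and the toy data are assumptions):

```python
import numpy as np

def loss_with_decay(w, X, y, lam=0.1):
    """Squared error on the training data plus an L2 weight-decay penalty.

    The penalty lam * ||w||^2 discourages large weights, which tends to
    help the model generalise to unseen test data.
    """
    residuals = X @ w - y
    return (residuals ** 2).mean() + lam * (w ** 2).sum()

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0])
small = loss_with_decay(np.array([1.0, 1.0]), X, y)  # perfect fit, small weights
large = loss_with_decay(np.array([3.0, 3.0]), X, y)  # poor fit, large weights
```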

## Stochastic gradient descent

With gradient descent we try to optimise a cost function that runs over the entire dataset: each step requires computing the “cost” over all the data. When working with big datasets this yields a complex function optimisation and slow computation times. This is also a problem when dealing with streaming data, as we need to wait for the stream to end […]
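Stochastic gradient descent avoids this by estimating the gradient from one sample at a time instead of the whole dataset. A minimal sketch on a linear model (the learning rate, epoch count and toy data are assumptions):

```python
import numpy as np

def sgd_linear(X, y, lr=0.1, epochs=50, seed=0):
    """Fit y ~ X @ w with stochastic gradient descent: one sample per update."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):      # visit samples in random order
            # gradient of the squared error on sample i only
            grad = 2 * (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

# data generated exactly from w_true = [2, -1]
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -1.0])
w = sgd_linear(X, y)
```

Because each update only touches one sample, this also works on streaming data: you can update the model as each new point arrives instead of waiting for the stream to end.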

## Gradient descent

If you want to predict something from your data, you need to put a strategy in place. I mean you need a way to measure how good your predictions are … and then try to make the best ones. This is usually done by taking some data for which you already know the outcome and […]
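Once you have such a measure of how good the predictions are (a cost function), gradient descent simply steps against its gradient until it reaches a minimum. A minimal one-dimensional sketch (the cost function, learning rate and step count are assumptions):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the cost function."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# cost(x) = (x - 3)^2, so grad(x) = 2 * (x - 3); the minimum is at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```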

## PCA: Principal Component Analysis

PCA stands for Principal Component Analysis. It is a mathematical concept which I am not going to explain in great detail here, as there are already plenty of books on the subject. Rather, I would like to give a practical feel for what it does and when to use it. The idea behind PCA is that […]
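To give that practical feel, here is a minimal NumPy sketch: centre the data, take the eigenvectors of its covariance matrix, and project onto the directions of largest variance (the toy data, chosen to lie almost on a line, are an assumption):

```python
import numpy as np

def pca(X, n_components):
    """Project centred data onto the directions of largest variance."""
    Xc = X - X.mean(axis=0)                      # centre the data
    cov = np.cov(Xc, rowvar=False)               # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components

# points almost on the line y = x: one component captures most of the variance
X = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]])
Z = pca(X, n_components=1)
```

Here the 2-D data is reduced to a single coordinate along its main axis while keeping almost all of the variance, which is exactly the kind of dimensionality reduction PCA is used for.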