TensorFlow introduction

Following my previous post on neural networks, I thought it would be nice to see how to implement these concepts with TensorFlow.

TensorFlow is a new library developed by Google. It is aimed at building fast and efficient machine learning pipelines.

Under the hood it is based on the computation graph that we discussed earlier.

It provides C++ and Python interfaces and can run on CPU or GPU (Linux only).

Enough talking, let’s get started. If you have Docker installed on your machine (if not, I highly recommend installing it), you can just spin up a new container with TensorFlow already installed.

Start the docker container with:

docker run -i -t gcr.io/tensorflow/tensorflow /usr/bin/python

This will connect you to the docker container and open a Python shell.

Next we need to import TensorFlow and NumPy:

import tensorflow as tf
import numpy as np

In TensorFlow all variables are represented as tensors. A tensor is just an n-dimensional array, so it can represent anything from a scalar (0 dimensions) to a vector (1 dimension), a matrix (2 dimensions) and beyond.
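
To make the scalar / vector / matrix distinction concrete, here is a small sketch in plain Python (no TensorFlow involved). The helper name shape is made up for this illustration, and it assumes regular, non-empty nested lists:

```python
def shape(tensor):
    """Return the shape of a regular, non-empty nested list as a list of sizes."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]  # assumes every sub-list has the same length
    return dims

scalar = 7                          # 0 dimensions
vector = [1, 2, 3]                  # 1 dimension
matrix = [[1, 2], [3, 4], [5, 6]]   # 2 dimensions

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    dims = shape(value)
    print("%s: rank %d, shape %s" % (name, len(dims), dims))
```

The number of entries in the shape is the number of dimensions of the tensor.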

Moreover, each variable is also assigned a type (e.g. float32, float64, int32, …).

Let’s start with something easy: declaring 2 scalar variables:

a = tf.constant(2)
b = tf.constant(3)

and then defining a multiplication operation that multiplies these 2 variables:

c = a * b

At this point c is not equal to 6, because nothing has been computed yet. c only represents the operation that multiplies a by b.

To get the result we need to evaluate this operation. In TensorFlow this is done inside a session:

with tf.Session() as sess:
   result = sess.run(c)
   print(result)  # 6
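
If the idea of building an operation first and evaluating it later feels odd, this toy version in plain Python may help. It is only a sketch of the principle, not how TensorFlow is implemented; the names Constant, Multiply and run are invented for the illustration:

```python
class Constant(object):
    """A graph node that holds a fixed value."""
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        # Multiplying two nodes builds a new node instead of computing a number
        return Multiply(self, other)

    def evaluate(self):
        return self.value


class Multiply(object):
    """A graph node that remembers its two inputs; nothing is computed yet."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        return self.left.evaluate() * self.right.evaluate()


def run(node):
    """Plays the role of sess.run(): walk the graph and compute the value."""
    return node.evaluate()


a = Constant(2)
b = Constant(3)
c = a * b          # c is a Multiply node, not the number 6
result = run(c)    # the multiplication only happens here
print(type(c).__name__, result)
```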

Here we have used constants to declare a and b because their values don’t change.

We could have done the same with a variable. The difference is that the value of a variable might change over time. Before being used in a computation it must be initialised.

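a = tf.constant(4)
x = tf.Variable(5)
c = a * x

with tf.Session() as sess:
   # variables must be initialised before they are used
   sess.run(tf.initialize_all_variables())
   result = sess.run(c)  # 20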

OK, but using a variable is more complicated than using a constant, and it doesn’t do much in this example.

So let’s see how we can update the variable (change its state) over time:

# create our variable x with initial value 1
x = tf.Variable(1)
two = tf.constant(2)
# multiply x by 2
doubled_val = tf.mul(two, x)
# assign the result to x
double_op = tf.assign(x, doubled_val)

# we are now ready to run the double operation
# but we need to initialise our variable first
init_op = tf.initialize_all_variables()

with tf.Session() as sess:
   sess.run(init_op)
   # double x, 5 times
   for i in range(5):
      sess.run(double_op)
   print(sess.run(x))  # 32

Much better. Things are starting to look a bit more interesting. I think we are now ready to implement our first neuron. But before we get started there is one more thing we need to know: placeholders. A placeholder is like a constant, but its value is provided at runtime, through feed_dict, when we call sess.run().

# input x
x = tf.placeholder(tf.int32, shape=[4, 2])
# bias b
b = tf.constant(-1)
# weight w
w = tf.constant([[1], [1]])
u = tf.matmul(x, w) + b
y = tf.nn.relu(u)

# defines our set of inputs
inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]

init_op = tf.initialize_all_variables()

with tf.Session() as sess:
   output = sess.run(y, feed_dict={x: inputs})
   # one row per input: 0, 0, 0, 1
   print(output)

Congrats! You’ve just implemented a neuron that performs the AND function.
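
To see why these weights give AND, the same arithmetic can be checked by hand. The sketch below is plain Python rather than TensorFlow, and the function names relu and neuron are just chosen for the illustration; the weights (1, 1) and the bias (-1) are the ones used above:

```python
def relu(value):
    return max(0, value)

def neuron(x1, x2, w1=1, w2=1, bias=-1):
    """Weighted sum of the inputs plus the bias, passed through ReLU."""
    return relu(x1 * w1 + x2 * w2 + bias)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", neuron(x1, x2))
```

Only the input (1, 1) gives a weighted sum above zero (1 + 1 - 1 = 1); every other input gives 0 or less, which ReLU clips to 0.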

So now let’s see how we can turn this neuron into a whole network.

Luckily TensorFlow provides us with all the machinery. We just need to define the computation graph for our cost function and the network topology. TensorFlow will then help us train our model and update its weights with backpropagation.

So now let’s try to implement a real neural network for the XOR function.

Our network contains one hidden layer with 2 neurons.

[Figure: neural network to model the XOR function]

We can model it in TensorFlow:

import tensorflow as tf
import numpy as np

# our input data
# None means we don't know the number of rows yet
x = tf.placeholder(tf.float32, shape=[None, 2]) 

# the hidden layer
wh = tf.Variable(tf.random_normal([2, 2]))
bh = tf.Variable(tf.random_normal([2]))
h = tf.nn.relu(tf.matmul(x, wh) + bh)

# the output layer
wo = tf.Variable(tf.random_normal([2, 1]))
bo = tf.Variable(tf.random_normal([1]))
# No activation function for the output layer
y = tf.matmul(h, wo) + bo

# The expected output values
y_ = tf.placeholder(tf.float32, shape=[None, 1]) 

# We need a cost function to measure the performance of our network
# Here we use a simple mean squared error
cost = tf.reduce_mean(tf.square(y_ - y))

# Now we're ready to train our network
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# Initialise everything
init = tf.initialize_all_variables()

# And start the session
sess = tf.Session()
# Run the initialisation op, otherwise the variables are not usable
sess.run(init)

# Our training data (named so that we don't shadow Python's built-in input)
inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [[0], [1], [1], [0]]

# Train our model
for i in range(1000):
  sess.run(train, feed_dict={x: inputs, y_: targets})

# Check that it works as expected
print(sess.run(tf.round(y), feed_dict={x: inputs}))

# Print the network parameters (weights and biases)
print "hidden layer", sess.run(wh), sess.run(bh)
print "output layer", sess.run(wo), sess.run(bo)

Congratulations on your first neural network!
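
It can be reassuring to check that this topology (2 inputs, 2 hidden ReLU neurons, 1 linear output) is able to represent XOR at all. The sketch below does the forward pass in plain Python with hand-picked parameters. These values are an assumption chosen for illustration; training will most likely end up with different ones:

```python
def relu(value):
    return max(0.0, value)

# Hand-picked parameters (an assumption for illustration, not trained values)
WH = [[1.0, 1.0], [1.0, 1.0]]   # hidden weights, shape [2, 2]
BH = [0.0, -1.0]                # hidden biases, shape [2]
WO = [1.0, -2.0]                # output weights, one per hidden neuron
BO = 0.0                        # output bias

def forward(x):
    """Forward pass: hidden ReLU layer followed by a linear output."""
    hidden = [relu(x[0] * WH[0][j] + x[1] * WH[1][j] + BH[j]) for j in range(2)]
    return hidden[0] * WO[0] + hidden[1] * WO[1] + BO

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(x, "->", forward(x))
```

The first hidden neuron counts how many inputs are on, the second one only fires when both are on, and the output subtracts twice the second from the first.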

As a wrap-up here are some tricky things to pay attention to:

  • Carefully check the dimensions of the variables in TensorFlow. I’ve run into dimension issues a number of times.
  • Initialise the network parameters randomly (it didn’t work with zeroes).
  • Choose the gradient optimisation step carefully (values that were too big made the optimisation diverge).
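
One likely explanation for the zero-initialisation problem can be sketched in plain Python. The code below computes the mean squared error gradients of the 2-2-1 ReLU network by hand, assuming the derivative of ReLU at 0 is taken to be 0. The function name gradients is made up for this illustration, and this is my reading of the problem rather than something TensorFlow reported:

```python
def relu(value):
    return max(0.0, value)

def gradients(wh, bh, wo, bo, inputs, targets):
    """Hand-written MSE gradients for a 2-2-1 network with a ReLU hidden layer."""
    n = len(inputs)
    g_wh = [[0.0, 0.0], [0.0, 0.0]]
    g_bh = [0.0, 0.0]
    g_wo = [0.0, 0.0]
    g_bo = 0.0
    for x, t in zip(inputs, targets):
        # forward pass
        u = [x[0] * wh[0][j] + x[1] * wh[1][j] + bh[j] for j in range(2)]
        h = [relu(v) for v in u]
        y = h[0] * wo[0] + h[1] * wo[1] + bo
        # backward pass
        dy = 2.0 * (y - t) / n
        g_bo += dy
        for j in range(2):
            g_wo[j] += dy * h[j]
            # assumption: the slope of ReLU at exactly 0 is 0
            du = dy * wo[j] * (1.0 if u[j] > 0 else 0.0)
            g_bh[j] += du
            for i in range(2):
                g_wh[i][j] += du * x[i]
    return g_wh, g_bh, g_wo, g_bo

inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 1, 1, 0]
zeros = gradients([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0,
                  inputs, targets)
print(zeros)
```

With everything at zero, the hidden layer outputs 0 for every input, so every gradient except the one for the output bias is zero. Under these assumptions gradient descent can only move the output bias, and the network can do no better than predict a constant.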

It’s pretty cool to train a network; however, it is also way trickier than I expected. Getting the initialisation and the optimisation step right can be especially challenging: my results were far from what I expected even though the implementation was correct.
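
The effect of the optimisation step is easy to reproduce on a much simpler problem. This sketch minimises f(w) = w**2 with plain gradient descent in pure Python; the function name descend and the two learning rates are picked only for illustration and have nothing to do with the values that suit the XOR network:

```python
def descend(learning_rate, steps=20, start=1.0):
    """Minimise f(w) = w**2 by gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w                  # derivative of w**2
        w = w - learning_rate * gradient
    return w

small = descend(0.1)   # each step multiplies w by 0.8, so w shrinks towards 0
large = descend(1.1)   # each step multiplies w by -1.2, so |w| keeps growing
print(abs(small), abs(large))
```

With the small step the value converges towards the minimum at 0, while with the large step every update overshoots further than the previous one.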

This is a very basic network. I highly encourage you to head over to the TensorFlow website and follow the tutorials (the MNIST tutorial is very detailed and is a nice introduction to neural networks).