Following my previous post on ND4J I think it’s time for a proper introduction to Deeplearning4j (a.k.a. DL4J).
In this post I’ll do something similar to the TensorFlow introduction: install DL4J, get our hands on it, and then build a very basic neural network.
DL4J aims to be the reference neural network implementation on the JVM. It is true that there is a huge gap on the JVM when it comes to machine learning, especially compared to other languages like Python.
DL4J follows the C++ backend approach, where all the optimised code is written in C++ for performance reasons and a Java layer is provided on top of it (much like other frameworks in the Python world: Theano, TensorFlow, …).
Let’s dive in and install DL4J. The installation is pretty simple: you just need to add the required dependencies to your project.
If you’re using sbt, that is:
libraryDependencies += "org.nd4j" % "nd4j-native-platform" % "0.6.0" libraryDependencies += "org.deeplearning4j" % "deeplearning4j-core" % "0.6.0"
That will automatically download all the dependencies, including ND4J, JavaCPP and DataVec (a library for loading and vectorising data).
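To give an idea of what DataVec does, here is a minimal sketch of loading a CSV file into a dataset iterator. The file name and column layout are hypothetical and only serve as an illustration; in this post we’ll stick to a small in-memory dataset.

import java.io.File
import org.datavec.api.records.reader.impl.csv.CSVRecordReader
import org.datavec.api.split.FileSplit
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator

// hypothetical CSV where each row is "input1,input2,label"
val reader = new CSVRecordReader()
reader.initialize(new FileSplit(new File("xor.csv")))

// batch size of 4, label in column index 2, 2 possible classes
val iterator = new RecordReaderDataSetIterator(reader, 4, 2, 2)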
All good, it seems we’re all set up to write some code.
DL4J provides a rather high-level API for modelling a neural network. No need to fiddle around with variables, so let’s jump directly to my favourite toy example: modelling an XOR function.
We’re going to implement the exact same XOR function as in the TensorFlow introduction post: 2 inputs, 1 hidden layer with 2 units and 1 output.
We’ll need a very small dataset that contains all the possible inputs and the associated outputs. We create the dataset with ND4J:
import org.nd4j.linalg.dataset.DataSet
import org.nd4j.linalg.factory.Nd4j

val inputs = Nd4j.create(
  Array(
    0.0, 0.0,
    0.0, 1.0,
    1.0, 0.0,
    1.0, 1.0
  ),
  Array(4, 2) // 4 x 2 matrix
)

val outputs = Nd4j.create(
  Array(
    0.0,
    1.0,
    1.0,
    0.0
  ),
  Array(4, 1) // 4 x 1
)

// the train dataset contains the inputs
// and associated outputs
val trainData = new DataSet(inputs, outputs)
Now that the data is ready we need to create our neural network. Nothing too difficult here; as you will see, DL4J relies heavily on the builder pattern to provide a DSL-like syntax.
import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.weights.WeightInit
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

val nnConf = new NeuralNetConfiguration.Builder()
  .optimizationAlgo(OptimizationAlgorithm.LINE_GRADIENT_DESCENT)
  .seed(1234567)
  .iterations(1)
  .learningRate(0.01)
  .useDropConnect(false)
  .miniBatch(false)
  .biasInit(0)
  .weightInit(WeightInit.ZERO)
  .list // required in order to declare the layers
  .layer(0, new DenseLayer.Builder()
    .nIn(2)
    .nOut(2)
    .activation("relu")
    .build()
  )
  .layer(1, new OutputLayer.Builder(LossFunction.MSE)
    .nIn(2)
    .nOut(1)
    .activation("identity")
    .build()
  )
  .pretrain(false)
  .backprop(true)
  .build()
A couple of things to note here:
- we are using line gradient descent (LINE_GRADIENT_DESCENT)
- our network is too small for drop connect and mini-batch so we disable them
- all the weights and biases are initialised to 0
- there is no input layer: the input size (nIn) of the first hidden layer indicates the number of inputs to the network
- The hidden layer uses the ReLU activation function whereas the output layer has no activation function
- Finally we indicate that our model is not pretrained and requires back propagation
It took me some time to figure out how nIn and nOut work. A picture is probably worth a thousand words, so here is what we’ve built.
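Roughly, the dimensions chain together like this (just a sketch of the configuration above):

// inputs              layer 0 (hidden, ReLU)       layer 1 (output, identity)
// 2 values     --->   nIn = 2, nOut = 2     --->   nIn = 2, nOut = 1
//
// layer 0 takes the 2 raw inputs and produces 2 hidden activations;
// layer 1 takes those 2 activations and produces the single output.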
So far we have created a configuration (i.e. a description) of the neural network we’re going to use. So let’s create the actual network and train it over our dataset a number of times.
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork

val nn = new MultiLayerNetwork(nnConf)
nn.init

for (i <- 1 to 1000) {
  nn.fit(trainData)
}
Good, we now have a trained network that we can use to model the XOR function. So let’s check how well we did. DL4J provides us with an Evaluation object that computes the performance of the model for us:
import org.deeplearning4j.eval.Evaluation

val evaluation = new Evaluation(1) // 1 output value

for (i <- 0 to 3) {
  val output = outputs.getScalar(i)
  val input = inputs.getRow(i)
  evaluation.eval(output, nn.output(input))
}

println(evaluation.stats)
If everything went fine you should get a perfect score here:
==========================Scores========================================
 Accuracy:  1
 Precision: 1
 Recall:    1
 F1 Score:  1
========================================================================
Finally, let’s use our model to make some predictions:
for (i <- 0 to 3) {
  val data = inputs.getRow(i)
  val prediction = nn.output(data)
  println(s"${data.getInt(0)} xor ${data.getInt(1)} -> $prediction")
}
You should see an output similar to this:
0 xor 0 -> 0.11
0 xor 1 -> 0.98
1 xor 0 -> 0.93
1 xor 1 -> 0.03
If you round the outputs you’ll see that it works perfectly.
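For example, here is a quick sketch of how the rounding could look, assuming (as above) that nn.output returns a 1x1 matrix for a single input row:

for (i <- 0 to 3) {
  val data = inputs.getRow(i)
  val raw = nn.output(data).getDouble(0) // raw prediction, e.g. 0.98
  println(s"${data.getInt(0)} xor ${data.getInt(1)} -> ${Math.round(raw)}")
}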
As with TensorFlow, the tricky part was choosing proper initialisation parameter values. The API is quite high-level and much closer to Keras than to TensorFlow.
Regarding performance, I haven’t paid much attention to it and ran only on the CPU backend. However, I found the training time (a couple of minutes) quite long for a network this small.
Anyway, DL4J is a really nice addition to the JVM ecosystem and should hopefully make machine learning more accessible to millions of Java/JVM developers.