
Deep Learning Frameworks

According to Wikipedia, “Deep learning (also known as deep structured learning, hierarchical learning or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data”. Deep learning is impacting everything from health care to transportation to manufacturing, and more. Companies are turning to deep learning solutions to solve hard problems like speech recognition, object recognition, medical imaging and machine translation.

Deep Learning is made accessible through open-source libraries, and there is a plethora of frameworks to choose from. Let’s compare the most popular Python-based deep learning frameworks out there right now to see what works best. The deep learning space is exploding with frameworks: it seems like every single week some major tech company decides to open source its own deep learning library, and that’s not counting the dozens of smaller frameworks released on GitHub by individual developers.

Let’s start off with Scikit-learn. It was made to provide an easy-to-use interface for developers to apply off-the-shelf, general-purpose machine learning algorithms for both supervised and unsupervised learning. It provides functions that let you apply classic machine learning algorithms like support vector machines (SVM), logistic regression and K-nearest neighbours very easily, but the one type of model it doesn’t let you implement is a neural network. It doesn’t provide GPU support either, which is what helps neural networks scale.
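To see what that off-the-shelf style feels like, here is a minimal sketch (the toy two-feature dataset is made up for illustration, not from this article) of fitting a few classic Scikit-learn classifiers:

import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy data: 100 samples with 2 features each, and a binary label.
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Each classic algorithm is just an object with fit() and predict().
for model in (SVC(), LogisticRegression(), KNeighborsClassifier()):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))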

Every single general-purpose algorithm that Scikit-learn has implemented has since been implemented in Tensorflow. There’s also Caffe, which was basically the first mainstream production-grade deep learning library, started in 2013. Caffe isn’t very flexible. Think of a neural network as a computational graph: in Caffe, each node is considered a layer, so if you want a new layer type you have to define the full forward, backward and gradient update passes. These layers are building blocks that are unnecessarily big, and there’s an endless list of them to pick from.

In Tensorflow, each node is considered a tensor operation, like matrix add, matrix multiply or convolution, and a layer can be defined as a composition of those operations. Tensorflow’s building blocks are therefore smaller, which allows for more modularity. Caffe also requires a lot of unnecessary verbosity: if you want to support both the CPU and the GPU you need to implement extra functions for each, and you have to define your model in a plain-text configuration file, which is clunky. The model should be defined programmatically, because that’s better for modularity between different components. Also, Caffe’s main architect now works on the Tensorflow team, so we’re all out of Caffe.
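To make the “layer as a composition of operations” idea concrete, here is a tiny sketch of a fully connected layer built from individual ops, written against the old TF 1.x graph-style API (the shapes are made up for illustration):

import tensorflow as tf

# A fully connected layer expressed as a composition of tensor ops
# rather than a single monolithic "layer" building block (TF 1.x graph API).
x = tf.placeholder(tf.float32, shape=[None, 2])        # batch of 2-feature inputs
W = tf.Variable(tf.random_uniform([2, 4], -1.0, 1.0))  # weights
b = tf.Variable(tf.zeros([4]))                          # biases

hidden = tf.nn.relu(tf.matmul(x, W) + b)  # matmul + add + relu = one "layer"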

Speaking of modularity, let’s talk about Keras. Keras has been the go-to way to get started with deep learning for a while, because it provides a very high-level API (Application Programming Interface) for building deep learning models. Keras sits on top of other deep learning libraries like Theano and Tensorflow. It uses an object-oriented design, so everything is considered an object, be that layers, models or optimizers, and all the parameters of a model can be accessed as object properties, which will give you the output tensor for each layer in the model and a list of symbolic weight tensors. This is a cleaner interface than the functional approach of making each layer a function that creates weights when it is called. It has great documentation too. But because it’s so general-purpose, it sacrifices some performance. Keras has been known to have performance issues when used with a Tensorflow back-end, since it’s not really optimized for it, but it does work pretty well with the Theano back-end.
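As a rough illustration of that high-level, object-oriented style, here is a minimal Keras 2-style sketch of the same kind of linear fit used in the scripts further down (the hyper-parameters are illustrative, not from the article):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Toy setup similar to the scripts below: 100 points with 2 features.
x_data = np.random.rand(100, 2).astype(np.float32)
y_data = np.dot(x_data, [0.100, 0.200]) + 0.300

# The model, its layers and its optimizer are all objects.
model = Sequential()
model.add(Dense(1, input_dim=2))            # a single linear layer: y = Wx + b
model.compile(optimizer='sgd', loss='mse')
model.fit(x_data, y_data, epochs=200, verbose=0)

print(model.layers[0].get_weights())        # learned W and b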

The frameworks that are neck-and-neck right now in the race to be the best library for both research and industry are Tensorflow and Theano. Theano currently outperforms Tensorflow on a single GPU, while Tensorflow outperforms Theano for parallel execution across multiple GPUs. Theano has more documentation because it’s been around for longer, and it has native Windows support, which Tensorflow doesn’t yet. Tensorflow gives you access to a bunch of optimizers right out of the box, things like gradient descent or Adam; Theano makes you implement this from scratch, and its training function is again more verbose. Theano seems to give more fine-grained control, but at the cost of readability. Finally, when we get to the actual training part itself, the two look pretty identical, but Tensorflow’s methodology of encapsulating the computational graph feels conceptually cleaner than Theano’s.
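For instance, swapping optimizers in Tensorflow is a one-line change. Here is a small sketch (TF 1.x API, with made-up values) of minimising a toy loss with Adam instead of plain gradient descent:

import tensorflow as tf

# The update rule is handled for you: pick an optimizer and call minimize().
w = tf.Variable(5.0)
loss = tf.square(w - 2.0)

# Gradient descent and Adam are both available out of the box;
# switching between them is a one-line change.
train = tf.train.AdamOptimizer(0.1).minimize(loss)
# train = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train)
    print(sess.run(w))  # converges towards 2.0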

Comparison of Tensorflow and Theano 

Tensorflow_ex.py

import tensorflow as tf
import numpy as np

# Make 100 phony data points in NumPy.
x_data = np.float32(np.random.rand(2, 100)) # Random input
y_data = np.dot([0.100, 0.200], x_data) + 0.300

# Construct a linear model.
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b

# Minimize the squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# For initializing the variables.
init = tf.initialize_all_variables()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the plane.
for step in range(0, 201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))

# Learns best fit is W: [[0.100  0.200]], b: [0.300]

Theano_ex.py

import theano
import theano.tensor as T
import numpy

# Again, make 100 points in numpy
x_data = numpy.float32(numpy.random.rand(2, 100))
y_data = numpy.dot([0.100, 0.200], x_data) + 0.3

# Initialise the Theano model
X = T.matrix()
Y = T.vector()
b = theano.shared(numpy.random.uniform(-1, 1), name="b")
W = theano.shared(numpy.random.uniform(-1.0, 1.0, (1, 2)), name="W")
y = W.dot(X) + b 

# Compute the gradients WRT the mean-squared-error for each parameter
cost = T.mean(T.sqr(y - Y))
gradientW = T.grad(cost=cost, wrt=W)
gradientB = T.grad(cost=cost, wrt=b)
updates = [[W, W - gradientW * 0.5], [b, b - gradientB * 0.5]] 

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True) 

for i in range(0, 201):
    train(x_data, y_data)
    print(W.get_value(), b.get_value())

Just take a look at the code to see the differences. We’re comparing two scripts, one in Tensorflow and one in Theano, that both do the same thing: initialise some data and then learn the line of best fit for that data, so the model can predict future data points.

The first step in both Tensorflow and Theano is generating the data, and it’s done pretty much the same way using NumPy arrays, so there’s no real difference there. Next, look at the model initialisation: this is the basic “y = mx + b” formula. Tensorflow doesn’t require any special treatment of the x and y variables, they’re just used natively, but in Theano we have to specifically declare them as symbolic inputs to the function. The Tensorflow syntax for defining the b and W variables is also cleaner.

Then we implement our gradient descent step, which is what lets the model learn: we’re trying to minimise the mean squared error over time, which is what makes the model more accurate as we train. The syntax for defining what we’re minimising is pretty similar, but when we look at the optimizer that actually does the minimising, we notice a difference in syntax again. Tensorflow gives you access to a bunch of optimizers right out of the box, things like gradient descent or Adam; Theano makes you write the update rule from scratch, and then the training function is again more verbose. See the trend here? Theano so far makes us write more code than Tensorflow, so it seems to give us more fine-grained control, but at the cost of readability. Finally, the actual training loops look pretty identical, but Tensorflow’s methodology of encapsulating the computational graph feels conceptually cleaner than Theano’s.
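That “encapsulating the computational graph” point is easiest to see in isolation. Here is a tiny sketch of the build-then-run pattern in the same TF 1.x style as the script above (the constants are arbitrary):

import tensorflow as tf

# Building the graph only records operations; nothing runs yet.
graph = tf.Graph()
with graph.as_default():
    a = tf.constant(2.0)
    b = tf.constant(3.0)
    c = a * b  # a symbolic node, not the number 6

# Only a Session bound to that graph actually executes it.
with tf.Session(graph=graph) as sess:
    print(sess.run(c))  # 6.0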

Tensorflow is just growing so fast that it seems inevitable that whatever cool feature it is lacking right now, because of how new it is, it will gain very rapidly. Another testament to this is the amount of activity happening in the Tensorflow repo compared with the Theano repo on GitHub right now. Keras may serve as an easy-to-use wrapper around different libraries, but it’s not optimized for Tensorflow. A better alternative if you want to learn and get started easily with deep learning is TF-learn, which is basically Keras but optimized for Tensorflow. So, to sum things up: the best library to use for research is Tensorflow, and world-class researchers at both Elon Musk’s OpenAI and Google’s DeepMind are now using it. The best library for production is still Tensorflow, as it scales better across multiple GPUs than its closest competitor Theano. Lastly, for learning, the best library to use is TF-learn, a high-level wrapper around Tensorflow that lets you get started really easily.
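To give a feel for TF-learn, here is a minimal sketch of the same linear-fit problem written with its high-level API (layer sizes and epoch count are illustrative, not from the article):

import numpy as np
import tflearn

# Same toy data as above, but shaped (samples, features) as TFLearn expects.
x_data = np.random.rand(100, 2).astype(np.float32)
y_data = (np.dot(x_data, [0.100, 0.200]) + 0.300).reshape(-1, 1)

# Define the graph with TFLearn's high-level layers.
net = tflearn.input_data(shape=[None, 2])
net = tflearn.fully_connected(net, 1)                 # a single linear layer
net = tflearn.regression(net, optimizer='sgd', loss='mean_square',
                         learning_rate=0.5)

model = tflearn.DNN(net)
model.fit(x_data, y_data, n_epoch=100, show_metric=False)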

 

Akashdeep Singh Jassal
