Keras is a Python library that provides a clean and convenient way to create a range of deep learning models on top of powerful libraries such as TensorFlow, Theano or CNTK (note that Theano's development has since been discontinued). Keras was developed and is maintained by François Chollet, a Google engineer, and it is released under the permissive MIT license.
Basic features of Keras (*)
I value its austerity and simplicity: a no-frills approach that maximizes readability. It makes it possible to express neural networks in a very modular way, treating a model as a sequence or a graph of layers. This is a good fit for beginners, because the components of a Keras model are discrete elements that can be combined in arbitrary ways. New components are intentionally easy to add and modify, since the framework is intended for engineers who want to trial and explore new ideas quickly. Last but not least, I think it's great that everything is written in Python.
Define the model
The core data structure of Keras is a model, a way to organize layers. The keras.models.Sequential class is a wrapper for the neural network model:
from keras.models import Sequential

model = Sequential()
Models in Keras are defined as a sequence of layers: fully connected layers, max pooling layers, activation layers, and so on. You can add a layer to the model using the model's add() function. For example, a simple model would look like this:
from keras.models import Sequential
from keras.layers.core import Dense, Activation

# Create the Sequential model
model = Sequential()

# 1st Layer - Add a fully connected layer of 128 nodes
#             that expects 32-dimensional input
model.add(Dense(128, input_dim=32))

# 2nd Layer - Add a softmax activation layer
model.add(Activation('softmax'))

# 3rd Layer - Add a fully connected layer of 10 nodes
model.add(Dense(10))

# 4th Layer - Add a sigmoid activation layer
model.add(Activation('sigmoid'))
Keras will automatically infer the shape of all layers after the first layer. This means you only have to set the input dimensions for the first layer.
The first layer from above sets the input dimension to 32 (meaning that the data coming in is 32-dimensional) and produces 128 outputs, so it has 128 nodes. This chain of passing one layer's output to the next continues until the last layer, which is the output of the model. We can see that the output has dimension 10.
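To see what shape inference implies for the number of trainable weights, here is a quick back-of-the-envelope check in plain Python (the totals below are just for this toy chain; Activation layers add no parameters):

```python
# Hand-computed parameter counts for the Dense chain above, illustrating
# how Keras infers each layer's input shape from the previous layer's
# output. A Dense layer has one weight per input/output pair, plus one
# bias per output node.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# 32-dimensional input -> Dense(128) -> Dense(10)
layer_dims = [(32, 128), (128, 10)]
total = sum(dense_params(i, o) for i, o in layer_dims)
print(total)  # (32*128 + 128) + (128*10 + 10) = 4224 + 1290 = 5514
```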
Once we have our model built and it looks good, we configure its learning process with .compile(). Compiling the model uses the efficient numerical libraries of the backend, which automatically chooses the best way to represent the network for training and making predictions on your hardware.
When compiling, we must specify some additional properties required to train the network, such as the loss function used to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics we would like to collect during training. For example:
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
In this example we specify the loss function to be categorical_crossentropy. We can also specify the optimizer, in this case stochastic gradient descent (sgd). And finally, we can specify which metrics we want to evaluate the model with. Here we will use accuracy as a metric.
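As an illustration of what categorical_crossentropy computes (this is the textbook formula, not Keras's internal implementation):

```python
import numpy as np

# Categorical cross-entropy averages -log(probability assigned to the
# true class) over the batch, given one-hot targets y_true and predicted
# class probabilities y_pred.
def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[0, 1, 0], [1, 0, 0]])   # one-hot labels
y_pred = np.array([[0.1, 0.8, 0.1],         # confident and correct
                   [0.3, 0.4, 0.3]])        # correct but less confident
print(round(categorical_crossentropy(y_true, y_pred), 4))  # 0.7136
```

The more confident the model is in the correct class, the smaller its contribution to the loss.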
We can check our model architecture with the following command:

model.summary()
We have defined our model and compiled it, making it ready for efficient computation. Now it is time to execute the model on some data. We can train (or fit) our model on our loaded data by calling the fit() function on the model.
The training process will run for a fixed number of iterations through the dataset, called epochs. With the batch_size argument we set the number of instances that are evaluated before a weight update is performed in the network. Finally, we can use the optional verbose parameter to control how much information is displayed in the standard output.
model.fit(x_train, y_train, epochs=1000, batch_size=32)
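To get a feel for how these two arguments interact, the number of weight updates the call performs can be computed directly (assuming, for illustration, a training set of 60,000 samples):

```python
import math

# One weight update happens per batch, so a full run performs
# epochs * ceil(n_samples / batch_size) updates in total.
n_samples, batch_size, epochs = 60000, 32, 1000
updates_per_epoch = math.ceil(n_samples / batch_size)
total_updates = epochs * updates_per_epoch
print(updates_per_epoch, total_updates)  # 1875 1875000
```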
Evaluate the model
We have trained our neural network and now we can evaluate its performance with the model.evaluate() function using the appropriate dataset (it returns the loss followed by the metrics specified at compile time):
loss_and_metrics = model.evaluate(x_test, y_test)
In order to generate predictions on new data you can use the model.predict() function:
classes = model.predict(x_test, batch_size=128)
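With a softmax output, predict() returns one row of class probabilities per sample, so recovering the predicted labels is an argmax over each row. A small sketch, where probs stands in for the array a real model would return:

```python
import numpy as np

# Each row holds the per-class probabilities for one sample;
# the predicted label is the index of the highest probability.
probs = np.array([[0.05, 0.90, 0.05],
                  [0.70, 0.20, 0.10]])
labels = probs.argmax(axis=1)
print(labels)  # [1 0]
```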
We have just seen how to create your first neural network model in Keras. Now we are ready to get our hands dirty installing Keras and using it. Are you up to it?
Installation of Keras
Keras is a lightweight API: rather than providing an implementation of the mathematical operations needed for deep learning, it provides a consistent interface to efficient numerical libraries such as TensorFlow, Theano or CNTK (called backends). Keras is relatively straightforward to install if you already have a working Python and SciPy environment. You must also have an installation of TensorFlow (or Theano or CNTK) on your system.
You can follow these steps to install Keras on your laptop using TensorFlow as a backend:
1- In order to install TensorFlow on your laptop, follow the installation instructions from www.tensorflow.org/install/. I recommend the virtualenv installation.
2- Validate your TensorFlow installation (assuming that your virtualenv directory is ~/tensorflow ):
$ source ~/tensorflow/bin/activate
$ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
If the system outputs the Hello, TensorFlow! greeting, you are ready to install Keras.
3- Keras can be installed easily using PyPI, as follows:
$ source ~/tensorflow/bin/activate
$ pip install keras
4- You can validate your installation by checking your version of Keras on the command line with the following script:
$ python -c "import keras; print(keras.__version__)"
Running the above script you will see:
Using TensorFlow backend.
2.0.5
(At the time of writing, the most recent version of Keras is 2.0.5.)
For more installation options you can see the official Keras installation page at https://keras.io.
If you use TensorFlow 1.1 or above as a backend, Keras is also included in TensorFlow itself, in the contrib package. In this case you can access Keras with the following import:
import tensorflow.contrib.keras as keras
Getting started with a Multi-Layer Perceptron
This example trains a simple deep neural network on the MNIST dataset. The MNIST dataset is composed of black and white images of hand-written digits, with more than 60,000 examples for training a model and 10,000 for testing it. The black and white (bilevel) images have been normalized into 20×20 pixel images, preserving the aspect ratio. After that, the images are centered in 28×28 (784) pixel frames by computing the center of mass and moving it to the center of the frame.
The general idea for this example is that you’ll first load the data, then define the network, and then finally train the network.
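Before the full listing, here is the core preprocessing step sketched on a small synthetic array rather than the real MNIST download: flatten each 28×28 image to a 784-vector, scale pixel values from [0, 255] to [0, 1], and one-hot encode the labels.

```python
import numpy as np

# Five fake 28x28 "images" with integer pixel values in [0, 255]
images = np.random.randint(0, 256, size=(5, 28, 28))

# Flatten to (n_samples, 784) and scale to [0, 1]
x = images.reshape(5, 784).astype('float32') / 255

# One-hot encode labels, as keras.utils.to_categorical would
labels = np.array([3, 1, 4, 1, 5])
y = np.eye(10)[labels]

print(x.shape, y.shape)  # (5, 784) (5, 10)
```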
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout

batch_size = 128
num_classes = 10
epochs = 5
print('epochs:', epochs)

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
# model.add(Dense(512, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=0,
                    validation_data=(x_test, y_test))

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Running the above program you will see:
$ python mnist_mlp.py
Using TensorFlow backend.
epochs: 5
60000 train samples
10000 test samples
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 512)               401920
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
Test loss: 0.317686294222
Test accuracy: 0.9121
This code is available on my GitHub.
If you want to improve the results you can, for instance, change the optimizer from gradient descent (sgd) to adam and obtain a test accuracy of 0.979. Or increase the number of epochs to 20, obtaining an accuracy of 0.9842. You can even add two layers by simply uncommenting the second ReLU and Dropout layers, obtaining an accuracy of 0.9849. However, in this case the gains in accuracy are scarce while the number of model parameters increases greatly (from 407,050 to 669,706), which also increases the running time of the learning process.
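The two parameter counts quoted above can be verified by hand, since a Dense layer has inputs × outputs weights plus one bias per output (Dropout layers add no parameters):

```python
# Parameter count of a fully connected layer: weights plus biases
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# 784 -> 512 -> 10 (the model as written)
one_hidden = dense_params(784, 512) + dense_params(512, 10)

# 784 -> 512 -> 512 -> 10 (with the commented layers enabled)
two_hidden = (dense_params(784, 512) + dense_params(512, 512)
              + dense_params(512, 10))

print(one_hidden, two_hidden)  # 407050 669706
```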
But the most important thing is to see how easily an engineer can test their ideas in order to find the best model. Assuming, of course, that the engineer has sufficient computing resources to train their models!
In Keras you can use a set of functions called callbacks to get a view on the internal states and statistics of the model during training. You can pass a list of callbacks (as the keyword argument callbacks) to the .fit() method of the model. The relevant methods of the callbacks will then be called at each stage of the training.
We will use callbacks in order to use TensorBoard. TensorBoard is a suite of visualization tools that allows you to visualize your TensorFlow/Keras graph, plot quantitative metrics about the execution of your graph, and show additional data like images that pass through it. In our Keras lab we will use it to visualize information about our Keras network with the following code:
callbacks = []
if tensorboard_active:
    callbacks.append(keras.callbacks.TensorBoard(
        log_dir=tensorboard_dir,
        histogram_freq=1,
        write_graph=True,
        write_images=True))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(lr=learning_rate),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=callbacks)
This code from our Keras lab contains the variable tensorboard_active, which enables TensorBoard through the Keras callbacks. If you set tensorboard_active to True, Keras will save TensorBoard data to tensorboard_dir every epoch. This will allow you to visualize dynamic graphs of your training and test metrics, as well as activation histograms for the different layers in your model. You can use the following parameters in the callback function:
- log_dir: the path of the directory where to save the log files to be parsed by TensorBoard.
- histogram_freq: frequency (in epochs) at which to compute activation and weight histograms for the layers of the model. If set to 0, histograms won’t be computed. Validation data (or split) must be specified for histogram visualizations.
- write_graph: whether to visualize the graph in TensorBoard. The log file can become quite large when write_graph is set to True.
- write_grads: whether to visualize gradient histograms in TensorBoard. histogram_freq must be greater than 0.
- batch_size: size of batch of inputs to feed to the network for histograms computation.
- write_images: whether to write model weights to visualize as image in TensorBoard.
- embeddings_freq: frequency (in epochs) at which selected embedding layers will be saved.
- embeddings_layer_names: a list of names of layers to keep an eye on. If None or an empty list, all the embedding layers will be watched.
- embeddings_metadata: a dictionary which maps layer names to the file name in which metadata for that embedding layer is saved. See the details about the metadata file format. In case the same metadata file is used for all embedding layers, a single string can be passed.
If you have installed TensorFlow, you should be able to launch TensorBoard from the command line:

$ tensorboard --logdir=/full_path_to_your_logs
After that you can go to http://localhost:6006 in your browser and TensorBoard will start. We recommend Google Chrome in order to avoid compatibility and lag problems.
Pretty simple, right?
For a more in-depth tutorial about Keras, you can check out:
In the examples folder of the Keras GitHub repository, you will find a zoo of advanced models: question-answering with memory networks, text generation with stacked LSTMs, etc.
And there you go: you've trained your first neural network using Keras to analyze the MNIST dataset. However, you still need to learn many techniques to improve the training process, and I will cover them in the future through this blog.
(*) In this post I'm assuming that the reader has some background in neural network concepts such as activation functions or gradient descent. Otherwise I recommend starting with the book "First Contact with TensorFlow".