
Using Machine Learning Foundations Bring Your Own Model from Scratch I

This is the first part of a two-part tutorial. In this part we create a TensorFlow model and save the trained model. In the second part we will take our saved model, upload it to Machine Learning Foundation and use it. Our model will be a three-layer convolutional neural network classifying CIFAR-10 images.

Prerequisites:

You have installed TensorFlow, preferably the GPU version, at least version 1.11.0. If you have installed both the CPU and the GPU version, Python may load the CPU version, in which case the network is not executed on the GPU. If that happens, uninstall the CPU version.

This requires that you have also installed one of the following Python versions: 3.4, 3.5 or 3.6. At the time this tutorial was written, TensorFlow did not support Python 3.7.

You have a global SAP Cloud Platform (SCP) account and have created a space that contains an instance of the ml-foundation service, including a service key.

Target Reader Group

This tutorial gives developers some insight into how to use the TensorFlow Serving API and how to deploy and run their own model on SCP.

Even though a TensorFlow model is built from scratch, you won’t learn anything about neural networks or why certain hyperparameter values were chosen.

Build up a TensorFlow Model

Get the Data

We want to build an image classification model that uses the CIFAR-10 dataset. This dataset consists of 60000 color images of 32×32 pixels, which have been categorized into 10 classes. The classes are ‘Airplane’, ‘Automobile’, ‘Bird’, ‘Cat’, ‘Deer’, ‘Dog’, ‘Frog’, ‘Horse’, ‘Ship’ and ‘Truck’. You can download the images from here: cifar-10-python.tar.gz. Extract the images to a new folder, which could be called cifar10, below your home directory. We will use this folder to store all the files we create throughout the tutorial.
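The extraction step can also be scripted. The following is a minimal sketch and not part of the original tutorial code. It assumes the archive was downloaded as cifar-10-python.tar.gz and that the batch files inside it may sit in a subfolder, which the sketch flattens so that the files end up directly in the target folder. The function name extract_flat is made up for this illustration.

```python
import tarfile
from pathlib import Path


def extract_flat(archive_path, target_dir):
    """Extract all regular files of a tar archive directly into target_dir,
    dropping any folder structure stored inside the archive."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(str(archive_path), "r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            name = Path(member.name).name  # keep only the file name
            source = archive.extractfile(member)
            with open(str(target / name), "wb") as out:
                out.write(source.read())
            extracted.append(name)
    return sorted(extracted)
```

A call could look like extract_flat(Path.home() / "Downloads" / "cifar-10-python.tar.gz", Path.home() / "cifar10"), where the download location is an assumption. The function returns the names of the extracted files.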

Create the Python Programs

In this tutorial we will create four files separating the different tasks that need to be fulfilled to build, run and save a TensorFlow model:

  1. Read and manage the images.
  2. Provide some helper functions to define the model.
  3. Provide some helper functions to save the model.
  4. Describe, train and save the model.

As mentioned, we will store all our Python files in the same folder in which we have stored the images.

Read Images

The first file, which we call data_manager.py and store in the same folder as the CIFAR-10 images, contains two classes that help us read and convert the image data and provide it to the model.

The first class, and the one of special interest, is CifarImageProvider. It loads the images and labels from the file system and converts them:

import numpy as np
import time
import pickle

class CifarImageProvider(object):

    def __init__(self, source_files, batch_size ):
        self._source = source_files
        self._i = 0
        self._batch_size = batch_size
        self.images = None
        self.labels = None
        self._start_time = 0.0
    
    def _unpickle(self, file):
        with open(file, 'rb') as fo:
            directory = pickle.load(fo, encoding="bytes")
        return directory
    
    def _one_hot(self, vec, vals=10):
        n = len(vec)
        out = np.zeros((n, vals))
        out[range(n), vec] = 1
        return out
    
    def load(self):
        data = [self._unpickle(f) for f in self._source]
        images = np.vstack([d[b"data"] for d in data])
        n = len(images)
        self.images = images.reshape(n,3,32,32).transpose(0,2,3,1).astype(float)/255 
        self.labels = self._one_hot(np.hstack([d[b"labels"] for d in data]), 10)
        return self
        
    def next_batch(self):
        x,y = self.images[self._i:self._i + self._batch_size], self.labels[self._i:self._i + self._batch_size] 
        self._i = (self._i + self._batch_size) % len(self.images)   
        return x,y

    def new_epoch(self):
        self._start_time = time.time()
        self._i = 0
        
    def end_epoch(self):
        return time.time() - self._start_time

This is already a lot of code, and most of it is just utility functionality. But we should have a look at the load function. As the name implies, it reads the images from the file system, but it also does something important: it converts the images. In the original format (NCHW) the three colors are separated, so we have three layers of 32×32 each. This is transposed so that we have a 32×32×3 tensor (NHWC), in which each pixel carries all of its color information. Finally, the integer values from 0 to 255 are converted into floats between 0 and 1, as deep networks tend to work better with them. This is important to remember, as this is the format in which we must provide the image data to our model, and it is the format an image needs to have for inference.

This is done by the following line:

self.images = images.reshape(n,3,32,32).transpose(0,2,3,1).astype(float)/255 
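To see what this line does, the following small sketch applies the same reshape, transpose and scaling to a single synthetic row instead of the real data. It relies on the CIFAR-10 row layout of 3072 values per image, with all red values first, then all green, then all blue. The pixel values are made up for the illustration.

```python
import numpy as np

# One fake image row in the CIFAR-10 layout: 1024 red values,
# then 1024 green values, then 1024 blue values.
red = np.full(32 * 32, 255, dtype=np.uint8)
green = np.full(32 * 32, 128, dtype=np.uint8)
blue = np.zeros(32 * 32, dtype=np.uint8)
rows = np.concatenate([red, green, blue])[np.newaxis, :]  # shape (1, 3072)

n = len(rows)
nchw = rows.reshape(n, 3, 32, 32)   # channels first: one 32x32 plane per color
nhwc = nchw.transpose(0, 2, 3, 1)   # channels last: three values per pixel
scaled = nhwc.astype(float) / 255   # floats between 0 and 1

print(nchw.shape)        # (1, 3, 32, 32)
print(scaled.shape)      # (1, 32, 32, 3)
print(scaled[0, 0, 0])   # red, green and blue of the top-left pixel
```

After the transpose, indexing a single pixel returns its three color values together, which is the layout the placeholder of the model expects.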

The second class we create in data_manager.py is just a small façade that makes it easier to deal with training and test data:

class CifarImageManager(object):
    BATCH_SIZE = 100
    
    def __init__(self, batch_size=BATCH_SIZE ):
        self.train = CifarImageProvider(["data_batch_{}".format(i) for i in range(1,6)], batch_size=batch_size).load()
        self.test = CifarImageProvider(["test_batch"], batch_size=batch_size).load()
        
        
    def print_statistics(self):
        print("Number of images train: {}".format(len(self.train.images)))
        print("Number of labels train: {}".format(len(self.train.labels)))
        print("Number of images test: {}".format(len(self.test.images)))
        print("Number of labels test: {}".format(len(self.test.labels)))
        
        
    def train_len(self):
        return len(self.train.images)
    
    
    def test_len(self):
        return len(self.test.images)

Having done that, it is time for a small test. We want to load the images and have a look at the statistics:

imageManager = CifarImageManager()
imageManager.print_statistics()

Open a console, navigate to your cifar10 folder and execute the python cifar10.py command. After the program has finished you should see:

Number of images train: 50000

Number of labels train: 50000

Number of images test: 10000

Number of labels test: 10000

Creating Model Elements

The next file, model_elements.py, contains some helper functions that let us build up our model. This is quite useful, as we want to build a CNN with three identical layers, using batch normalization, average pooling and a parameterized rectified linear unit as the activation function. As the latter, at least up to TensorFlow 1.11, is not part of TensorFlow, we must create it on our own.

import tensorflow as tf


def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev = 0.1))


def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape = shape))


def pool_2x2(x):
    return tf.nn.avg_pool(x, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = 'SAME')


def conv_layer_bn(input_layer, shape, training, activation = tf.nn.relu6):
    '''
    Convolutional layer with batch normalization and average pooling
    Args:
        input_layer: input tensor
        shape: target shape of the filter. It has four dimensions: 
            filter_height, filter_width, in_channels, out_channels
        training: Indicator if the model is used in training or 
            inference, needed for batch normalization
        activation: Activation function to be used
    '''
    with tf.name_scope("CNN-layer"):
        W = weight_variable(shape)
        b = bias_variable([shape[3]])
        cnn = tf.nn.conv2d(input_layer, W, strides = [1, 1, 1, 1], padding = 'SAME') + b
        bn = tf.layers.batch_normalization(cnn, training = training, momentum = 0.9)
        return pool_2x2(activation(bn))


def full_layer(input_layer, size):
    '''
    Fully connected layer. Main purpose here is to convert the output of the CNN layer,
        the so-called feature vector, into the classes
    Args:
        input_layer: input tensor
        size: number of output neurons    
    '''
    in_size = int(input_layer.get_shape()[1])
    W = weight_variable([in_size, size])
    b = bias_variable([size])
    return tf.matmul(input_layer, W) + b


def lrelu6(x, alpha = 0.2, name = "LeakyReLU"):
    '''
    Leaky Rectified Linear Unit with fixed saturation of 6. This activation 
        function is not part of tensorflow as of now.
    Args:
        x: tensor
        alpha: slope of the negative part
    '''
    return tf.maximum(alpha * x, tf.minimum(6.0, x), name = name)


def prelu6(x, name = "ParameterizedReLU"):
    '''
    Parametric Rectified Linear Unit with fixed saturation of 6. 
        This activation function is not part of tensorflow as of now.
        prelu learns the slope of the negative part
    Args: 
        x: tensor
    '''
    with tf.variable_scope(name_or_scope = None, default_name = "prelu"):
        alpha = tf.get_variable("prelu", shape = x.get_shape()[-1],
                                dtype = x.dtype, initializer = tf.constant_initializer(0.1))
        return lrelu6(x, alpha, name = name)
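To get a feeling for the shape of this activation, here is a scalar sketch in plain Python of the expression used in lrelu6, max(alpha * x, min(6, x)). It is only an illustration and not a replacement for the TensorFlow version. In prelu6 the slope alpha is a trainable variable with one value per channel, initialized to 0.1, whereas the sketch uses the fixed default of 0.2.

```python
def lrelu6(x, alpha=0.2):
    """Scalar version of max(alpha * x, min(6, x))."""
    return max(alpha * x, min(6.0, x))


# Negative inputs are scaled by alpha, inputs between 0 and 6 pass through,
# and inputs above 6 are clipped to 6.
for value in (-5.0, -1.0, 0.0, 3.0, 6.0, 10.0):
    print(value, "->", lrelu6(value))

# For very large inputs, alpha * x exceeds 6 again (here for x > 30).
print(40.0, "->", lrelu6(40.0))
```

The last line shows a property of the formula as written: the cap at 6 only holds while alpha * x stays below 6. With inputs that have passed through batch normalization this range is presumably rarely left, but it is worth knowing.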

Preparing Model Saving

BYOM uses TensorFlow Serving, which expects a model in a special format. This format can be created using a Saved Model Builder. TensorFlow Saved Model provides three types of APIs called Classify, Regress and Predict. The request implementations for the different APIs, which we will use in the second part, differ. PredictRequest expects values encoded via TensorProto, whereas ClassificationRequest and RegressionRequest expect the data as Example. We look only at Predict, as it does not require preprocessing data during training and we get back all the information we need.

We start with creating tf_serving.py, which contains two helper functions supporting us in saving our trained model.

The first step is to determine the folder our model will be saved in:

#---------------------------------------------------------------------
# Prepare Saving
#---------------------------------------------------------------------
from pathlib import Path
from tensorflow.saved_model import utils
from tensorflow.saved_model import signature_def_utils
from tensorflow.saved_model import signature_constants
import datetime


def build_path(prefix):
    path =  Path.joinpath(Path.home(), prefix, datetime.datetime.utcnow().strftime('%Y-%m-%dT%H%M%S') )
    print("Model saved to {}".format(path))
    return str(path)

The Saved Model Builder insists on creating this folder itself, so you can’t overwrite an existing model. To ensure we always get a new folder, we add a timestamp as the last path element.

Next, we define the PREDICT API that we will use for inference.

def predict_signature(input_layer, prediction):
    tensor_info_x = utils.build_tensor_info(input_layer)
    tensor_info_y = utils.build_tensor_info(prediction)
    return signature_def_utils.build_signature_def(
        inputs = {'images' : tensor_info_x},
        outputs = {'scores' : tensor_info_y},
        method_name = signature_constants.PREDICT_METHOD_NAME)

The helper function has two parameters, input_layer and prediction. The first one is a tensor that is the entry point into our TensorFlow graph. It describes how the model expects the input and where to inject the input into the graph. The second one describes the output of the model: on the one hand what the output looks like, and on the other hand the node in our graph from which the output is taken. We will see that when we build our model.

The API definition itself is called a signature. As we build a prediction API, our signature has one input and one output parameter. The names, ‘images’ and ‘scores’, are given by us. We could have chosen whatever we want.
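In the model built later in this tutorial, the tensor passed as prediction is the raw output of the last fully connected layer, so the ‘scores’ a client receives are unnormalized values, one per class. A client would typically turn them into a class name itself. The following plain Python sketch shows one way to do that. It assumes that the label indices follow the class order listed at the beginning of this tutorial, and the example scores are invented.

```python
import math

CLASSES = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer',
           'Dog', 'Frog', 'Horse', 'Ship', 'Truck']


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def top_class(scores):
    """Return the most likely class name and its probability."""
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASSES[best], probabilities[best]


# Invented scores for one image; index 3 ('Cat') has the highest value.
example_scores = [0.1, -1.2, 0.3, 4.0, 0.0, 2.5, -0.7, 0.2, -2.0, 0.4]
print(top_class(example_scores))
```

If only the class is needed, the softmax can be skipped, since the index of the highest score is the same before and after normalization.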

Building Up the Model

Now we have all the parts we need to build our model. For this we create cifar10.py. The script is described in four blocks, giving us the chance to explain each.

As mentioned, we build a CNN for image classification. This model has three convolutional layers, followed by a fully connected hidden layer and a fully connected output layer. In the first block we start by declaring some hyperparameters:

Of the hyperparameters, the following are explained below the listing:

#---------------------------------------------------------------------
# Build TensorFlow Model 
#---------------------------------------------------------------------
import numpy as np
import tensorflow as tf
import tf_serving as tfs
import model_elements as cifar
from tensorflow.saved_model import builder as saved_builder
from tensorflow.saved_model import signature_constants
from tensorflow.saved_model import tag_constants
from data_manager import CifarImageManager
from pprint import pprint

HYPER_PARAMETER= {
    'TF_VERSION' : tf.__version__,
    'BATCH_SIZE' : 50,
    'NO_EPOCHS' : 50,
    'LEARNING_RATE' : 0.00075,
    'KEEP_PROB' : 0.5,
    'C1' : 64,
    'C2' : 128,
    'C3' : 256
}
EXPORT_DIR = 'tutorial'
pprint(HYPER_PARAMETER)

  • NO_EPOCHS gives the number of epochs used for training.
  • C1, C2 and C3 give the number of filters per convolutional layer.
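To get a rough idea of how C1, C2 and C3 influence the size of the model, the following sketch counts the weights and biases of the convolutional and fully connected layers. The layer layout (3×3 filters, a flattened 4×4×C3 feature vector, 512 hidden units and 10 classes) is taken from the model code later in this tutorial. The parameters of batch normalization and of the PReLU slopes are left out, so the numbers are an approximation.

```python
def count_parameters(c1, c2, c3, fc_units=512, classes=10):
    """Count weights and biases of the conv and fully connected layers."""
    conv = 0
    in_channels = 3  # RGB input
    for out_channels in (c1, c2, c3):
        # 3x3 filter per input/output channel pair, plus one bias per filter
        conv += 3 * 3 * in_channels * out_channels + out_channels
        in_channels = out_channels
    fc1 = 4 * 4 * c3 * fc_units + fc_units  # flattened features -> hidden layer
    fc2 = fc_units * classes + classes      # hidden layer -> class scores
    return conv + fc1 + fc2


print(count_parameters(64, 128, 256))  # 2473610
print(count_parameters(32, 64, 128))   # 1147466
```

Halving the three filter counts roughly halves the number of these parameters, and most of them sit in the first fully connected layer.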

In addition, we define in EXPORT_DIR the path, below the home directory, of the folder the model is saved in.

In the next block, we create two functions that inform us about the progress during training: the first determines the test accuracy, the second is a progress indicator.

def test(sess, epoche=0, end='\n'): 
    run_time = cifarManager.train.end_epoch()
    X = cifarManager.test.images.reshape(10,1000,32,32,3)
    Y = cifarManager.test.labels.reshape(10,1000,10)
    acc = np.mean([sess.run(accuracy, feed_dict={x: X[i], y_: Y[i], keep_prob: 1.0}) for i in range(10)])
    print("{} : {:.4}% : {}".format(epoche +1, acc*100, round(run_time, 0)), end=end)

    
def indicator(n):
    s = '[                                                                                                    ] '.replace(' ', '.', n) + str(n) + '%'
    print('\r', s, end='')
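The test function does not feed all 10000 test images at once. It splits them into 10 chunks of 1000 and averages the chunk accuracies, presumably to keep the memory needed per run bounded. Because all chunks have the same size, this average equals the accuracy over the whole test set. The following NumPy sketch illustrates that with invented labels and predictions and without TensorFlow.

```python
import numpy as np

rng = np.random.RandomState(0)
labels = rng.randint(0, 10, size=10000)          # invented true classes
predicted = labels.copy()
wrong = rng.choice(10000, size=1500, replace=False)
predicted[wrong] = (predicted[wrong] + 1) % 10   # make 15% of them wrong

# Same chunking as in test(): 10 chunks of 1000 samples each
label_chunks = labels.reshape(10, 1000)
predicted_chunks = predicted.reshape(10, 1000)
chunk_accuracy = [np.mean(predicted_chunks[i] == label_chunks[i])
                  for i in range(10)]

chunked = np.mean(chunk_accuracy)      # average over the chunk accuracies
overall = np.mean(predicted == labels) # accuracy over all samples at once
print(chunked, overall)                # the two values agree up to rounding
```

With chunks of unequal size the plain average would no longer match, and the chunk accuracies would have to be weighted by chunk size.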

Now we can build up the model:

cifarManager = CifarImageManager(HYPER_PARAMETER['BATCH_SIZE'])

x = tf.placeholder(tf.float32, shape=[None,32,32,3], name='x')
y_ = tf.placeholder(tf.float32, shape=[None, 10], name='y_')
training = tf.placeholder_with_default(False, shape=(), name='training')
keep_prob = tf.placeholder_with_default(1.0, shape=(), name='keep_prob')

'''
Three convolutional layers
'''
conv1 = cifar.conv_layer_bn(x, shape=[3,3,3,HYPER_PARAMETER['C1']], training=training, activation=cifar.prelu6)
conv2 = cifar.conv_layer_bn(conv1, shape=[3,3,HYPER_PARAMETER['C1'],HYPER_PARAMETER['C2']], training=training, activation=cifar.prelu6)
conv3 = cifar.conv_layer_bn(conv2, shape=[3,3,HYPER_PARAMETER['C2'],HYPER_PARAMETER['C3']], training=training, activation=cifar.prelu6)
'''
Convert output of cnn-layer into feature vector, plus drop-out to prevent overfitting
'''
conv3_flat = tf.reshape(conv3, [-1, 4*4*HYPER_PARAMETER['C3']])
conv3_drop = tf.nn.dropout(conv3_flat, keep_prob=keep_prob)
'''
Convert feature vector into one score per class (10 classes)
'''
full_1 = cifar.prelu6(cifar.full_layer(conv3_drop, 512))
full1_drop = tf.nn.dropout(full_1, keep_prob=keep_prob)
prediction = cifar.full_layer(full1_drop, 10)


cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=prediction, labels=y_))
train_step = tf.train.AdamOptimizer(HYPER_PARAMETER['LEARNING_RATE']).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
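The factor 4*4 in the reshape of conv3 follows from the pooling: each of the three layers keeps the spatial size in the convolution (stride 1, 'SAME' padding) and halves it in the 2×2 average pooling with stride 2. With 'SAME' padding the output size is the input size divided by the stride, rounded up. The arithmetic can be checked with a few lines of plain Python. The helper name pooled_size is made up for this sketch.

```python
import math


def pooled_size(size, layers, stride=2):
    """Spatial size after `layers` poolings with 'SAME' padding."""
    for _ in range(layers):
        size = math.ceil(size / stride)
    return size


side = pooled_size(32, layers=3)   # 32 -> 16 -> 8 -> 4
print(side)                        # 4
print(side * side * 256)           # 4096 features for C3 = 256
```

If you change the number of layers or the image size, the reshape has to be adapted accordingly.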

Nothing too exciting if you are familiar with TensorFlow. Only three things shall be mentioned here:

  1. Give your variables and placeholders proper names. Otherwise, if something goes wrong during deployment or inference, you will have trouble finding the error.
  2. The shapes of the placeholders x and y_ start with None. This is important, as it lets TensorFlow accept any batch size, including batch size one, which we need if we later want to run inference on single images in production.
  3. During inference we do not want to provide values for the placeholders training and keep_prob. So we use placeholder_with_default. As default value we choose the one we want to have during inference.
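Regarding the second point: even a single image has to be passed as a batch, that is, with a leading dimension of size one, and in the same value range as the training data. The following NumPy sketch shows such a conversion for an image that is already available as a 32×32×3 array of integers between 0 and 255. The function name to_model_input and the random test image are made up for this illustration.

```python
import numpy as np


def to_model_input(image_uint8):
    """Turn one 32x32x3 uint8 image into a batch of one with values in [0, 1]."""
    image = np.asarray(image_uint8)
    if image.shape != (32, 32, 3):
        raise ValueError("expected shape (32, 32, 3), got {}".format(image.shape))
    # Scale like the training data and add the batch dimension in front
    return (image.astype(np.float32) / 255)[np.newaxis, ...]


fake_image = np.random.RandomState(0).randint(0, 256, size=(32, 32, 3)).astype(np.uint8)
batch = to_model_input(fake_image)
print(batch.shape, batch.dtype)   # (1, 32, 32, 3) float32
```

The resulting array matches the shape [None, 32, 32, 3] of the placeholder x with a batch size of one.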

 

Finally, we have code to train and save the trained model.

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)    
builder = saved_builder.SavedModelBuilder(tfs.build_path(EXPORT_DIR))

    
with tf.Session() as sess:
    cifar.sess = sess
    sess.run(tf.global_variables_initializer())
    steps = cifarManager.train_len() // HYPER_PARAMETER['BATCH_SIZE']
    for j in range(HYPER_PARAMETER['NO_EPOCHS']):
        cifarManager.train.new_epoch()
        for i in range(steps):
            batch = cifarManager.train.next_batch()
            sess.run([train_step, extra_update_ops], feed_dict={training : True, x:batch[0], y_:batch[1], keep_prob: HYPER_PARAMETER['KEEP_PROB']})
            indicator(i * 100 // steps)
        print(" ", end='')
        test(sess, j, end='')
        print()
    builder.add_meta_graph_and_variables(sess, 
                                         [tag_constants.SERVING], 
                                         signature_def_map={
                                             'predict_images' : tfs.predict_signature(x, prediction),
                                             signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY : tfs.predict_signature(x, prediction) })
builder.save()   

We want to have a look at three statements:

  1. builder = saved_builder.SavedModelBuilder(tfs.build_path(EXPORT_DIR)): This statement gives us the Saved Model Builder.
  2. add_meta_graph_and_variables( …: This is perhaps the most interesting statement. We provide the running session, so the saver knows the graph and has access to all current Variable values. We tag our model as intended for serving and provide a map of signature definitions. Note that we give every signature a name and that we can define a default signature; for the scope of this tutorial they are the same.
  3. save(): Finally, this saves our trained model.

Now all the Python code needed to train our model is in place. Your folder should now contain 12 files.

Let’s run cifar10.py. We can do that, for example, by opening a console, navigating to the folder in which you created the Python files and calling python cifar10.py. Be aware that this will run for a while. If you are in a hurry or are not using a GPU, you should halve the values of the hyperparameters C1, C2 and C3 and perhaps reduce the number of epochs. When the Python program has finished, you may see something like this:

That is, we achieved ~85% accuracy on the test data, and it took roughly 22 seconds per epoch. In case you have halved C1, C2 and C3, it is likely that you get the following:

There is one last thing we need to do: go to the folder the TensorFlow model was saved to. There you should find two things, a file that contains the model and a folder for the variables:

Both need to be combined in a .zip file, which we want to call cifar10.zip.
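If you prefer to create the archive with Python, the following sketch packs everything found in the export folder into a .zip file. It places the files at the root of the archive, using paths relative to the export folder. Whether the service expects exactly this layout is not stated here, so treat that as an assumption and adjust it if needed. The function name zip_saved_model is made up for this illustration.

```python
import zipfile
from pathlib import Path


def zip_saved_model(export_dir, zip_path):
    """Pack all files below export_dir into zip_path, keeping relative paths."""
    export_dir = Path(export_dir)
    # Collect the files first, so a zip written into export_dir is not included
    files = sorted(p for p in export_dir.rglob("*") if p.is_file())
    with zipfile.ZipFile(str(zip_path), "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(str(path), path.relative_to(export_dir).as_posix())
    return [p.relative_to(export_dir).as_posix() for p in files]
```

A call could look like zip_saved_model(export_dir, "cifar10.zip"), where export_dir is the timestamped folder printed by build_path during training. The returned list shows which entries were written.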

That’s it for this tutorial. In the second part, as mentioned, we will upload our model to MLF, deploy it and run inference on some examples.
