Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
YatseaLi
Product and Topic Expert
In my previous blog, we saw how off-the-shelf object detection can be applied in an enterprise context. Now we will take a close look at how to implement custom object detection with TensorFlow for intelligent solutions: in particular, how to train a custom object detector on a custom dataset and provision it as a RESTful API running on SAP Cloud Platform, Cloud Foundry, which your intelligent solution can consume through loosely coupled HTTP(S) calls.

My blog series of Object Detection for Intelligent Enterprise:

Overview of Tensorflow Object Detection API


The TensorFlow Object Detection API is an open-source framework built on top of TensorFlow that makes it easy to construct, train, and deploy object detection models. For more detail, you may refer to its official documentation.

Train custom object detector with Tensorflow Object Detection API


1. Prepare Dataset


In order to train your own object detector, you need to prepare a training dataset: images containing the target objects, with the objects labelled in each image. You can also use my ready-to-use shoe dataset (including images and Pascal VOC format annotation files) for a quick start, in which case you can skip steps 1 and 2.

Step 1: Download Images with Target Objects


In my case, the SMB Market Place Solution needs to detect shoes for an intelligent online shopping experience: finding matching shoes from a photo through Facebook Messenger. So I need a dataset of shoe images, which I can easily find on Google by searching for "shoe" images.

To download the images in bulk, I used a Google Chrome extension named Fatkun Batch Download Image.

  • Unselect the images without shoes and the cartoon images (not in JPEG format) from the Google search results.

  • Click the More Options button to rename the images with the format "shoes_{NO000}.JPEG" as in the attached screenshot, which will save the images as shoes_000.JPEG~shoes_999.JPEG.
    Some tips:
    1). Save the images in an appropriate format. In my case, JPEG format is required by the annotation tool used afterwards.
    2). The closer the images in the training dataset (angle, background, etc.) are to the real image input in your case, the more accurate the detection results.
    3). You may need 300~600 images per class for a relatively good detection result. It may require more images from different angles and backgrounds to achieve nearly perfect detection. In my case I downloaded 600 images (540 for training, 60 for testing).

  • Click the Save Image button, and the images will be downloaded.
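As an alternative to the extension's rename feature, the same naming scheme can be applied afterwards with a short script. This is only a sketch under my own assumptions (images already downloaded into a single folder, fewer than 1000 files):

```python
import os

def rename_images(folder, prefix="shoes"):
    """Rename every file in `folder` to <prefix>_NNN.JPEG, in sorted order."""
    names = sorted(os.listdir(folder))
    for i, name in enumerate(names):
        new_name = "{}_{:03d}.JPEG".format(prefix, i)
        os.rename(os.path.join(folder, name), os.path.join(folder, new_name))
    return len(names)
```

Sorting first keeps the numbering stable across runs, matching the shoes_000.JPEG~shoes_999.JPEG pattern described above.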


Step 2: Label the Images with the Target Objects


Now you need to annotate the images, marking the exact bounding box of each shoe in all of the downloaded shoe images with an annotation tool.

In my case, I use LabelImg, which can annotate images in either Pascal VOC or YOLO format, to label the shoe images with Pascal VOC annotations.



As a result, Pascal VOC format annotations are created. An example of a VOC annotation in XML is shown below. Note, however, that TensorFlow requires TFRecord format for training instead of Pascal VOC format.


<annotation>
  <folder>shoe</folder>
  <filename>xshoes_001.JPEG</filename>
  <path>/Users/i033357/WorkSpace/12-Technology/tensorflow/labelImg/training/shoe/xshoes_001.JPEG</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>272</width>
    <height>272</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>shoe</name>
    <pose>Unspecified</pose>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>1</xmin>
      <ymin>155</ymin>
      <xmax>228</xmax>
      <ymax>250</ymax>
    </bndbox>
  </object>
  <object>
    <name>shoe</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>1</difficult>
    <bndbox>
      <xmin>46</xmin>
      <ymin>150</ymin>
      <xmax>271</xmax>
      <ymax>238</ymax>
    </bndbox>
  </object>
</annotation>
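For illustration, an annotation like the one above can be read back with Python's standard library alone. This is just a sketch following the VOC fields shown in the example, not part of the conversion script:

```python
import xml.etree.ElementTree as ET

def parse_voc(xml_string):
    """Return the image size and a list of (name, xmin, ymin, xmax, ymax) boxes
    from a Pascal VOC annotation string."""
    root = ET.fromstring(xml_string)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.find("name").text,
            int(bb.find("xmin").text), int(bb.find("ymin").text),
            int(bb.find("xmax").text), int(bb.find("ymax").text),
        ))
    return (width, height), boxes
```

This is essentially what a VOC-to-TFRecord converter has to do first: recover the image dimensions and every labelled bounding box before serializing them.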



Step 3: Convert the annotations from Pascal VOC to TFRecord with a script.


1). Download the sample source code available here for reference.

2). Structure the directory tree under training/dataset as below:

dataset
  - images (place all the images here)
  - annotations (place all the annotation XML files here)
3). Run this script to convert the dataset from VOC format to TFRecord:


$ python voc_to_tfrecord.py



As a result, 10% of the dataset will be converted into test.record and the rest into train.record. Now we are ready to train the custom object detector.
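The 90/10 split performed by the conversion step can be sketched as below. This is a simplified illustration of the splitting logic only; the actual voc_to_tfrecord.py additionally serializes each image and its annotation into TFRecord format:

```python
import random

def split_dataset(filenames, test_fraction=0.1, seed=42):
    """Shuffle the annotated images and split them into (train, test) lists."""
    names = list(filenames)
    random.Random(seed).shuffle(names)  # fixed seed keeps the split reproducible
    n_test = int(len(names) * test_fraction)
    return names[n_test:], names[:n_test]
```

With the 600-image shoe dataset from step 1, this yields the 540/60 train/test split mentioned above.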




2. Training the Custom Object Detection Model


Step 4: Follow this manual to install Tensorflow Object Detection API.


Step 5: Copy your own /training/dataset folder prepared in step 3 to the object_detection folder of the TensorFlow Object Detection API, which you downloaded and installed in step 4.

Step 6: Train the Custom Object Detection Model:


There are plenty of tutorials available online. I followed this tutorial for training my shoe model. The only difference is the training environment: I stopped the training once the average loss stabilized at around 0.5, after about 10000 iterations.

3. Testing the Custom Object Detection Model


I followed this tutorial for testing the custom object detection model.

As a result, I can test my shoe detector. A Jupyter Notebook sample is available here.


Running the custom object detector as RESTful API on SAP Cloud Platform, Cloud Foundry


I have implemented a generic RESTful API wrapper with Flask that turns the TensorFlow Object Detection API into a RESTful object detection API, which can be deployed on SAP Cloud Platform, Cloud Foundry or in an on-premise environment. In addition, a generic object-oriented object_detector is implemented that performs object detection with a generic TensorFlow frozen inference graph.

However, a frozen inference graph should be used in test environments only. For production, TensorFlow Serving is recommended to serve your custom model in SavedModel format instead of a frozen inference graph, for flexibility and high scalability. Please refer to this blog about exporting your custom object detection model in SavedModel format for TensorFlow Serving. With your custom model in SavedModel format, you can then bring your own model to SAP Leonardo Machine Learning Foundation for serving, which embeds TensorFlow Serving; please refer to the tutorials below for details.

The project below can be reused to provision your custom object detection model (as a frozen inference graph) as a web service with easy configuration, and should only be used in experimental and test environments. The source code is published under the MIT license and available here. Please follow its manual to download, configure and deploy your custom object detector as a web service.

As a result, you will have custom object detection provisioned as a RESTful API.

POST /Detect


Object detection with a given image URL and detection threshold.

Request sample:



{
    "ImageUrl": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSE36LOJ6NzReh-W_o5QKkgTUH7qbFygG_J1A0PWoPBnaH9UW50",
    "Threshold": 0.80
}


Response sample:



[
    {
        "box": {
            "y": 97.06493854522705,
            "x": 87.51638531684875,
            "w": 122.85414934158325,
            "h": 62.75526809692383
        },
        "name": "shoe",
        "prob": 0.9999971389770508
    }
]
Note: x = top-left x, y = top-left y, w = width, h = height
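A client can then post an image URL and work with the returned detections. The sketch below uses only the Python standard library; the host is a placeholder for your own deployment, and to_corners converts a returned box from the (x, y, w, h) format above to corner coordinates:

```python
import json
import urllib.request

def detect(host, image_url, threshold=0.8):
    """POST an image URL to the /Detect endpoint and return the parsed detections."""
    payload = json.dumps({"ImageUrl": image_url, "Threshold": threshold}).encode("utf-8")
    req = urllib.request.Request(
        host + "/Detect", data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def to_corners(box):
    """Convert a response box (top-left x/y, width, height) to (xmin, ymin, xmax, ymax)."""
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])
```

For example, detect("http://<YOUR_OBJECT_DETECTOR_HOST>:<PORT>", some_image_url) would return a list like the response sample above, which you can filter by prob and convert with to_corners for drawing.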




In addition, after your own deployment, a web demo kit for real-time object detection with local camera streaming in the browser is included, accessible at:
http://<YOUR_OBJECT_DETECTOR_HOST>:<PORT>/Camera