Skip to Content
Technical Articles
Author's profile photo Yatsea Li

Object Detection with Tensorflow(SSD) for Intelligent Enterprise

In my previous blog, we have seen how the off-the-shelf Object Detection is applied in Enterprise context. Now we will have a close look at how to implement custom object detection with tensorflow for serving intelligent solutions, especially how to train a custom object detector with custom dataset, and provision as RESTful API running on SAP Cloud Platform, Cloud Foundry, which can be consumed by your intelligent solution through loosely-coupled HTTP(s).

My blog series of Object Detection for Intelligent Enterprise:

Overview of Tensorflow Object Detection API

The TensorFlow Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models. For more detail, you may refer to its official document.

Train custom object detector with Tensorflow Object Detection API

1.Prepare Dataset

In order to train your own object detector, you need to prepare the dataset for training, including  the images with the target objects, and labelling the object in the images. Here you also have my read-to-use shoe dataset(including images and VOC-Pascal format annotation files) for a quick start, which you can skip step 1 and step 2.

Step 1: Download Images with Target Objects

In my case, we need to be able to detect shoe in the SMB Market Place Solution for an intelligent online shopping experience on finding matched shoe with a photo through Facebook Messenger. So I need a dataset of shoe images, which I can easily to find from google by searching “shoe” images.

To download the images in bulk, I used a google chrome extension named Fatkun Batch Download Image.

  • Unselect the images without shoe and the carton images(no JPG format) from google search result.
  • Click More Options buttons to rename the image with format “shoes_{NO000}.JPEG” as attached screen, which will save the image as shoes_000.JPEG~shoe_999.JPEG
    Some tips:
    1).Save the image with to an appropriate format. In my case, JPEG format is required by annotation tool (in my case )afterwards.
    2).The closer images(angle, background etc) in the training dataset to the real image input in your case, the more accurate detection results.
    3).You may need from 300~600 images per class for a relatively very good detection result as expected. It may requires more image from different angle and background to have a nearly perfect detection. In my case I have 600 images downloaded(540 for training, 60 for testing).
  • Click Save Image button then you will have the images download

Step 2: Label the Images with the Target Objects

Now you need to annotate the image to mark the exact bounding box of each shoe for all the downloaded shoe images with the annotation tool.

In my case, I use LabelImg to label the shoe images with VOC-Pascal,which can annotate the images in VOC-Pascal or YOLO format.

As a result, VOC-Pascal format annotation are created. An example of VOC annotation in xml as below. However, tensorflow requires tfrecord format for training instead of VOC-Pascal format.


Step 3: Convert the annotations from VOC-Pascal to TFRecord with script.

1).Download the sample source code available here for reference.
2).Structure the directories tree as below:
under training/dataset
     -images (Place all the images here)
     -annotations (Place all the annotation xml file here.)
3).Run this script to convert the dataset from VOC format to tfrecord:
$ python

As a result, 10% of the dataset will be converted into test.record, the rest of the dataset as train.record. Now we are ready to train the custom object detector.

2.Training Custom Object Detection Model

Step 4: Follow this manual to install Tensorflow Object Detection API.

Step 5: Copy your own /training/dataset folder prepared in step 3 to the object_detection folder of  Tensorflow Object Detection API.

Which you have downloaded and installed in step 4.

Step 6: Train the Custom Object Detection Model:

There are plenty of tutorials available online. I followed this tutorial  for training my shoe model. The only difference is:

For training environment:

I stop the training at a stable average loss at 0.5 after around 10000 iteration.

3.Testing Custom Object Detection Model

I follow this tutorial for testing the custom object detection model.

As a result, I can test my shoe detector. Jupyter Notebook sample available here.

Running the custom object detector as RESTful API on SAP Cloud Platform, Cloud Foundry

I have implemented a generic RESTful API Wrapper for turning TensorFlow Object Detection API into a RESTful API of object detection with Flask, which can be deployed on SAP Cloud Platform, Cloud Foundry or on-premise environment. In addition, a generic object-oriented object_detector is implemented to use generic tensorflow frozen inference graph for object detection.

However, frozen inference graph should be used in test environment only. For production, Tensorflow Serving is recommended to serve your custom model with SavedModel format for flexibility and high scalability instead of frozen inference graph. Please refer to this blog about export your custom Object Detection Model as SavedModel format for TensorFlow Serving. With your custom model in SavedModel, now you can bring your own model to SAP Leonardo Machine Learning Foundation for serving, which embeds Tensorflow Serving, please refer to the tutorials below for details.

The project below can be reused to provision your custom object detection model of frozen inference graph as web service with easy configuration, which should be only used in experiment and test environment. The source code is published with MIT license available here. Please follow its manual to download, configure and deploy your custom object detector as Web Service.

As a result, you will have custom object detection provisioned as RESTful API.

POST /Detect

Object detection with given image url and detection threshold

Request sample:

	"ImageUrl": "",
	"Threshold": 0.80

Response sample:

        "box": {
            "y": 97.06493854522705,
            "x": 87.51638531684875,
            "w": 122.85414934158325,
            "h": 62.75526809692383
        "name": "shoe",
        "prob": 0.9999971389770508
Note: x-top left x, y-top left y, w-width, h-height

 In addition, a web demo kit of object detection in real-time is included after your own deployment with local camera streaming in browser through

Assigned Tags

      Be the first to leave a comment
      You must be Logged on to comment or reply to a post.