In my previous blog, we saw how object detection with TensorFlow and YOLO is applied in an enterprise context in conjunction with SAP Leonardo Machine Learning Foundation. Now we will take a close look at how to implement custom object detection with YOLO for creating intelligent solutions: in particular, how to train a custom object detector on a custom dataset, and how to provision it as a RESTful API running on SAP Cloud Platform, Cloud Foundry, to be consumed by your intelligent solution through loosely-coupled HTTP(S).

My blog series on Object Detection for the Intelligent Enterprise:

Overview of YOLO Object Detection


You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. Have a look at this inspiring TED talk by Joseph Redmon about how computers learn to recognize objects instantly, as well as this introductory video about the YOLO algorithm by Andrew Ng.

For more detail about YOLO, you may refer to its official website.

Train custom object detector with YOLO


1. Prepare the Dataset


In order to train your own object detector, you need to prepare a dataset for training: images containing the target objects, plus labels marking the objects in those images. You can also use my ready-to-use shoe dataset (including images and YOLO label files) for a quick start, in which case you can skip Step 1 and Step 2.

Step 1: Download Images with Target Objects


In my case, we need to be able to detect shoes in the SMB Market Place Solution for an intelligent online shopping experience: finding a matching shoe from a photo sent through Facebook Messenger. So I need a dataset of shoe images, which I can easily find on Google by searching for "shoe" images.

To download the images in bulk, I used a google chrome extension named Fatkun Batch Download Image.

  • Unselect the images without shoes and the cartoon images (not in JPG format) from the Google search result.

  • Click the More Options button to rename the images with the format "shoes_{NO000}.JPEG" as in the attached screenshot, which will save the images as shoes_000.JPEG through shoes_999.JPEG.
    Some tips:
    1). Save the images in an appropriate format. In my case, JPEG is required by the annotation tool used afterwards.
    2). The closer the images in the training dataset (angle, background, etc.) are to the real image input in your scenario, the more accurate the detection results will be.
    3). You may need 300~600 images per class for a relatively good detection result. Nearly perfect detection may require more images from different angles and backgrounds. In my case I downloaded 600 images (540 for training, 60 for testing).

  • Click the Save Image button and the images will be downloaded. Let's rename the image folder to "dataset".


Step 2: Label the Images with the Target Objects


Now you need to annotate all the downloaded images by marking the exact bounding boxes of the shoes with an annotation tool.

In my case, I use LabelImg, which supports both Pascal VOC and YOLO formats, to label the shoe images in YOLO format. Simply save the YOLO output txt files in the same folder as the images (dataset).



As a result, YOLO-format annotations are created for all the images. An example of a YOLO annotation is shown below.


0 0.324444 0.371111 0.337778 0.484444
0 0.642222 0.640000 0.200000 0.506667

class_index box_center_x_ratio box_center_y_ratio box_width_ratio box_height_ratio
0 - The index of the object class; in my case, only one class - shoe.
0.324444 - box_center_x_ratio (box_center_x / image_width)
0.371111 - box_center_y_ratio (box_center_y / image_height)
...
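
To make the format concrete, here is a short Python sketch (assuming a hypothetical 640x480 image) that converts one normalized YOLO label line back into pixel coordinates:

# Decode one YOLO label line into pixel coordinates.
# The 640x480 image size is a hypothetical example value.
image_w, image_h = 640, 480
line = "0 0.324444 0.371111 0.337778 0.484444"

class_index, cx, cy, w, h = line.split()
cx, w = float(cx) * image_w, float(w) * image_w  # box center x and width in pixels
cy, h = float(cy) * image_h, float(h) * image_h  # box center y and height in pixels

left, top = cx - w / 2, cy - h / 2               # top-left corner of the box
print(class_index, left, top, w, h)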



Step 3: Generate the image list text files for training and testing.


1). Download the scripts. The sample source code is available here.


2). Structure the directory tree as below:

training
     -dataset //(Please copy your dataset from Step 2 to here)
           -*.JPEG
           -*.txt

3). Run this script:


$ python create_train_list.py



As a result, 10% of the dataset will be allocated to test_list.txt and the remaining 90% to train_list.txt. Now we are ready to train the custom object detector.
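
For reference, here is a minimal sketch of what such a split script could look like (hypothetical; the actual create_train_list.py may differ in its details):

import glob
import os
import random

# Collect all dataset images and shuffle them for a random split.
images = sorted(glob.glob(os.path.join('dataset', '*.JPEG')))
random.shuffle(images)

# Write 90% of the paths to train_list.txt and 10% to test_list.txt.
split = int(len(images) * 0.9)
with open('train_list.txt', 'w') as f:
    f.write('\n'.join(images[:split]) + '\n')
with open('test_list.txt', 'w') as f:
    f.write('\n'.join(images[split:]) + '\n')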


2. Training the Custom Object Detection Model


Step 4: Follow this manual to install Darknet for YOLO


Simply run the following commands:
git clone https://github.com/pjreddie/darknet
cd darknet
make

[Optional] If you would like to play with YOLO object detection using the model pre-trained on the MS COCO dataset, you can follow the steps in the manual to download yolov3.weights and run the detector with the command:
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

If you are running a Windows machine, you can refer to this fork.

Step 5: Copy the training folder from Step 3 into the darknet folder


As a result, darknet/training contains all the training-related materials.

Step 6: Recalculate the anchor boxes with K-Means.


This step is important for successful training: the anchor boxes are recalculated from your own training dataset.

For yolo v3:

Please run generate_anchors_yolo_v3.py in the training folder to recalculate the anchor boxes with K-Means. 10 anchors are required by the yolo v3 configuration.

python generate_anchors_yolo_v3.py -filelist <path to train_list.txt generated in step3> -num_clusters <number of clusters>

For example:
python generate_anchors_yolo_v3.py -filelist train_list.txt -num_clusters 10

As a result, the anchors are generated in ./anchors/anchors10.txt, which will be used in Step 7.

For yolo v2:

If you are using yolo v2, please use generate_anchors_yolo_v2.py instead.

python generate_anchors_yolo_v2.py -filelist train_list.txt -num_clusters 5

The default yolo v2 configuration requires 5 anchors. As a result, the anchors are generated in ./anchors/anchors5.txt, which will be used in Step 7.
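
To illustrate what these anchor scripts do, here is a simplified sketch of IoU-based K-Means over the labelled box sizes (illustrative only; the actual generate_anchors_yolo_v*.py scripts may differ, for example in how the final anchors are scaled for the cfg file):

import random

def load_boxes(filelist_path):
    # Collect the normalized (width, height) of every labelled box.
    boxes = []
    with open(filelist_path) as f:
        for image_path in f.read().splitlines():
            label_path = image_path.rsplit('.', 1)[0] + '.txt'
            with open(label_path) as labels:
                for line in labels:
                    parts = line.split()  # class cx cy w h
                    if len(parts) == 5:
                        boxes.append((float(parts[3]), float(parts[4])))
    return boxes

def iou(a, b):
    # IoU of two boxes compared as if they shared the same center.
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iterations=100):
    centroids = random.sample(boxes, k)
    for _ in range(iterations):
        # Assign each box to the centroid with the highest IoU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            clusters[max(range(k), key=lambda i: iou(box, centroids[i]))].append(box)
        # Move each centroid to the mean size of its cluster.
        centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)

# yolo v3 cfg files express anchors in pixels of the network input (e.g. 416x416).
for w, h in kmeans_anchors(load_boxes('train_list.txt'), k=10):
    print(round(w * 416), round(h * 416))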

Step 7: Prepare the yolo training configuration files.


Example:

  • cfg/yolov3.cfg: The yolo v3 configuration file for the MS COCO dataset, which will be used for training and detection

  • data/coco.names: The label name list of the MS COCO dataset

  • data/coco.data: The training configuration for the MS COCO dataset.


We will need to create our own cfg, names and data files for custom object detection.

For yolo v3:

1). Prepare the yolo configuration file (.cfg)

Copy yolov3-voc.cfg in the darknet/cfg folder and rename it to yolov3_shoe.cfg.

The cfg file defines the CNN for yolo. The following options need to be updated.

Line 2~7:

Comment out batch and subdivisions for testing, and uncomment them for training.

  • batch: The number of images processed in each training step.

  • subdivisions: The batch is divided by subdivisions to decrease the GPU VRAM requirements. If you have a powerful GPU with plenty of VRAM, this number can be decreased, or batch can be increased. If a training step throws a CUDA out-of-memory error, you can decrease the batch and increase the subdivisions accordingly.




# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16



Line 605, 689

  • filters: 3 * (5 + class number).

  • 3 - the number of anchor boxes predicted per grid cell at each of the 3 detection scales in yolo v3.

  • 5 - the fixed part of the per-box output vector (prob, x, y, width, height).
    The variable part: c1 - score of the first class, c2 - score of the second class, ...

  • class number: The number of target object classes


In my case, there is only one object (shoe) to detect, so class number = 1 and filters = 3 * (5 + 1) = 18.


filters=18
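
As a quick sanity check of this arithmetic (it also covers the yolo v2 case later, which uses 5 anchor boxes per grid cell), here is a tiny illustrative helper:

# Illustrative only: filters of the conv layer right before each detection layer.
def yolo_filters(boxes_per_cell, num_classes):
    return boxes_per_cell * (5 + num_classes)

print(yolo_filters(3, 1))  # yolo v3: 3 boxes per cell, 1 class -> 18
print(yolo_filters(5, 1))  # yolo v2: 5 boxes per cell, 1 class -> 30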


Line 610~611, 778~779


  • anchors: The anchor boxes computed with K-Means on the training dataset. Please replace the anchors with the result from anchors10.txt generated in Step 6.

  • classes: The number of classes. In my case, it is one.



anchors = <Please replace with the content of anchors10.txt from Step 6>
classes=1



2). Create a name list file of labels as custom.names in the cfg folder
shoe

3). Create the training configuration file as shoe_training_config.data in the cfg folder
classes= 1
train = training/train_list.txt
valid = training/test_list.txt
names = cfg/custom.names
backup = backup

For yolo v2:

1). Prepare the yolo configuration file

Copy yolov2-voc.cfg in the darknet/cfg folder and rename it to yolov2_shoe.cfg.

The cfg file defines the CNN for yolo. The following configurations need to be updated.

Line 2~7:

Comment out batch and subdivisions for testing, and uncomment them for training.

  • batch: The number of images processed in each training step.

  • subdivisions: The batch is divided by subdivisions to decrease the GPU VRAM requirements. If you have a powerful GPU with plenty of VRAM, this number can be decreased, or batch can be increased. If a training step throws a CUDA out-of-memory error, you can decrease the batch and increase the subdivisions accordingly.




# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16



Line 237

  • filters: 5 * (5 + class number). Here the first 5 is the number of anchor boxes yolo v2 predicts per grid cell.


In my case, there is only one object (shoe) to detect, so class number = 1 and filters = 5 * (5 + 1) = 30.


filters=30


Line 242


  • anchors: The anchor boxes computed with K-Means on the training dataset. Please replace the anchors with the result from anchors5.txt generated in Step 6.



anchors = <Please replace with the content of anchors5.txt from Step 6>



Line 244:

  • classes: The number of classes


classes=1

2). Create a name list file of labels as custom.names in the cfg folder
shoe

3). Create the training configuration file as shoe_training_config.data in the cfg folder
classes= 1
train = training/train_list.txt
valid = training/test_list.txt
names = cfg/custom.names
backup = backup

Step 8: Train the Custom Object Detection Model



For yolo v3:

1). Download Pretrained Convolutional Weights

For training we use convolutional weights that are pre-trained on ImageNet, taken from the darknet53 model. You can download the weights for the convolutional layers here (155 MB).

2). Train the model with the command below:
./darknet detector train cfg/shoe_training_config.data cfg/yolov3_shoe.cfg darknet53.conv.74

I stopped the training when the average loss stabilized at around 0.2, after roughly 10,000 iterations.

For yolo v2:

1). Download Pretrained Convolutional Weights

For training we use convolutional weights that are pre-trained on ImageNet, taken from the Darknet19 448x448 model. You can download the weights for the convolutional layers here (76 MB).

2). Train the model with the command below:
./darknet detector train cfg/shoe_training_config.data cfg/yolov2_shoe.cfg darknet19_448.conv.23

I stopped the training when the average loss stabilized at around 0.5, after roughly 6,000 iterations. As a result, the weights files from the training can be found in the backup folder.
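
Tip: if the training gets interrupted, Darknet can resume from the most recent checkpoint by passing the .backup file from the backup folder in place of the pretrained convolutional weights, for example:
./darknet detector train cfg/shoe_training_config.data cfg/yolov3_shoe.cfg backup/yolov3_shoe.backup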

3. Testing the Custom Object Detection Model


Run the command below:
#For yolo v3:
./darknet detector test cfg/shoe_training_config.data cfg/yolov3_shoe.cfg ./backup/yolov3_shoe.backup

#For yolo v2:
./darknet detector test cfg/shoe_training_config.data cfg/yolov2_shoe.cfg ./backup/yolov2_shoe.backup

 

Then specify the path to an image containing the target object, and you will see the detection result in the terminal.
mask_scale: Using default '1.000000'
Loading weights from ./backup/yolov3_shoe.backup...Done!
Enter Image Path: data/Shoe.jpg
data/Shoe.jpg: Predicted in 3.484773 seconds.
shoe: 95%
shoe: 90%

You can also check predictions.png in the darknet root folder, which shows the detected bounding boxes drawn on the input image.


Running the custom object detector as a RESTful API on SAP Cloud Platform, Cloud Foundry


I have implemented a generic NodeJS RESTful API wrapper that turns YOLO object detection into a RESTful API, which can be deployed on SAP Cloud Platform, Cloud Foundry, or in an on-premise environment.

The project can be reused to provision your own custom object detector as a web service with easy configuration. The source code is published under the MIT license and available here. Please follow its manual to download, configure, and deploy your custom object detector as a RESTful API.

As a result, you will have your custom object detection provisioned as a RESTful API.

POST /Detect


Object detection with a given image URL and detection threshold.

Request sample:



{
    "ImageUrl": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSE36LOJ6NzReh-W_o5QKkgTUH7qbFygG_J1A0PWoPBnaH9UW50",
    "Threshold": 0.80
}


Response sample:



[
    {
        "box": {
            "y": 97.06493854522705,
            "x": 87.51638531684875,
            "w": 122.85414934158325,
            "h": 62.75526809692383
        },
        "name": "shoe",
        "prob": 0.9999971389770508
    }
]
Note: x - top-left x, y - top-left y, w - width, h - height
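
For example, your solution could consume the API like this (a minimal sketch using Python's requests library; replace the host, port and image URL with your own):

import requests

# Hypothetical endpoint: replace host and port with your deployed detector.
url = "http://<YOUR_OBJECT_DETECTOR_HOST>:<PORT>/Detect"
payload = {
    "ImageUrl": "https://example.com/some_shoe_photo.jpg",  # any reachable image URL
    "Threshold": 0.80,
}

response = requests.post(url, json=payload)
for detection in response.json():
    print(detection["name"], detection["prob"], detection["box"])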



Demo kit:



A web demo kit for yolo object detection can be accessed after deployment at http://<YOUR_OBJECT_DETECTOR_HOST>:<PORT>/web/Detector


