In my previous blog, we saw how object detection with TensorFlow and YOLO is applied in an enterprise context in conjunction with SAP Leonardo Machine Learning Foundation. Now we will take a close look at how to implement custom object detection with YOLO for creating intelligent solutions: in particular, how to train a custom object detector on a custom dataset, and how to provision it as a RESTful API running on SAP Cloud Platform, Cloud Foundry, to be consumed by your intelligent solution through loosely-coupled HTTP(S).

My blog series on Object Detection for the Intelligent Enterprise:

Overview of YOLO Object Detection


You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. Have a look at this inspiring TED talk by Joseph Redmon about how computers learn to recognize objects instantly, as well as this introductory video about the YOLO algorithm by Andrew Ng.

For more detail about YOLO, you may refer to its official website.

Train custom object detector with YOLO


1. Prepare the Dataset


In order to train your own object detector, you need to prepare a dataset for training: images containing the target objects, plus labels marking the objects in those images. You can also use my ready-to-use shoe dataset (including images and YOLO label files) for a quick start, in which case you can skip Step 1 and Step 2.

Step 1: Download Images with Target Objects


In my case, we need to be able to detect shoes in the SMB Market Place Solution for an intelligent online shopping experience: finding a matching shoe from a photo sent through Facebook Messenger. So I need a dataset of shoe images, which I can easily find on Google by searching for "shoe" images.

To download the images in bulk, I used a google chrome extension named Fatkun Batch Download Image.

  • Unselect the images without shoes and the cartoon images (not in JPG format) from the Google search result.

  • Click the More Options button to rename the images with the format "shoes_{NO000}.JPEG" as in the attached screenshot, which will save the images as shoes_000.JPEG through shoes_999.JPEG.
    Some tips:
    1). Save the images in an appropriate format. In my case, JPEG is required by the annotation tool used afterwards.
    2). The closer the images in the training dataset (angle, background, etc.) are to the real image input in your scenario, the more accurate the detection results will be.
    3). You may need 300~600 images per class for a relatively good detection result. Nearly perfect detection may require more images from different angles and backgrounds. In my case I downloaded 600 images (540 for training, 60 for testing).

  • Click the Save Image button and the images will be downloaded. Let's rename the image folder to "dataset".


Step 2: Label the Images with the Target Objects


Now you need to annotate all the downloaded images by marking the exact bounding boxes of the shoes with an annotation tool.

In my case, I use LabelImg, which supports both Pascal VOC and YOLO formats, to label the shoe images in YOLO format. Simply save the YOLO output txt files in the same folder as the images (dataset).



As a result, YOLO-format annotations are created for all the images. An example of a YOLO annotation is shown below.


0 0.324444 0.371111 0.337778 0.484444
0 0.642222 0.640000 0.200000 0.506667

class_index box_center_x_ratio box_center_y_ratio box_width_ratio box_height_ratio
0 - The index of the object class; in my case, only one class - shoe.
0.324444 - box_center_x_ratio (box_center_x / image_width)
0.371111 - box_center_y_ratio (box_center_y / image_height)
...
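
To make the format concrete, here is a short Python sketch (assuming a hypothetical 640x480 image) that converts one normalized YOLO label line back into pixel coordinates:

# Decode one YOLO label line into pixel coordinates.
# The 640x480 image size is a hypothetical example value.
image_w, image_h = 640, 480
line = "0 0.324444 0.371111 0.337778 0.484444"

class_index, cx, cy, w, h = line.split()
cx, w = float(cx) * image_w, float(w) * image_w  # box center x and width in pixels
cy, h = float(cy) * image_h, float(h) * image_h  # box center y and height in pixels

left, top = cx - w / 2, cy - h / 2               # top-left corner of the box
print(class_index, left, top, w, h)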



Step 3: Generate the image list text files for training and testing.


1). Download the scripts. The sample source code is available here.


2). Structure the directory tree as below:

training
     -dataset //(Please copy your dataset from Step 2 to here)
           -*.JPEG
           -*.txt

3). Run this script:


$ python create_train_list.py



As a result, 10% of the dataset will be allocated to test_list.txt and the remaining 90% to train_list.txt. Now we are ready to train the custom object detector.
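
For reference, here is a minimal sketch of what such a split script could look like (hypothetical; the actual create_train_list.py may differ in its details):

import glob
import os
import random

# Collect all dataset images and shuffle them for a random split.
images = sorted(glob.glob(os.path.join('dataset', '*.JPEG')))
random.shuffle(images)

# Write 90% of the paths to train_list.txt and 10% to test_list.txt.
split = int(len(images) * 0.9)
with open('train_list.txt', 'w') as f:
    f.write('\n'.join(images[:split]) + '\n')
with open('test_list.txt', 'w') as f:
    f.write('\n'.join(images[split:]) + '\n')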


2. Training the Custom Object Detection Model


Step 4: Follow this manual to install Darknet for YOLO


Simply run the following commands:
git clone https://github.com/pjreddie/darknet
cd darknet
make

[Optional] If you would like to play with YOLO object detection using the model pre-trained on the MS COCO dataset, you can follow the steps in the manual to download yolov3.weights and run the detector with the command:
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

If you are running a Windows machine, you can refer to this fork.

Step 5: Copy the training folder from Step 3 into the darknet folder


As a result, darknet/training contains all the training-related materials.

Step 6: Recalculate the anchor boxes with K-Means.


This step is important for successful training: the anchor boxes are recalculated from your own training dataset.

For yolo v3:

Please run generate_anchors_yolo_v3.py in the training folder to recalculate the anchor boxes with K-Means. 10 anchors are required by the yolo v3 configuration.

python generate_anchors_yolo_v3.py -filelist <path to train_list.txt generated in step3> -num_clusters <number of clusters>

For example:
python generate_anchors_yolo_v3.py -filelist train_list.txt -num_clusters 10

As a result, the anchors are generated in ./anchors/anchors10.txt, which will be used in Step 7.

For yolo v2:

If you are using yolo v2, please use generate_anchors_yolo_v2.py instead.

python generate_anchors_yolo_v2.py -filelist train_list.txt -num_clusters 5

The default yolo v2 configuration requires 5 anchors. As a result, the anchors are generated in ./anchors/anchors5.txt, which will be used in Step 7.
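
To illustrate what these anchor scripts do, here is a simplified sketch of IoU-based K-Means over the labelled box sizes (illustrative only; the actual generate_anchors_yolo_v*.py scripts may differ, for example in how the final anchors are scaled for the cfg file):

import random

def load_boxes(filelist_path):
    # Collect the normalized (width, height) of every labelled box.
    boxes = []
    with open(filelist_path) as f:
        for image_path in f.read().splitlines():
            label_path = image_path.rsplit('.', 1)[0] + '.txt'
            with open(label_path) as labels:
                for line in labels:
                    parts = line.split()  # class cx cy w h
                    if len(parts) == 5:
                        boxes.append((float(parts[3]), float(parts[4])))
    return boxes

def iou(a, b):
    # IoU of two boxes compared as if they shared the same center.
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iterations=100):
    centroids = random.sample(boxes, k)
    for _ in range(iterations):
        # Assign each box to the centroid with the highest IoU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            clusters[max(range(k), key=lambda i: iou(box, centroids[i]))].append(box)
        # Move each centroid to the mean size of its cluster.
        centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)

# yolo v3 cfg files express anchors in pixels of the network input (e.g. 416x416).
for w, h in kmeans_anchors(load_boxes('train_list.txt'), k=10):
    print(round(w * 416), round(h * 416))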

Step 7: Prepare the yolo training configuration files.


Example:

  • cfg/yolov3.cfg: The yolo v3 configuration file for the MS COCO dataset, which will be used for training and detection

  • data/coco.names: The label name list of the MS COCO dataset

  • data/coco.data: The training configuration for the MS COCO dataset.


We will need to create our own cfg, names and data files for custom object detection.

For yolo v3:

1). Prepare the yolo configuration file (.cfg)

Copy yolov3-voc.cfg in the darknet/cfg folder and rename it to yolov3_shoe.cfg.

The cfg file defines the CNN for yolo. The following options need to be updated.

Line 2~7:

Comment out batch and subdivisions for testing, and uncomment them for training.

  • batch: The number of images processed in each training step.

  • subdivisions: The batch is divided by subdivisions to decrease the GPU VRAM requirements. If you have a powerful GPU with plenty of VRAM, this number can be decreased, or batch can be increased. If a training step throws a CUDA out-of-memory error, you can decrease the batch and increase the subdivisions accordingly.




# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16



Line 605, 689

  • filters: 3 * (5 + class number).

  • 3 - the number of anchor boxes predicted per grid cell at each of the 3 detection scales in yolo v3.

  • 5 - the fixed part of the per-box output vector (prob, x, y, width, height).
    The variable part: c1 - score of the first class, c2 - score of the second class, ...

  • class number: The number of target object classes


In my case, there is only one object (shoe) to detect, so class number = 1 and filters = 3 * (5 + 1) = 18.


filters=18
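
As a quick sanity check of this arithmetic (it also covers the yolo v2 case later, which uses 5 anchor boxes per grid cell), here is a tiny illustrative helper:

# Illustrative only: filters of the conv layer right before each detection layer.
def yolo_filters(boxes_per_cell, num_classes):
    return boxes_per_cell * (5 + num_classes)

print(yolo_filters(3, 1))  # yolo v3: 3 boxes per cell, 1 class -> 18
print(yolo_filters(5, 1))  # yolo v2: 5 boxes per cell, 1 class -> 30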


Line 610~611, 778~779


  • anchors: The anchor boxes computed with K-Means on the training dataset. Please replace the anchors with the result from anchors10.txt generated in Step 6.

  • classes: The number of classes. In my case, it is one.



anchors = <Please replace with the content of anchors10.txt from Step 6>
classes=1



2). Create a name list file of labels as custom.names in the cfg folder
shoe

3). Create the training configuration file as shoe_training_config.data in the cfg folder
classes= 1
train = training/train_list.txt
valid = training/test_list.txt
names = cfg/custom.names
backup = backup

For yolo v2:

1). Prepare the yolo configuration file

Copy yolov2-voc.cfg in the darknet/cfg folder and rename it to yolov2_shoe.cfg.

The cfg file defines the CNN for yolo. The following configurations need to be updated.

Line 2~7:

Comment out batch and subdivisions for testing, and uncomment them for training.

  • batch: The number of images processed in each training step.

  • subdivisions: The batch is divided by subdivisions to decrease the GPU VRAM requirements. If you have a powerful GPU with plenty of VRAM, this number can be decreased, or batch can be increased. If a training step throws a CUDA out-of-memory error, you can decrease the batch and increase the subdivisions accordingly.




# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16



Line 237

  • filters: 5 * (5 + class number). Here the first 5 is the number of anchor boxes yolo v2 predicts per grid cell.


In my case, there is only one object (shoe) to detect, so class number = 1 and filters = 5 * (5 + 1) = 30.


filters=30


Line 242


  • anchors: The anchor boxes computed with K-Means on the training dataset. Please replace the anchors with the result from anchors5.txt generated in Step 6.



anchors = <Please replace with the content of anchors5.txt from Step 6>



Line 244:

  • classes: The number of classes


classes=1

2). Create a name list file of labels as custom.names in the cfg folder
shoe

3). Create the training configuration file as shoe_training_config.data in the cfg folder
classes= 1
train = training/train_list.txt
valid = training/test_list.txt
names = cfg/custom.names
backup = backup

Step 8: Train the Custom Object Detection Model



For yolo v3:

1). Download Pretrained Convolutional Weights

For training we use convolutional weights that are pre-trained on ImageNet, taken from the darknet53 model. You can download the weights for the convolutional layers here (155 MB).

2). Train the model with the command below:
./darknet detector train cfg/shoe_training_config.data cfg/yolov3_shoe.cfg darknet53.conv.74

I stopped the training when the average loss stabilized at around 0.2, after roughly 10,000 iterations.

For yolo v2:

1). Download Pretrained Convolutional Weights

For training we use convolutional weights that are pre-trained on ImageNet, taken from the Darknet19 448x448 model. You can download the weights for the convolutional layers here (76 MB).

2). Train the model with the command below:
./darknet detector train cfg/shoe_training_config.data cfg/yolov2_shoe.cfg darknet19_448.conv.23

I stopped the training when the average loss stabilized at around 0.5, after roughly 6,000 iterations. As a result, the weights files from the training can be found in the backup folder.
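
Tip: if the training gets interrupted, Darknet can resume from the most recent checkpoint by passing the .backup file from the backup folder in place of the pretrained convolutional weights, for example:
./darknet detector train cfg/shoe_training_config.data cfg/yolov3_shoe.cfg backup/yolov3_shoe.backup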

3. Testing the Custom Object Detection Model


Run the command below:
#For yolo v3:
./darknet detector test cfg/shoe_training_config.data cfg/yolov3_shoe.cfg ./backup/yolov3_shoe.backup

#For yolo v2:
./darknet detector test cfg/shoe_training_config.data cfg/yolov2_shoe.cfg ./backup/yolov2_shoe.backup

 

Then specify the path to an image containing the target object, and you will see the detection result in the terminal.
mask_scale: Using default '1.000000'
Loading weights from ./backup/yolov3_shoe.backup...Done!
Enter Image Path: data/Shoe.jpg
data/Shoe.jpg: Predicted in 3.484773 seconds.
shoe: 95%
shoe: 90%

You can also check predictions.png in the darknet root folder, which shows the detected bounding boxes drawn on the input image.


Running the custom object detector as a RESTful API on SAP Cloud Platform, Cloud Foundry


I have implemented a generic NodeJS RESTful API wrapper that turns YOLO object detection into a RESTful API, which can be deployed on SAP Cloud Platform, Cloud Foundry, or in an on-premise environment.

The project can be reused to provision your own custom object detector as a web service with easy configuration. The source code is published under the MIT license and available here. Please follow its manual to download, configure, and deploy your custom object detector as a RESTful API.

As a result, you will have your custom object detection provisioned as a RESTful API.

POST /Detect


Object detection with a given image URL and detection threshold.

Request sample:



{
    "ImageUrl": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSE36LOJ6NzReh-W_o5QKkgTUH7qbFygG_J1A0PWoPBnaH9UW50",
    "Threshold": 0.80
}


Response sample:



[
    {
        "box": {
            "y": 97.06493854522705,
            "x": 87.51638531684875,
            "w": 122.85414934158325,
            "h": 62.75526809692383
        },
        "name": "shoe",
        "prob": 0.9999971389770508
    }
]
Note: x - top-left x, y - top-left y, w - width, h - height
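
For example, your solution could consume the API like this (a minimal sketch using Python's requests library; replace the host, port and image URL with your own):

import requests

# Hypothetical endpoint: replace host and port with your deployed detector.
url = "http://<YOUR_OBJECT_DETECTOR_HOST>:<PORT>/Detect"
payload = {
    "ImageUrl": "https://example.com/some_shoe_photo.jpg",  # any reachable image URL
    "Threshold": 0.80,
}

response = requests.post(url, json=payload)
for detection in response.json():
    print(detection["name"], detection["prob"], detection["box"])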



Demo kit:



A web demo kit for yolo object detection can be accessed after deployment at http://<YOUR_OBJECT_DETECTOR_HOST>:<PORT>/web/Detector


