
Build Logo Detection REST API with SAP Cloud Foundry and RetinaNet

In this blog post, we’ll learn how to use the RetinaNet object detection framework to detect and localize a logo in images, and how to build a REST API Python Flask app on SAP Cloud Foundry. The result will be visualized in an HTML file.

Prerequisites

High Level Steps

Here are the high-level steps that we are going to perform.

Image Logo Dataset

I have prepared a dataset of 185 logo images with a single class, since we will only detect one logo in the image. The images were collected manually from Google image search.
Extract the dataset to a folder.

Annotate Images

Install and open the LabelImg tool.

  1. Click Open Dir and select the logo dataset folder. The list of images will appear on the File List.
  2. Change the save format to PascalVOC.
  3. Find the logo in the image and draw the bounding box.
  4. Create a new box label “pfe” if it doesn’t exist.
  5. Save the label.
  6. Repeat steps 3 to 5 for all images in the File List.

Once you save the label, an XML file is created containing the bounding box information in the bndbox element. We will extract this information and convert it into the format required by Keras-RetinaNet.

<annotation>
	<folder>images</folder>
	<filename>00000003.jpg</filename>
	<path>C:\FD\Py\DownloadImg\images\00000003.jpg</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>0</width>
		<height>0</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>pfe</name>
		<pose>Unspecified</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>1</xmin>
			<ymin>1</ymin>
			<xmax>1200</xmax>
			<ymax>693</ymax>
		</bndbox>
	</object>
</annotation>

Split into Train and Test

We have 185 data points, so we shuffle them and split 80% of the data for training and 20% for testing. I have generated train.txt and test.txt, which contain the unique image IDs derived from the image filenames. Take a look at those files.
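The shuffle-and-split step above can be sketched as follows. The 185-image count and the train.txt/test.txt filenames come from the article; the helper itself is illustrative, not the exact script used.

```python
# Sketch: shuffle the image IDs and split them 80/20 into train.txt / test.txt.
import random

def split_ids(image_ids, train_ratio=0.8, seed=42):
    """Shuffle image IDs deterministically and split into train/test lists."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

if __name__ == "__main__":
    all_ids = [f"{i:08d}" for i in range(185)]
    train_ids, test_ids = split_ids(all_ids)
    with open("train.txt", "w") as f:
        f.write("\n".join(train_ids))
    with open("test.txt", "w") as f:
        f.write("\n".join(test_ids))
```

With 185 IDs this yields 148 training and 37 testing entries.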

Build Dataset

We will also need to prepare three CSV files that are required by the Keras-RetinaNet library.

  • retinanet_classes.csv
    Contains the class name to integer ID mapping

    pfe,0
  • retinanet_test.csv and retinanet_train.csv.
    Contain the image path, the bounding box annotation, and the human-readable class label.

    /content/keras-retinanet/images/00000096.jpg,37,21,380,224,pfe
    /content/keras-retinanet/images/00000596.jpg,179,20,578,317,pfe
    /content/keras-retinanet/images/00000250.jpg,27,11,207,119,pfe
    ...

    The first entry is the path to the image, followed by the bounding box coordinates in the order start x, start y, end x, end y. The last entry is the human-readable class label.

Run the Python script below to generate those CSV files:

python build_logos.py
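The core of such a script is turning each PascalVOC XML file from LabelImg into CSV rows. A minimal sketch of that conversion is below; the function name is illustrative, not taken from build_logos.py.

```python
# Sketch: convert one PascalVOC annotation into Keras-RetinaNet CSV rows
# of the form path,xmin,ymin,xmax,ymax,class.
import xml.etree.ElementTree as ET

def annotation_to_rows(xml_text, image_path):
    """Extract every bndbox from a PascalVOC annotation as a CSV row."""
    root = ET.fromstring(xml_text)
    rows = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        rows.append(",".join([
            image_path,
            box.findtext("xmin"), box.findtext("ymin"),
            box.findtext("xmax"), box.findtext("ymax"),
            name,
        ]))
    return rows
```

Running this over every XML file in the dataset, and routing rows by the IDs in train.txt and test.txt, produces retinanet_train.csv and retinanet_test.csv.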

Train RetinaNet to Detect Logos

Now we have all the required files to perform the training. We will use Google Colab with GPU.

Open Google Colab and, in the notebook settings, set the runtime type to Python 3 and the Hardware accelerator to GPU.

 

I have prepared a Python Jupyter notebook for this purpose. Run the notebook and let the network train for a total of 50 epochs.

Export Model

Once the training is completed, we need to export the model before we can evaluate it or apply it to predict objects in our own images.

Download resnet50_csv_50.h5 from Google Colab.

Run the following command to convert:

retinanet-convert-model resnet50_csv_50.h5 output.h5

We will get the inference-ready model output.h5.

Evaluate Model

To evaluate the model on a testing set, use the following command:

retinanet-evaluate csv retinanet_test.csv retinanet_classes.csv output.h5

From the evaluation we obtain a mean average precision (mAP) of 96%.
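Under the hood, the evaluation matches each predicted box to the ground truth by intersection-over-union (IoU, typically thresholded at 0.5) before averaging precision into the mAP. A minimal IoU helper for boxes in the same (x1, y1, x2, y2) format used in the CSV files:

```python
# Intersection-over-union for axis-aligned boxes (x1, y1, x2, y2) --
# the overlap measure behind the mAP reported by retinanet-evaluate.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)
```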

Python Flask REST API 

We will create a Python Flask app that detects the logo in images, and deploy it to SAP Cloud Foundry.
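A minimal sketch of the endpoint is below. The real app loads output.h5 with Keras-RetinaNet and runs inference on the downloaded image; here detect_logos is a placeholder stub so the routing and response shape are clear. The /img route and url parameter match the article; everything else is illustrative.

```python
# Minimal Flask sketch of the /img endpoint. detect_logos stands in for
# the real RetinaNet inference and returns canned (label+score, box) tuples.
from flask import Flask, jsonify, request

app = Flask(__name__)

def detect_logos(image_url):
    """Placeholder for RetinaNet inference on the image behind image_url."""
    return [("pfe: 1.00", 87, 106, 190, 178)]

@app.route("/img")
def img():
    url = request.args.get("url")
    if not url:
        return jsonify({"error": "missing url parameter"}), 400
    detections = [[label, str(x1), str(y1), str(x2), str(y2)]
                  for label, x1, y1, x2, y2 in detect_logos(url)]
    return jsonify(detections)
```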

To test the app with an image, go to the SAP Cloud Foundry app URL and provide the url parameter with a link to the image file:

https://retinanet_tf.cfapps.eu10.hana.ondemand.com/img?url=https://c8.alamy.com/comp/BX8FGF/different-strengths-of-atorvastatin-trade-name-lipitor-made-by-pfizer-BX8FGF.jpg

The service returns a JSON response with the score and bounding box coordinates for each detected object:

[
  [
    "pfe: 1.00",
    "87",
    "106",
    "190",
    "178"
  ],
  [
    "pfe: 1.00",
    "219",
    "480",
    "295",
    "530"
  ],
  [
    "pfe: 1.00",
    "397",
    "547",
    "476",
    "600"
  ],
  [
    "pfe: 1.00",
    "1060",
    "67",
    "1129",
    "172"
  ],
  [
    "pfe: 0.84",
    "770",
    "710",
    "895",
    "782"
  ],
  [
    "pfe: 0.58",
    "585",
    "350",
    "703",
    "421"
  ]
]

Visualize in HTML

To visualize the result, create a simple HTML page and populate the bounding boxes, classes, and scores.

var name = "https://c8.alamy.com/comp/BX8FGF/different-strengths-of-atorvastatin-trade-name-lipitor-made-by-pfizer-BX8FGF.jpg";
var response = {
    "detection_boxes": [
        [87, 106, 190, 178],
        [219, 480, 295, 530],
        [397, 547, 476, 600],
        [1060, 67, 1129, 172],
        [770, 710, 895, 782],
        [585, 350, 703, 421]
    ],
    "detection_classes": [
        "pfe", "pfe", "pfe", "pfe", "pfe", "pfe"
    ],
    "detection_scores": [
        1.00, 1.00, 1.00, 1.00, 0.84, 0.58
    ]
};

The full source code can be found in my Git repo, and the generated model can be found here.

