
Introduction


This is part 2 of the blog series on the automation of the yard process. In part one, we saw how the different architectures tie together.

In this part, we take a more in-depth look at the machine learning models, specifically the digit recognition model deployed on the microcontroller.

Machine Learning Model


One key aspect of running ML models on microcontrollers is the need to convert them to TensorFlow Lite (TFLite). During this conversion, additional optimizations such as quantization can be performed to reduce model size and latency.

License Plate Detection


Given that speed and accuracy are vital in the detection - we do not want the driver to wait so long that the benefits are negated - the search space for the machine learning algorithm is narrowed.

As this is part of an innovation showcase to be put up in the Singapore Experience Centre, Singapore license plates were used. The license plates pasted on the toy trucks were made to replicate the real-world scenario. As such, the typeface used was Charles Wright, and the format of the license plates follows that of actual Prime Movers - Class 5 vehicles which transport goods.

Firstly, the truck license plate is detected with the OpenMV MicroPython find_rects function. Thereafter, individual characters are extracted from the cropped license plate using the find_blobs function. Optical Character Recognition (OCR) TinyML models were trained on the Edge Impulse development platform, where the individual characters are classified by the machine learning algorithms. On the Edge Impulse platform, the image training and testing accuracies were high.
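As a rough sketch of this detection-and-extraction step (OpenMV MicroPython; the thresholds, ROI handling, and variable names here are illustrative, not the exact production code):

import sensor

sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)

img = sensor.snapshot()

# find_rects() returns candidate rectangles; the plate outline is expected
# to be the dominant rectangle in the frame
for r in img.find_rects(threshold=10000):
    plate = img.copy(roi=r.rect())
    # find_blobs() segments the dark character regions inside the cropped plate
    chars = plate.find_blobs([(0, 60)], pixels_threshold=20, area_threshold=20)
    chars.sort(key=lambda b: b.x())  # order characters left to right
    char_imgs = [plate.copy(roi=b.rect()) for b in chars]
    # each char_img is then classified by the TinyML models described below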

However, when deployed in practice, the OCR TinyML models failed to recognise the license plate numbers despite their high model performance in training and testing. The disparity between deployed and train/test performance might be attributed to the following.

For example, the detected image in deployment differs from the train/test images in terms of lighting, angle, and even noise. With a 1-model approach, where a single TinyML model tries to predict 36 classes per character (10 numbers and 26 alphabets), an extracted character that was a number was frequently detected as an alphabet, or vice versa. Therefore, to help the deployed models correctly recognise the license plate numbers, it is necessary to exploit domain knowledge of the license plate number: where should a number be expected, and where should an alphabet be expected? Consequently, a 2-model approach was developed, where a number recognition TinyML model is run on character slots where numbers are expected, and an alphabet recognition TinyML model is run on character slots where alphabets are expected, thus reducing the search space and the probability of false detection.

The corresponding license plate is of the form 'xx ####y', where 'x' and 'y' denote alphabets and '#' denotes numbers. From this domain knowledge, the first alphabet is always 'X' and the second is either 'D' or 'E' - the current iteration is 'E', but vehicles with the previous iteration 'D' still exist, hence both are included. '####' simply denotes the numbers from '0000' to '9999', with leading zeroes accepted. 'y' is a checksum which follows an algorithm that is available online; for brevity, that algorithm is not covered here. Having the checksum allows the final character to be derived from the checksum algorithm without the need for a detection.
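For illustration, these format rules (excluding the checksum computation itself) can be captured as a simple validity check; the pattern below is a sketch, not code from the project:

import re

# 'X', then 'D' or 'E', an optional space, four digits, then the checksum letter
PLATE_PATTERN = re.compile(r"^X[DE] ?\d{4}[A-Z]$")

print(bool(PLATE_PATTERN.match("XD3386L")))  # True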

Since the first character is always 'X' and the last character 'y' is a checksum, it is not necessary to run the machine learning models on all 7 characters; predictions on the remaining 5 characters are sufficient. Further, since the second character is always 'D' or 'E' under the domain rules, training a 26-class A-Z OCR character recognition model is unnecessary; a binary D-E OCR character recognition TinyML model suffices.

Taken together, when the rectangular shape of the license plate is detected and each character is extracted, two machine learning models are run to read the license plate number. First, a D-E prediction is made on the second character. Second, number recognition predictions are performed on the third, fourth, fifth, and sixth character slots. For each character slot, TinyML outputs the probabilities of the class predictions. For each number character slot, the top 3 class predictions are kept (for the alphabet prediction, the deployed model works well, so only the top 1 class prediction is chosen).

Finally, all possible combinations of the candidate class predictions are assembled. If any of the candidate solutions matches the pre-approved master list of license plate numbers, that candidate is deemed to be the detected license plate number, and an authorised license plate number is deemed to have been found. The figure below shows this.


Figure 1: Snippet showing solution candidates identified for license plate XD3386L


In Figure 1 above, the 3 rows depict 3 independent machine learning inferences performed on the license plate number 'XD3386L'. In each number character slot, the 3 numbers show the top 3 class predictions for that slot. For example, ['1', '3', '6'] indicates that the TinyML model predicts the character to be 1, 3, or 6. In this example, the third inference correctly recognises the license plate number, since one of its candidate solutions matches the license plate number.
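As a sketch of this candidate-assembly step (the prediction lists and master list below are illustrative, mirroring Figure 1):

import itertools

# illustrative inputs: top-1 D-E prediction and top-3 digit predictions per slot
de_pred = 'D'
digit_top3 = [['3', '8', '9'], ['3', '1', '6'], ['8', '6', '0'], ['6', '5', '8']]

# illustrative pre-approved master list of authorised plates
APPROVED_PLATES = {'XD3386L'}

def match_plate(de_pred, digit_top3, approved):
    # the first character is always 'X'; the final checksum letter is
    # derivable from the checksum algorithm, so only the prefix is matched
    for digits in itertools.product(*digit_top3):
        prefix = 'X' + de_pred + ''.join(digits)
        for plate in approved:
            if plate.startswith(prefix):
                return plate
    return None

print(match_plate(de_pred, digit_top3, APPROVED_PLATES))  # -> 'XD3386L'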

Dangerous Goods


Note: The Dangerous Goods code is installed but not used in the current demo set-up; the aim is to use it for subsequent processes as the demo is refined.

Given the nature of Dangerous Goods, there are standardized regulations set forth by the United Nations (UN) to ensure harmonization, and its signatories adhere to them. One instance is the requirement to affix hazard labels clearly on the cargo transport units.

A sample of the possible hazard labels from the United Nations Economic Commission for Europe (UNECE) is shown in Figure 2:


Figure 2: Hazard label samples from UNECE


For the purposes of the demo showcase, 3 classes were chosen to be pasted on the truck. An additional 4th class - No Dangerous Goods - is also present and simply indicates that the truck is not carrying any such goods.


Figure 3: Chosen hazard labels for Dangerous Goods Classification



Data Collection


These 3 classes were then printed out and pasted on the side of the toy truck. 1500 photos of each class - the 3 hazard classes plus 1 background class - were captured directly using the Arduino Portenta H7 + Vision Shield camera and the OpenMV IDE. Figure 4 below shows an actual image as captured by the Arduino Portenta H7.


Figure 4: Actual photo as taken from Arduino Portenta H7 + Vision Shield



Model Selection & Training


Google Colab Pro was used as it provides a decent GPU for training. The photos were placed into a train folder on Google Drive, and an 80-20 train-validation split was performed using the split-folders package:
!pip install split-folders

Setting directories for Train-Val Split


The images are collated in the train directory, with each class in its own sub-folder.


Figure 5: Each class split into its own sub-folder


An empty output folder is created, into which the 80-20 train-val split will be placed:
import splitfolders

train_dir = r'/content/drive/MyDrive/Colab Notebooks/Dangerous goods/Train'
output_dir = r'/content/drive/MyDrive/Colab Notebooks/Dangerous goods/Output'

# split each class sub-folder into 80% train / 20% validation
splitfolders.ratio(train_dir, output=output_dir, seed=5126, ratio=(.8, .2), group_prefix=None, move=False)
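After the split, the Output folder mirrors the class sub-folders (named as per the labels used later: Class1, Class3, Class7, NoDG) under train and val, a layout split-folders creates automatically:

Output/
├── train/
│   ├── Class1/
│   ├── Class3/
│   ├── Class7/
│   └── NoDG/
└── val/
    ├── Class1/
    ├── Class3/
    ├── Class7/
    └── NoDG/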

import tensorflow as tf

# point to the new directories created by the train-validation split
new_train = r'/content/drive/MyDrive/Colab Notebooks/Dangerous goods/Output/train'
new_val = r'/content/drive/MyDrive/Colab Notebooks/Dangerous goods/Output/val'

# IMG_WIDTH, IMG_HEIGHT, and batch_size are defined in the next section
train_data_gen = tf.keras.preprocessing.image_dataset_from_directory(new_train, label_mode='categorical', image_size=(IMG_WIDTH, IMG_HEIGHT), batch_size=batch_size, color_mode='grayscale')
val_data_gen = tf.keras.preprocessing.image_dataset_from_directory(new_val, label_mode='categorical', image_size=(IMG_WIDTH, IMG_HEIGHT), batch_size=batch_size, color_mode='grayscale')

Model selection


All images were resized to 96x96 to ensure consistency with the Digit Recognition Model, which was trained on Edge Impulse. Since the images are already grayscale, and the classes are distinguished mainly by lines and/or words rather than complex features, a simple model was written from scratch:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

IMG_WIDTH = 96
IMG_HEIGHT = 96
batch_size = 64

model = Sequential()

# convolutional layer
# padding='same' so there's no dimensionality reduction, especially since our data isn't feature-rich
model.add(Conv2D(32, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu', input_shape=(IMG_WIDTH, IMG_HEIGHT, 1), kernel_constraint=tf.keras.constraints.MaxNorm(1)))
model.add(MaxPool2D(pool_size=(2,2), strides=2, padding='same'))

# 2nd layer
model.add(Conv2D(32, kernel_size=(3,3), activation='relu', padding='same', kernel_regularizer='l1'))
model.add(MaxPool2D(pool_size=(2,2), strides=2, padding='same'))

# 3rd layer
model.add(Conv2D(16, kernel_size=(3,3), activation='relu', padding='same'))
model.add(MaxPool2D(pool_size=(2,2), strides=2, padding='valid'))

# flatten output of conv
model.add(Flatten())

# FCNN layer
model.add(Dense(16, activation='relu'))

# add dropout
model.add(Dropout(0.3))

# output layer
model.add(Dense(4, activation='softmax'))

Compiling Model


The following metrics were tracked - Categorical Accuracy, Matthews Correlation Coefficient (MCC), Receiver Operating Characteristic Area Under Curve (ROC-AUC), and Precision - with the key metric being categorical accuracy:
import tensorflow_addons as tfa
from tensorflow.keras.optimizers import Adam

opt = Adam(learning_rate=0.0003)

# compile the model
model.compile(optimizer=opt,
              loss='categorical_crossentropy',
              metrics=[tf.keras.metrics.CategoricalAccuracy(),
                       tfa.metrics.MatthewsCorrelationCoefficient(num_classes=4, name='mcc'),
                       tf.keras.metrics.AUC(name='AUC'),
                       tf.keras.metrics.Precision(name='Precision')])

The model summary is as shown


Figure 6: Model summary


Through the various iterations, it was discovered that keeping the number of parameters small was vital to minimize overfitting. A smaller parameter count also results in a smaller file size when converted to TFLite.

Training model


EarlyStopping, ReduceLROnPlateau, and ModelCheckpoint callbacks were used, the latter being essential for retrieving the best performing model for conversion:
earlystopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                 patience=15,
                                                 mode='min',
                                                 restore_best_weights=True,
                                                 verbose=1)

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                                 factor=0.5,
                                                 patience=2,
                                                 verbose=1,
                                                 min_delta=1e-5,
                                                 mode='min')

# save the best model (lowest validation loss) seen during training
checkpoint_filepath = '/content/drive/MyDrive/Colab Notebooks/Dangerous goods/DG_4classes_96x96_220808__{epoch}.h5'
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_filepath,
                                                         monitor='val_loss',
                                                         save_best_only=True)

callbacks = [reduce_lr, earlystopping, checkpoint_callback]

# fitting the model
EPOCHS = 150
history = model.fit(train_data_gen,
                    validation_data=val_data_gen,
                    verbose=2,
                    callbacks=callbacks,
                    epochs=EPOCHS)


The best performing model - from epoch 93 - ended up with a validation accuracy of 99.92% and a corresponding validation loss of 4.3%.

TFLite


The model was then full-integer quantized to TFLite for eventual deployment on the microcontroller.

Representative dataset


For full-integer quantization, a representative dataset is required for calibration. Hence, the same code used to instantiate the training dataset was reused, the main difference being a batch size of 1:
train_ds_for_conversion = tf.keras.preprocessing.image_dataset_from_directory(new_train, label_mode='categorical', image_size=(IMG_WIDTH, IMG_HEIGHT), batch_size=1, color_mode='grayscale')

# yield single-image batches for the converter's calibration pass
def represent_data_gen():
    for image_batch, labels_batch in train_ds_for_conversion:
        yield [image_batch]

Convert to TFLite


# load the best model (epoch 93) for conversion
keras_model = tf.keras.models.load_model('/content/drive/MyDrive/Colab Notebooks/Dangerous goods/DG_4classes_96x96_220808__93.h5')

# convert to TFLite with full-integer quantization
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = represent_data_gen
converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tfmodel = converter.convert()

# write the quantized model to disk (filename illustrative)
with open('DG_4classes_96x96_int8.tflite', 'wb') as f:
    f.write(tfmodel)

A simple labels .txt file was manually created, with the 4 classes in the order in which they appeared (and were trained) in the code - Class1, Class3, Class7, NoDG. Both files were then placed on the microcontroller, with the quantized TFLite file coming in at 57kB.

The code and the label file were loaded together through a template made available in the OpenMV IDE for deploying Edge Impulse trained models. Simple changes to the filenames were all that was required, since coding the model from scratch in TensorFlow produced the same kind of output model file as using Edge Impulse.
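For illustration, the relevant part of such a deployment script looks roughly like this (filenames are illustrative; tf is OpenMV's on-device TensorFlow Lite module):

import sensor, tf

sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)

# load the labels file and the quantized TFLite model from flash
labels = [line.rstrip('\n') for line in open('labels.txt')]
net = tf.load('dg_model.tflite', load_to_fb=True)

while True:
    img = sensor.snapshot()
    for obj in net.classify(img):
        # pair each label with its predicted probability, highest first
        predictions = sorted(zip(labels, obj.output()), key=lambda p: -p[1])
        print(predictions[0])  # best class for this frame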

Conclusion


In this blog post, we have seen how the machine learning models were constructed. By narrowing the search space using domain knowledge of Singapore license plates, we were able to optimise the detection and greatly reduce the detection time.

In addition, the initial 1-model approach performed well in training/testing but did not perform as well in actual deployment. The subsequent 2-model approach - one model for numbers and the other for alphabets - performed much better, with a decrease in false positives.

We have also seen how, with the aid of open-source platforms like Edge Impulse, training a quantized model is made more accessible, especially when transfer learning is involved. Coding from scratch and quantizing is also made accessible through TensorFlow Lite, with the Dangerous Goods model performing well while coming in at just 57kB.

As edge AI grows in popularity, increased support for quantization will allow even more powerful models to be deployed in the future.

We hope you have enjoyed this series of blog posts. If you have any questions or would like to know more about the process, feel free to write in to us for more details, and do provide feedback in the comments.