Anki Cozmo meets SAP Cloud Platform – Part 2: Data Science
This blog supplements the first part, which set up the IoT connection of the Anki Cozmo to the SAP Cloud Platform. We assume you have worked through the first part and have all the tools, accounts and code set up and running. In this second part, we discuss processing the sensor data from the robot.
Recall that in the first part we made the Cozmo robot drive in squares and send its location data to a database in the cloud. Now we look at using this data to generate insights and make smart decisions.
After walking through this part, you will have a basic understanding and a simple implementation of the following:
- Initial data browsing and visualization via Eclipse
- Data analysis with the SAP HANA Predictive Analysis Library (PAL)
Again, in this tutorial we think of the Cozmo robot as a stand-in for an autonomous vehicle, for example a forklift in a warehouse. We might interpret the stopping points of the robot (which we have in the database) as pickup locations in the warehouse. We will implement a visualization of the stopping points and a clustering algorithm to find the hotspot areas. In warehouse operations, information about the hotspot areas of all robots might be of interest to identify potential collision points, to evaluate the efficiency of the task distribution among robots, or even to deduce a better warehouse layout.
Experienced warehouse operators or data scientists among us can probably come up with more concrete and realistic machine learning applications in the warehouse.
In the first step, we find the sensor data from the robot in the database and visualize it quickly within the Eclipse development environment. Then we walk through an example applying a clustering algorithm from the PAL to the data.
Sensor data browsing in Eclipse
In this first step, we work with Eclipse to locate the database table with our robot’s sensor data. We use the developer tools to quickly put together a first visualization of the sensor data. The purpose of this step is to get acquainted with the development environment and to get a first feel for the data.
If you have not done so yet, install Eclipse on your notebook, plus the plugin with the SAP development tools.
Once you have Eclipse installed and opened you can connect your HANA database (which you created in the first part of the blog). In Eclipse, open the SAP HANA development perspective and go to the Systems view. From the context menu choose “Add cloud system” and fill in the account information by specifying your SAP SCP trial account name, your user name on the SCP and its password. On the second screen choose your database. Note that your database user is SYSTEM so make sure to specify the credentials of the SYSTEM user.
Once you have successfully connected with your database find the table with your robot sensor data: Expand the “Catalog” folder, then the “SYSTEM” schema folder, then “Tables”. You will find a table with the technical ID of your IoT COZMO_POSITION message type in its name. For this table, open the data preview as indicated in the screenshot:
On the right side, the raw data of the table is displayed. You can easily visualize it by going to the Analysis tab. For example, a scatter plot of the X and Y coordinates of the robot can give a first visual impression of the robot’s action space: Drag the columns C_POSX and C_POSY to the value axis area and the column G_CREATED to the label axis area. Change the chart type to scatter and you should be presented with something like the following. If you have had your robot complete several rounds of driving in squares, you should see the corners reflected in your scatter plot.
For reasonably small datasets like our toy data, looking at the visualization might inspire some ideas for analyses to try out: Depending on the semantics of the underlying data you might want to identify clusters in your data, find outliers etc.
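To make the idea of such quick checks concrete, here is a minimal Python sketch of a simple outlier screen on position data. The coordinates are synthetic stand-ins for the C_POSX/C_POSY columns, not real robot data; the threshold is an arbitrary illustrative choice.

```python
# Quick sanity check on robot positions, independent of Eclipse.
# The coordinates below are made-up stand-ins for C_POSX/C_POSY.
from statistics import median

# four square corners, each visited a few times with small jitter
positions = [
    (0.0, 0.0), (0.2, -0.1), (100.0, 0.0), (99.8, 0.3),
    (100.1, 100.0), (99.9, 99.7), (0.1, 100.2), (-0.2, 99.9),
    (500.0, 500.0),  # a deliberately planted outlier
]

mx = median(p[0] for p in positions)
my = median(p[1] for p in positions)

def far_from_median(p, threshold=200.0):
    """Flag points unusually far from the median position."""
    return ((p[0] - mx) ** 2 + (p[1] - my) ** 2) ** 0.5 > threshold

outliers = [p for p in positions if far_from_median(p)]
print(outliers)  # → [(500.0, 500.0)]
```

A check like this can catch broken sensor readings before they distort any downstream analysis.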
Data science with sensor data
For our toy robot example, there is of course limited potential for proper data science. To illustrate the use of the Predictive Analysis Library (PAL), we perform an exemplary cluster analysis on the sensor data.
The approach for using the analytics tools of the PAL is to first generate the data structures and a specific wrapper for the tool of choice. After this one-off preparation, the PAL tool is ready to use at any time. For more background information, check out the SAP HANA Academy on YouTube. In the following we make use of their groundwork as provided at https://github.com/saphanaacademy/PAL, especially their code snippets.
To perform a k-means clustering analysis, we take their code snippet as a basis and adjust it to our robot data. We subdivide the SQL script into three parts and execute them in the SQL console in Eclipse. If you are new to HANA development in Eclipse: you can open the SQL console via the context menu of your cloud system in the Systems view.
Part 1: Metadata: Create the PAL wrapper
This part comprises specifying the data types with respect to which the clusters are determined. Let’s look at the code.
To collect all our artifacts in a well-arranged place, we start by creating our own schema called COZMO. The table type PAL_T_KM_DATA describes the input data and resembles the sensor data we collect from the robot. PAL_T_KM_CENTERS specifies the center point of each cluster, which we want in terms of the x-y-z coordinates of the robot. PAL_T_KM_RESULTS specifies for each data record the assigned cluster and the distance from the cluster center. In our robot example, we use the timestamp of creation (G_CREATED) as the unique identifier of the data record.
Finally, note that compared to the original code snippet, the interface of SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE has changed slightly in the meantime: it is now necessary to specify the schema in which the referenced objects reside.
Execute these lines of code in the SQL console to create your PAL wrapper specifically for the Cozmo data.
CREATE SCHEMA "COZMO";
SET SCHEMA "COZMO";

CREATE TYPE PAL_T_KM_DATA AS TABLE (
  G_CREATED INTEGER,
  C_POSX DOUBLE,
  C_POSY DOUBLE,
  C_POSZ DOUBLE);

CREATE TYPE PAL_T_KM_PARAMS AS TABLE (
  NAME VARCHAR(60),
  INTARGS INTEGER,
  DOUBLEARGS DOUBLE,
  STRINGARGS VARCHAR(100));

CREATE TYPE PAL_T_KM_RESULTS AS TABLE (
  G_CREATED INTEGER,
  CENTER_ID INTEGER,
  DISTANCE DOUBLE);

CREATE TYPE PAL_T_KM_CENTERS AS TABLE (
  CENTER_ID INTEGER,
  C_POSX DOUBLE,
  C_POSY DOUBLE,
  C_POSZ DOUBLE);

CREATE COLUMN TABLE PAL_KM_SIGNATURE (
  "POSITION" INTEGER,
  "SCHEMA_NAME" NVARCHAR(256),
  "TYPE_NAME" NVARCHAR(256),
  "PARAMETER_TYPE" VARCHAR(7));

INSERT INTO PAL_KM_SIGNATURE VALUES (1, 'COZMO', 'PAL_T_KM_DATA', 'IN');
INSERT INTO PAL_KM_SIGNATURE VALUES (2, 'COZMO', 'PAL_T_KM_PARAMS', 'IN');
INSERT INTO PAL_KM_SIGNATURE VALUES (3, 'COZMO', 'PAL_T_KM_RESULTS', 'OUT');
INSERT INTO PAL_KM_SIGNATURE VALUES (4, 'COZMO', 'PAL_T_KM_CENTERS', 'OUT');

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL', 'KMEANS', 'COZMO', 'PAL_KM', PAL_KM_SIGNATURE);
Part 2: Data retrieval: Prepare views and parameters
Having created the abstract data structures in the previous part, we now create views based on these structures and populate them with data from the actual database table holding the sensor data.
Looking at the code, note the listing of the columns with our sensor data, and also the reference to the T_IOT* table for the COZMO_POSITION message type. Make sure to replace it with your own table name.
There is a quick hack converting the creation timestamp (G_CREATED) into an integer, since the PAL appears not to accept timestamps as identifiers of data records. Feel free to implement a more robust conversion instead.
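To see why this is only a quick hack, here is a small Python sketch of the same conversion. The HANA format 'MMDDHHSS' roughly maps to '%m%d%H%S' in Python; note that it contains no minutes and no year, so distinct records can collide. The epoch-seconds alternative shown is one possible more robust choice, not the blog's original approach.

```python
# Illustration (outside the database) of the G_CREATED -> INTEGER hack.
from datetime import datetime, timezone

ts = datetime(2017, 8, 15, 14, 23, 7, tzinfo=timezone.utc)

# month, day, hour, second - minutes and year are dropped, so IDs can collide
quick_hack_id = int(ts.strftime("%m%d%H%S"))  # → 8151407

# seconds since the Unix epoch: unique per second, across years
robust_id = int(ts.timestamp())

print(quick_hack_id, robust_id)
```

Two records written in the same month, day, hour and second (but different minutes) would get the same quick-hack ID, which would silently merge rows in the join later on.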
Finally, we set the GROUP_NUMBER parameter, the number of clusters to be created, to 4. In our very simple robot example we know that exactly 4 clusters are reasonable. In general practice, there are techniques like the elbow method to identify an appropriate number of clusters.
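For readers unfamiliar with the elbow method, here is a minimal Python sketch using a tiny pure-Python k-means on synthetic "square corner" data. The real analysis runs inside HANA via PAL; this only illustrates how the within-cluster error flattens once the cluster count matches the data.

```python
# Elbow method sketch: inertia (total squared distance to cluster centers)
# drops sharply until k matches the true number of groups, then flattens.
import random

random.seed(0)

# four tight groups of points around the corners of a square
corners = [(0, 0), (100, 0), (100, 100), (0, 100)]
points = [(cx + random.uniform(-2, 2), cy + random.uniform(-2, 2))
          for cx, cy in corners for _ in range(10)]

def kmeans_inertia(points, k, iterations=20):
    """Run a basic k-means and return the total squared distance to centers."""
    centers = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2
                                                  + (p[1] - centers[i][1]) ** 2)
            clusters[nearest].append(p)
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sum(min((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for cx, cy in centers)
               for p in points)

# print the inertia for each candidate k and look for the "elbow"
for k in range(1, 7):
    print(k, round(kmeans_inertia(points, k), 1))
```

Plotting inertia against k, the curve bends at k = 4 for this data, which is exactly what we hard-coded into GROUP_NUMBER above.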
SET SCHEMA "COZMO";

CREATE VIEW V_KM_DATA AS
  SELECT to_integer(to_varchar(G_CREATED, 'MMDDHHSS')) AS G_CREATED,
         C_POSX, C_POSY, C_POSZ
    FROM SYSTEM.T_IOT_95302F6C6D8F4954BF8B;

CREATE COLUMN TABLE KM_PARAMS LIKE PAL_T_KM_PARAMS;
CREATE COLUMN TABLE KM_RESULTS LIKE PAL_T_KM_RESULTS;
CREATE COLUMN TABLE KM_CENTERS LIKE PAL_T_KM_CENTERS;

CREATE VIEW V_KM_RESULTS AS
  SELECT a.G_CREATED, b.C_POSX, b.C_POSY, b.C_POSZ,
         a.CENTER_ID + 1 AS CLUSTER_NUMBER
    FROM KM_RESULTS a, SYSTEM.T_IOT_95302F6C6D8F4954BF8B b
   WHERE a.G_CREATED = to_integer(to_varchar(b.G_CREATED, 'MMDDHHSS'));

INSERT INTO KM_PARAMS VALUES ('THREAD_NUMBER', 2, null, null);
INSERT INTO KM_PARAMS VALUES ('GROUP_NUMBER', 4, null, null);
INSERT INTO KM_PARAMS VALUES ('INIT_TYPE', 1, null, null);
INSERT INTO KM_PARAMS VALUES ('DISTANCE_LEVEL', 2, null, null);
INSERT INTO KM_PARAMS VALUES ('MAX_ITERATION', 100, null, null);
INSERT INTO KM_PARAMS VALUES ('NORMALIZATION', 0, null, null);
INSERT INTO KM_PARAMS VALUES ('EXIT_THRESHOLD', null, 0.0001, null);
After executing this second SQL script as well, you should find the COZMO schema in your catalog, along with the newly created types, tables, views and the PAL_KM stored procedure.
Part 3: K-Means analysis
With everything set up, triggering the actual k-means analysis reduces to a single line of code:
CALL COZMO.PAL_KM (V_KM_DATA, KM_PARAMS, KM_RESULTS, KM_CENTERS) WITH OVERVIEW;
The stored procedure determines the cluster centers and stores them in the table COZMO.KM_CENTERS. Visualize the resulting centers via the data preview tool as discussed above and – voilà – you should see a result like this.
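To make the semantics of the results concrete, here is a plain Python sketch of what each row of KM_RESULTS expresses: a record is assigned the nearest center, together with its Euclidean distance to that center. The centers below are made-up stand-ins for COZMO.KM_CENTERS, not actual output.

```python
# What a KM_RESULTS row means, sketched in plain Python.
# The center coordinates are hypothetical stand-ins for COZMO.KM_CENTERS.
from math import sqrt

centers = {
    0: (0.0, 0.0, 0.0),
    1: (100.0, 0.0, 0.0),
    2: (100.0, 100.0, 0.0),
    3: (0.0, 100.0, 0.0),
}

def assign(point):
    """Return (center_id, distance) for the nearest cluster center."""
    cid = min(centers,
              key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
    dist = sqrt(sum((p - q) ** 2 for p, q in zip(point, centers[cid])))
    return cid, dist

print(assign((98.0, 3.0, 0.0)))  # nearest center is 1, distance sqrt(13)
```

This is essentially the relationship the V_KM_RESULTS view materializes by joining KM_RESULTS back to the sensor table.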
Congratulations on making it this far! Hopefully this simple PAL example has inspired some ideas on how and why to approach data analysis for sensor data from robots.
For example, if you had several robots in your warehouse, what insights would you want to gain about their whereabouts and activities? Feel free to check out the other PAL functions to see if there is a reusable function for the kind of analysis you have in mind.