To explore Expert Analytics’ potential for image analysis, I decided to take a set of images of the same area of the Earth captured over a long period and show what changes, whether agricultural or through city expansion, took place during that time. I came across the USGS Earthshots website (http://earthshots.usgs.gov), which provides satellite images of environmental change and highlights several good use cases for showing that change over time. These include the shrinkage of Lake Chad, but the one I found most interesting was the expansion of Beijing between 1977 and 2000, a period of rapid city growth and of a corresponding reduction in the surrounding grassland, forests and agricultural land.
Figure 1 – Satellite Image of Beijing City from 1977
The image above is a satellite image of Beijing city taken in 1977. USGS have done some pre-processing on the image that involved colouring road and city areas in blue and grassland, forests and agricultural land in red. The rest of the images in the dataset were taken in 1983, 1995 and 2000.
Figure 2 – Satellite Images from 1983, 1995, 2000
It is clear from the images above that the city expanded significantly between 1977 and 2000, at the expense of grassland, forests and agricultural land. We as humans can see this clearly, but analysing it automatically is a little trickier.
As previously stated, each type of land cover is represented by a colour in these images, so we can measure its growth or reduction by the percentage of pixels that colour occupies across the series of images. However, this is difficult with the raw images, as roads and city areas appear in many different shades of blue and grassland areas in many different shades of red. To deal with this, clustering can be used to reduce the number of colours in the image to a number that still represents the landscape correctly but allows for simple and meaningful results.
As mentioned in the previous image analysis use cases, an image is represented as a multi-dimensional array made up of one matrix for each of the three colour channels. The image cannot be clustered in this form, so we must first reformat the data. We want to create a separate data set for each image, with each pixel being a row in the data table. This data set contains columns for the x position of the pixel, its y position, and its R, G and B values. Once the data is in this format, it is easy to apply a clustering algorithm to the pixel colour values.
Figure 3 – A visual representation of the resulting data set.
The R code used to download the image from a URL and convert it into this format is as follows:
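One way this conversion could look is sketched below. It is an illustrative reconstruction rather than the original listing: it assumes the image is a colour JPEG read with the `readJPEG` function of the `jpeg` package, and the function names `image_to_pixels` and `read_image_from_url` are hypothetical.

```r
# Convert a height x width x 3 image array into one row per pixel.
# R stores arrays column by column, so the row index (y) varies fastest.
image_to_pixels <- function(img) {
  h <- dim(img)[1]
  w <- dim(img)[2]
  data.frame(
    x = rep(seq_len(w), each = h),   # column position of the pixel
    y = rep(seq_len(h), times = w),  # row position of the pixel
    R = as.vector(img[, , 1]),
    G = as.vector(img[, , 2]),
    B = as.vector(img[, , 3])
  )
}

# Hypothetical helper: download a colour JPEG and convert it.
# Assumes the 'jpeg' package is installed; it is not called in this sketch.
read_image_from_url <- function(url) {
  tmp <- tempfile(fileext = ".jpg")
  download.file(url, tmp, mode = "wb")
  image_to_pixels(jpeg::readJPEG(tmp))
}

# Small synthetic 2 x 3 image so the sketch runs without a network connection.
demo <- array(seq(0, 1, length.out = 2 * 3 * 3), dim = c(2, 3, 3))
pixels <- image_to_pixels(demo)
print(pixels)
```

With a real image, `read_image_from_url()` would be called with the image URL. The colour values returned by `readJPEG` lie between 0 and 1.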
I then took the 1977 image as the training image. It forms a baseline for the amount of city and grassland in the scene, so that we can analyse how these change over time. After carrying out tests to determine the optimum number of clusters (judged by intra-cluster distance), it was decided that four cluster centres would best represent the image. The centre of each cluster is the average of the data points in that cluster, so it holds the average R, G and B values of the pixels assigned to it. This allows us to change every pixel to the colour of the centre of the cluster it belongs to, reducing the number of colours in the image to four.
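The clustering step can be sketched with the `kmeans` function from base R's `stats` package. This is a minimal sketch on synthetic pixel colours, not the code used for the Beijing image; the variable names and the four made-up colour groups are assumptions. Because the total within-cluster sum of squares generally keeps falling as more clusters are added, a common approach is to look for the point where the improvement levels off rather than for the absolute minimum.

```r
set.seed(42)  # k-means starts from random centres; fix the seed for repeatability

# Synthetic stand-in for the R, G, B columns of the pixel table:
# four made-up colour groups with a little noise around each.
n <- 400
group_colours <- rbind(c(0.8, 0.1, 0.1), c(0.1, 0.2, 0.8),
                       c(0.9, 0.9, 0.9), c(0.1, 0.1, 0.1))
rgb_values <- group_colours[rep(1:4, each = n / 4), ] +
  matrix(rnorm(n * 3, sd = 0.02), ncol = 3)
rgb_values <- pmin(pmax(rgb_values, 0), 1)  # keep values in [0, 1]
colnames(rgb_values) <- c("R", "G", "B")

# Total within-cluster sum of squares for several candidate values of k.
wss <- sapply(2:6, function(k) {
  kmeans(rgb_values, centers = k, nstart = 5)$tot.withinss
})
names(wss) <- 2:6
print(wss)

# Fit the model with four centres, as in the article.
model <- kmeans(rgb_values, centers = 4, nstart = 5)

# Replace every pixel with the colour of its cluster centre.
quantised <- model$centers[model$cluster, ]
print(model$centers)
```

The `quantised` matrix has one row per pixel but only four distinct colours, which is what produces the clustered image.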
Figure 4 – The original image and the clustered image.
Now that we have clustered the image into four clusters we can directly calculate how many blue city pixels there are as well as how many red grassland pixels.
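Counting pixels per cluster is a short calculation once every pixel has a cluster label. The sketch below uses a made-up vector of ten labels and made-up centre colours purely for illustration; they are not the values from the Beijing image.

```r
# Hypothetical cluster labels for ten pixels (one label per pixel).
cluster_ids <- c(1, 1, 2, 3, 3, 3, 4, 4, 2, 3)

# Percentage of pixels that fall into each cluster.
shares <- round(100 * prop.table(table(cluster_ids)), 1)
print(shares)

# Hypothetical cluster centres (R, G, B in [0, 1]), converted to
# hex colour strings so each share can be drawn in its own colour.
centers <- rbind(c(0.8, 0.2, 0.2), c(0.2, 0.3, 0.8),
                 c(0.9, 0.9, 0.9), c(0.1, 0.1, 0.1))
swatches <- rgb(centers[, 1], centers[, 2], centers[, 3])
print(swatches)
```

Passing these to `pie(shares, col = swatches)` would draw a chart in which each slice is filled with the colour of its cluster centre.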
Figure 5 – Pie chart representing the percentage of pixels per colour.
Now that the colours in the first image have been analysed, the k-means model trained on it can be used to cluster the pixels in the subsequent images, allowing us to track the increase or decrease in the percentage of pixels of each colour over the period.
The three subsequent images go through the same pre-processing as before, so that each row in each data set represents one pixel (note that each image is held in a separate data set). The images are then clustered using the trained k-means model, after which we can calculate the percentage of pixels per colour as we did for the initial image and build a trend chart showing each pixel colour across the series of images.
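Base R's `kmeans` object does not come with a `predict` method, so one way to score a new image is to assign each pixel to the nearest trained centre by hand. The following is a minimal sketch under that assumption. The function name `assign_to_centers`, the two centres and the pixel values are all made up for illustration and are not the article's data.

```r
# Assign each pixel (a row of rgb_values) to its nearest centre.
assign_to_centers <- function(rgb_values, centers) {
  # Squared Euclidean distance from every pixel to every centre.
  d <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(rgb_values, 2, centers[j, ])^2)
  })
  d <- matrix(d, nrow = nrow(rgb_values))  # keep an n x k matrix even for n = 1
  max.col(-d, ties.method = "first")       # index of the smallest distance per row
}

# Hypothetical trained centres (two clusters for brevity).
centers <- rbind(red = c(0.8, 0.1, 0.1), blue = c(0.1, 0.2, 0.8))

# Hypothetical pixel colours for two later images, four pixels each.
images <- list(
  "1983" = rbind(c(0.9, 0.1, 0.1), c(0.7, 0.2, 0.1),
                 c(0.1, 0.1, 0.9), c(0.8, 0.0, 0.2)),
  "2000" = rbind(c(0.1, 0.3, 0.7), c(0.2, 0.2, 0.9),
                 c(0.1, 0.1, 0.8), c(0.9, 0.1, 0.0))
)

# Percentage of pixels per cluster for each image (clusters x images).
shares <- sapply(images, function(px) {
  ids <- factor(assign_to_centers(px, centers),
                levels = seq_len(nrow(centers)))
  100 * as.numeric(prop.table(table(ids)))
})
rownames(shares) <- rownames(centers)
print(shares)
```

Each column of `shares` is one image, so a call such as `matplot(t(shares), type = "l")` would draw one trend line per cluster.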
Figure 6 – Colour Percentage over time
The colours we are most interested in are red and blue. It is clear from Figure 6 that the share of red pixels decreased from about 35% to about 18% between 1977 and 2000, while the share of blue pixels increased from 29% to 50% over the same period. This analysis confirms in numbers what our eyes could see clearly in the images, and shows that the process could be automated and used in a customer’s business application if they work with images.
After completing the image clustering analysis in R, it was time to integrate it with SAP Expert Analytics. The image URLs were stored in an SAP HANA table so that they could be acquired by Expert Analytics. Once acquired, the first image is used to build the clustering model and analyse the baseline colour distribution, using a custom R extension that carries out the analysis previously written in R. This clustering model can then be saved and the remaining images scored against it to generate the same chart as shown in Figure 6 above.
Figure 7 – Cluster model training
Figure 8 – Charts generated from Training Phase
Figure 9 – Scoring new images against the clustering model
Figure 10 – Charts generated during scoring phase
With the analysis successfully integrated into Expert Analytics, we can export the model as a stored procedure in SAP HANA.
This analysis of the expansion of Beijing between 1977 and 2000 shows that image analysis is both possible and viable in Expert Analytics. Using custom R extensions, users can create their own image analyses and integrate them with their business applications using SAP HANA.