Use Jupyter for openSAP Getting Started with Data ...


In Week 3 – Unit 4: Cluster Analysis, Stuart gets hands-on again, and I will show how to replicate his demonstration in Jupyter instead of Orange.

I use SAP Data Intelligence, trial edition, which is not always running because of hyperscaler costs. So I temporarily work on a Raspberry Pi and will import my notebook into SAP Data Intelligence later.

To start with, I install pandas and load the provided data file (the code is listed in the Appendix).

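Based on this, I run a k-means cluster analysis with three clusters on the TURNOVER and SIZE columns. I then plot the stores coloured by cluster, with a line from each store to its cluster centroid (code in the Appendix).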
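For readers who want to see what `KMeans.fit_predict` does conceptually, here is a minimal pure-Python sketch of Lloyd's algorithm, the classic k-means procedure. It assigns every point to its nearest centroid, moves each centroid to the mean of its points, and repeats until nothing changes. The function name `lloyd_kmeans` and its parameters are hypothetical. This is not scikit-learn's implementation, which by default uses k-means++ initialization and several restarts (`n_init`). Cluster numbering, and on harder data the clusters themselves, can therefore differ.

```python
import math
import random


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of (x, y) tuples into k groups with Lloyd's algorithm.

    Returns (labels, centroids). Hypothetical helper for illustration only.
    """
    def nearest(p, centroids):
        # Index of the centroid closest to point p (Euclidean distance)
        return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

    # Initialization: k input points chosen at random without replacement
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: the next assignment step would change nothing
        centroids = new_centroids
    # Final assignment so the labels always match the returned centroids
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids
```

For example, `lloyd_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` should put the first three points in one cluster and the last three in the other. The distance is plain Euclidean on the raw values, so a column with a much larger numeric range (possibly TURNOVER in this data set) dominates the assignment. Standardizing both columns first, for example with scikit-learn's `StandardScaler`, is a common option.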

Appendix

%pip install pandas

import pandas as pd

# Load the store data provided with the course and show the first rows
df = pd.read_csv('/home/pi/notebooks/openSAP_ds3_STORES_US.csv')
df.head()

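%pip install matplotlib scikit-learn

# Run the following lines in a single notebook cell so that the scatter
# plot and the centroid lines are drawn on the same figure
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Cluster the stores into 3 groups by turnover and size
# (n_init=10 is the old default, set explicitly to avoid the FutureWarning
# in scikit-learn 1.2/1.3)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df['CLUSTER'] = kmeans.fit_predict(df[['TURNOVER', 'SIZE']])

# Scatter plot of the stores, coloured by cluster
plt.scatter(df.TURNOVER, df.SIZE, c=df.CLUSTER)
plt.xlabel('TURNOVER')
plt.ylabel('SIZE')

# Look up the centroid coordinates of each store's cluster
centroids = kmeans.cluster_centers_
cen_x = [c[0] for c in centroids]
cen_y = [c[1] for c in centroids]
df['cen_x'] = df.CLUSTER.map(dict(enumerate(cen_x)))
df['cen_y'] = df.CLUSTER.map(dict(enumerate(cen_y)))

# Draw a line from every store to its cluster centroid
for idx, val in df.iterrows():
    x = [val.TURNOVER, val.cen_x]
    y = [val.SIZE, val.cen_y]
    plt.plot(x, y)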
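The lines drawn by the loop above connect each store to its cluster centroid. Summing their squared lengths gives the within-cluster sum of squares, the quantity k-means tries to minimize. scikit-learn reports it for the fitted model as `kmeans.inertia_`. The small helper below, `inertia` (a hypothetical name, plain Python), is a sketch that computes it from points, labels, and centroids:

```python
import math


def inertia(points, labels, centroids):
    """Within-cluster sum of squares: sum of squared point-to-centroid distances."""
    # Each term is the squared length of one store-to-centroid line in the plot
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Tiny check: two points, each 1 unit from their shared centroid -> 1 + 1
print(inertia([(0, 0), (2, 0)], [0, 0], [(1, 0)]))  # prints 2.0
```

With the DataFrame from the appendix, `inertia(df[['TURNOVER', 'SIZE']].to_numpy(), df.CLUSTER, kmeans.cluster_centers_)` should agree with `kmeans.inertia_` up to floating-point rounding.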