Use Jupyter for openSAP Getting Started with Data Science – Unit 4: Cluster Analysis

In Week 3 – Unit 4: Cluster Analysis, Stuart gets hands-on again. In this post I show how to replicate his demonstration in Jupyter instead of Orange.

My SAP Data Intelligence, trial edition, instance is not always running because of hyperscaler costs, so for now I work on a Raspberry Pi and will import my Notebook into SAP Data Intelligence later.

To start with, I install pandas and load the provided data file. The code is listed in the Appendix.
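Before clustering, it can help to confirm that the loaded file has the two columns used later (TURNOVER and SIZE) and that they contain no gaps, since scikit-learn's KMeans rejects missing values. The following is a minimal sketch, not part of the original demonstration: the inline sample data and the STORE_ID column are made up for illustration and only stand in for the course file.

```python
import io

import pandas as pd

# Hypothetical stand-in for openSAP_ds3_STORES_US.csv; all values are invented.
sample_csv = io.StringIO(
    "STORE_ID,TURNOVER,SIZE\n"
    "1,120.5,30\n"
    "2,80.0,22\n"
    "3,,45\n"
)
stores = pd.read_csv(sample_csv)

# Columns the cluster analysis relies on.
required = ['TURNOVER', 'SIZE']

# Names from `required` that are absent in the file.
missing_columns = [c for c in required if c not in stores.columns]

# Number of rows with at least one missing value in the required columns.
rows_with_gaps = int(stores[required].isna().any(axis=1).sum())

print('missing columns:', missing_columns)
print('rows with gaps:', rows_with_gaps)
```

With the real file, replace `sample_csv` with the path to the CSV. If any rows have gaps, they need to be dropped or filled before calling `fit_predict`.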

Based on this, I get my cluster analysis. The code is again in the Appendix.

Appendix

pip install pandas
import pandas as pd
df = pd.read_csv('/home/pi/notebooks/openSAP_ds3_STORES_US.csv')
df.head()
pip install matplotlib scikit-learn
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=0)
df['CLUSTER'] = kmeans.fit_predict(df[['TURNOVER', 'SIZE']])
plt.scatter(df.TURNOVER, df.SIZE, c=df.CLUSTER)
plt.xlabel('TURNOVER')
plt.ylabel('SIZE')
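# Cluster centroids: column 0 is TURNOVER, column 1 is SIZE
centroids = kmeans.cluster_centers_
cen_x = [i[0] for i in centroids]
cen_y = [i[1] for i in centroids]
# Attach the centroid coordinates of its cluster to every store
df['cen_x'] = df.CLUSTER.map({0:cen_x[0], 1:cen_x[1], 2:cen_x[2]})
df['cen_y'] = df.CLUSTER.map({0:cen_y[0], 1:cen_y[1], 2:cen_y[2]})
# Draw a line from every store to the centroid of its cluster
for idx, val in df.iterrows():
    x = [val.TURNOVER, val.cen_x]
    y = [val.SIZE, val.cen_y]
    plt.plot(x, y)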
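The centroid lookup above spells out one dictionary entry per cluster, so it has to be edited whenever n_clusters changes. One possible alternative is to index the centroid array directly with the cluster labels. The sketch below assumes synthetic data in place of the store file; the names `demo`, `kmeans_demo`, and `assigned` are illustrative, and `n_init=10` is set explicitly only to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the store data: three well-separated groups of 20 points.
rng = np.random.default_rng(0)
blob_centers = np.array([[10.0, 10.0], [50.0, 50.0], [90.0, 10.0]])
points = np.vstack(
    [center + rng.normal(scale=2.0, size=(20, 2)) for center in blob_centers]
)
demo = pd.DataFrame(points, columns=['TURNOVER', 'SIZE'])

kmeans_demo = KMeans(n_clusters=3, n_init=10, random_state=0)
demo['CLUSTER'] = kmeans_demo.fit_predict(demo[['TURNOVER', 'SIZE']])

# Index the centroid array with the label column:
# row i of `assigned` is the centroid of the cluster that row i belongs to.
assigned = kmeans_demo.cluster_centers_[demo['CLUSTER'].to_numpy()]
demo['cen_x'] = assigned[:, 0]
demo['cen_y'] = assigned[:, 1]

print(demo.head())
```

The resulting cen_x and cen_y columns hold the same values as the dictionary-based mapping, so they can feed the same plotting loop, and the lookup works unchanged for any number of clusters.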