Technology Blogs by Members
architectSAP
Active Contributor
In Week 3 – Unit 4: Cluster Analysis, Stuart gets hands-on again, and I will show how to replicate his demonstration in Jupyter instead of Orange.

Because my SAP Data Intelligence trial edition is not always running (to limit hyperscaler costs), I temporarily work on a Raspberry Pi and will import my notebook into SAP Data Intelligence later.

To start with, I install pandas and load the provided data file (the code is in the Appendix).


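Based on this, I run my cluster analysis: k-means with three clusters on TURNOVER and SIZE, with a line drawn from each store to its cluster centroid (the code is in the Appendix).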



Appendix


pip install pandas

import pandas as pd
df = pd.read_csv('/home/pi/notebooks/openSAP_ds3_STORES_US.csv')
df.head()

pip install matplotlib scikit-learn

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Cluster the stores into three groups by turnover and size
kmeans = KMeans(n_clusters=3, random_state=0)
df['CLUSTER'] = kmeans.fit_predict(df[['TURNOVER', 'SIZE']])
# Scatter plot of all stores, colored by cluster
plt.scatter(df.TURNOVER, df.SIZE, c=df.CLUSTER)
plt.xlabel('TURNOVER')
plt.ylabel('SIZE')
# Look up the centroid coordinates that belong to each row's cluster
centroids = kmeans.cluster_centers_
cen_x = [i[0] for i in centroids]
cen_y = [i[1] for i in centroids]
df['cen_x'] = df.CLUSTER.map({0:cen_x[0], 1:cen_x[1], 2:cen_x[2]})
df['cen_y'] = df.CLUSTER.map({0:cen_y[0], 1:cen_y[1], 2:cen_y[2]})
# Draw a line from every store to its cluster centroid
for idx, val in df.iterrows():
    x = [val.TURNOVER, val.cen_x]
    y = [val.SIZE, val.cen_y]
    plt.plot(x, y)
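
As a rough illustration of what KMeans does with the two columns, here is a minimal pure-Python sketch of Lloyd's algorithm, the classic k-means iteration. It is not scikit-learn's implementation (which, for example, uses k-means++ initialisation by default), the function names kmeans_sketch and sq_dist are hypothetical, and the sample points are made up rather than taken from the STORES_US file. The last step builds the same store-to-centroid line segments as the plotting loop above.

```python
import random


def sq_dist(p, c):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2


def kmeans_sketch(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct input points picked at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append(
                    (sum(m[0] for m in members) / len(members),
                     sum(m[1] for m in members) / len(members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the labels are final
        centroids = new_centroids
    return labels, centroids


# Made-up (turnover, size) pairs, for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]

labels, centroids = kmeans_sketch(points, k=3)

# One (x, y) pair of coordinate lists per point, as passed to plt.plot above.
segments = [([p[0], centroids[l][0]], [p[1], centroids[l][1]])
            for p, l in zip(points, labels)]

for p, l in zip(points, labels):
    print(p, "-> cluster", l)
```

Like any k-means run, the result depends on the starting centroids, which is why the notebook code fixes random_state.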