# Visualization of Clustering Algorithms in SAP HANA using JavaScript D3 library

This blog begins with a short introduction to the KMeans and the DBSCAN Algorithms. It then explains how to import the Sample data set in SAP HANA and create the support tables and run the procedures on the data. Finally, the result table is used to create a json object which is passed to the D3 script. The visualization result is obtained from this.

# INTRODUCTION TO K MEANS ALGORITHM

In predictive analysis, k-means clustering is a method of cluster analysis. The k-means algorithm partitions n observations or records into k clusters in which each observation belongs to the cluster with the nearest center. In marketing and customer relationship management areas, this algorithm uses customer data to track customer behavior and create strategic business initiatives. Organizations can thus divide their customers into segments based on variants such as demography, customer behavior, customer profitability, measure of risk, and lifetime value of a customer or retention probability.

Clustering works to group records together according to an algorithm or mathematical formula that attempts to find centroids, or centers, around which similar records gravitate. The most common algorithm uses an iterative refinement technique. It is also referred to as Lloyd’s algorithm:

Given an initial set of k means m1, …, mk, the algorithm proceeds by alternating between two steps:

• Assignment step: assigns each observation to the cluster with the closest mean.
• Update step: calculates the new means to be the center of the observations in the cluster.

The algorithm repeats until the assignments no longer change.

## Prerequisites

• The input data contains an ID column and the other columns are of integer or double data type.
• The input data does not contain null value. The algorithm will issue errors when encountering null values.

# INTRODUCTION TO DBSCAN ALGORITHM

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based data clustering algorithm. It finds a number of clusters starting from the estimated density distribution of corresponding nodes.

DBSCAN requires two parameters: scan radius (eps) and the minimum number of points required to form a cluster (minPts). The algorithm starts with an arbitrary starting point that has not been visited. This point’s

eps-neighborhood is retrieved, and if the number of points it contains is equal to or greater than minPts, a cluster is started. Otherwise, the point is labeled as noise. These two parameters are very important and are usually determined by user.

PAL provides a method to automatically determine these two parameters. You can choose to specify the parameters by yourself or let the system determine them for you.

## Prerequisites

• No missing or NULL data in the input.
• The data is numeric, not categorical

# IMPORTING THE DATA SET IN SAP HANA

1. Select a dataset that is appropriate for Clustering and should satisfy the prerequisites. Following link will be helpful –  https://archive.ics.uci.edu/ml/datasets.html
2. Open the SAP HANA Development Environment.
3. Right Click on the System and open an SQL console.
4. Create a new schema with the following command –

Create schema <schema_name>;

1. Refresh the Catalog to see your schema.
2. Now to import the Table into the Schema, follow the following steps-
• Click on File>Import>Data from Local File and Click next.
• Follow the steps to import the table into your newly created schema.

(Note: I have imported my table under the name CUSTDATA)

After creation, the schema looks like follows. You can view the table content from the schema.

# RUNNING THE KMEANS and the DBSCAN Algorithm AND VISUALIZING THE RESULT

Open the SQL console run the following commands to create the result tables from the input data set.

CREATING TABLES FOR THE KMEANS ALGORITHM

SET SCHEMA TEST_TWO ;

DROP TYPE PAL_KMEANS_DATA_T;

CREATE TYPE PAL_KMEANS_DATA_T AS TABLE(

“ID” INTEGER,

“Fresh” INTEGER,

“Milk” INTEGER,

“Fruits” INTEGER

);

DROP TYPE PAL_CONTROL_T;

CREATE TYPE PAL_CONTROL_T AS TABLE(

“NAME” VARCHAR (100),

“INTARGS” INTEGER,

“DOUBLEARGS” DOUBLE,

“STRINGARGS” VARCHAR (100)

);

DROP TYPE PAL_KMEANS_ASSIGNED_T;

CREATE TYPE PAL_KMEANS_ASSIGNED_T AS TABLE(

“ID” INTEGER,

“CLUSTER” INTEGER,

“DISTANCE” DOUBLE,

“SLIGHT_SILHOUETTE” DOUBLE

);

DROP TYPE PAL_KMEANS_CENTERS_T;

CREATE TYPE PAL_KMEANS_CENTERS_T AS TABLE(

“CLUSTER_ID” INTEGER,

“Fresh” DOUBLE,

“Milk” DOUBLE,

“Fruits” DOUBLE

);

DROP TYPE PAL_KMEANS_SIL_CENTERS_T;

CREATE TYPE PAL_KMEANS_SIL_CENTERS_T AS TABLE(

“CLUSTER_ID” INTEGER,

“SLIGHT_SILHOUETTE” DOUBLE

);

DROP TYPE PAL_KMEANS_STATISTIC_T;

CREATE TYPE PAL_KMEANS_STATISTIC_T AS TABLE(

“NAME” VARCHAR(50),

“VALUE” DOUBLE

);

DROP TABLE PAL_KMEANS_PDATA_TBL;

CREATE COLUMN TABLE PAL_KMEANS_PDATA_TBL(“POSITION” INTEGER, “SCHEMA_NAME” VARCHAR(100), “TYPE_NAME” VARCHAR(100), “PARAMETER_TYPE” VARCHAR(100));

INSERT INTO PAL_KMEANS_PDATA_TBL VALUES (1, ‘TEST_TWO’, ‘PAL_KMEANS_DATA_T’, ‘IN’);

INSERT INTO PAL_KMEANS_PDATA_TBL VALUES (2, ‘TEST_TWO’, ‘PAL_CONTROL_T’, ‘IN’);

INSERT INTO PAL_KMEANS_PDATA_TBL VALUES (3, ‘TEST_TWO’, ‘PAL_KMEANS_ASSIGNED_T’, ‘OUT’);

INSERT INTO PAL_KMEANS_PDATA_TBL VALUES (4, ‘TEST_TWO’, ‘PAL_KMEANS_CENTERS_T’, ‘OUT’);

INSERT INTO PAL_KMEANS_PDATA_TBL VALUES (5, ‘TEST_TWO’, ‘PAL_KMEANS_SIL_CENTERS_T’, ‘OUT’);

INSERT INTO PAL_KMEANS_PDATA_TBL VALUES (6, ‘TEST_TWO’, ‘PAL_KMEANS_STATISTIC_T’, ‘OUT’);

CALL “SYS”.AFLLANG_WRAPPER_PROCEDURE_DROP(‘TEST_TWO’, ‘PAL_KMEANS_PROC’);

CALL “SYS”.AFLLANG_WRAPPER_PROCEDURE_CREATE(‘AFLPAL’, ‘KMEANS’, ‘TEST_TWO’, ‘PAL_KMEANS_PROC’, PAL_KMEANS_PDATA_TBL);

DROP TABLE #PAL_CONTROL_TBL;

CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL(

“NAME” VARCHAR (100),

“INTARGS” INTEGER,

“DOUBLEARGS” DOUBLE,

“STRINGARGS” VARCHAR (100)

);

INSERT INTO #PAL_CONTROL_TBL VALUES (‘THREAD_NUMBER’, 2, null, null);

INSERT INTO #PAL_CONTROL_TBL VALUES (‘GROUP_NUMBER’, 4, null, null);

INSERT INTO #PAL_CONTROL_TBL VALUES (‘INIT_TYPE’, 1, null, null);

INSERT INTO #PAL_CONTROL_TBL VALUES (‘DISTANCE_LEVEL’,2, null, null);

INSERT INTO #PAL_CONTROL_TBL VALUES (‘MAX_ITERATION’, 100, null, null);

INSERT INTO #PAL_CONTROL_TBL VALUES (‘EXIT_THRESHOLD’, null, 1.0E-6, null);

INSERT INTO #PAL_CONTROL_TBL VALUES (‘CATEGORY_WEIGHTS’, null, 0.5, null);

DROP TABLE PAL_KMEANS_DATA_TBL;

CREATE COLUMN TABLE PAL_KMEANS_DATA_TBL LIKE PAL_KMEANS_DATA_T;

INSERT INTO PAL_KMEANS_DATA_TBL SELECT * FROM CUSTDATA;

DROP TABLE PAL_KMEANS_ASSIGNED_TBL;

CREATE COLUMN TABLE PAL_KMEANS_ASSIGNED_TBL LIKE PAL_KMEANS_ASSIGNED_T;

DROP TABLE PAL_KMEANS_CENTERS_TBL;

CREATE COLUMN TABLE PAL_KMEANS_CENTERS_TBL LIKE PAL_KMEANS_CENTERS_T;

DROP TABLE PAL_KMEANS_SIL_CENTERS_TBL;

CREATE COLUMN TABLE PAL_KMEANS_SIL_CENTERS_TBL LIKE PAL_KMEANS_SIL_CENTERS_T;

DROP TABLE PAL_KMEANS_STATISTIC_TBL;

CREATE COLUMN TABLE PAL_KMEANS_STATISTIC_TBL LIKE PAL_KMEANS_STATISTIC_T;

CALL “TEST_TWO”.PAL_KMEANS_PROC(PAL_KMEANS_DATA_TBL, #PAL_CONTROL_TBL, PAL_KMEANS_ASSIGNED_TBL, PAL_KMEANS_CENTERS_TBL, PAL_KMEANS_SIL_CENTERS_TBL, PAL_KMEANS_STATISTIC_TBL) with OVERVIEW;

Now, I will insert a parameter size into the output table which is the size of the bubbles in the Circle packing.

Alter table “TEST_TWO”.“PAL_KMEANS_ASSIGNED_TBL” ADD (Size Integer);

Update “TEST_TWO”.“PAL_KMEANS_ASSIGNED_TBL” set Size = 1500;

The result tables can be viewed as follows-

SELECT * FROM PAL_KMEANS_ASSIGNED_TBL;

SELECT * FROM PAL_KMEANS_CENTERS_TBL;

SELECT * FROM PAL_KMEANS_SIL_CENTERS_TBL;

SELECT * FROM PAL_KMEANS_STATISTIC_TBL;

CREATING TABLES FOR THE DBSCAN ALGORITHM

SET SCHEMA DBSCAN;

DROP TYPE PAL_DBSCAN_DATA_T;

CREATE TYPE PAL_DBSCAN_DATA_T AS TABLE ( ID integer, ATTRIB1 integer, ATTRIB2 integer, ATTRIB3 integer);

DROP TYPE PAL_CONTROL_T;

CREATE TYPE PAL_CONTROL_T AS TABLE( NAME varchar(50), INTARGS integer, DOUBLEARGS double, STRINGARGS varchar(100));

DROP TYPE PAL_DBSCAN_RESULTS_T;

CREATE TYPE PAL_DBSCAN_RESULTS_T AS TABLE( ID integer, RESULT integer);

DROP TABLE PAL_DBSCAN_PDATA_TBL;

CREATE COLUMN TABLE PAL_DBSCAN_PDATA_TBL(“POSITION” INTEGER, “SCHEMA_NAME” VARCHAR(100), “TYPE_NAME” VARCHAR(100), “PARAMETER_TYPE” VARCHAR(100));

INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (1, ‘DBSCAN’,‘PAL_DBSCAN_DATA_T’, ‘IN’);

INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (2, ‘DBSCAN’,‘PAL_CONTROL_T’, ‘IN’);

INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (3, ‘DBSCAN’,‘PAL_DBSCAN_RESULTS_T’,‘OUT’);

call SYS.AFLLANG_WRAPPER_PROCEDURE_DROP(‘DBSCAN’,‘PAL_DBSCAN’);

call SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE(‘AFLPAL’,‘DBSCAN’, ‘DBSCAN’,‘PAL_DBSCAN’, PAL_DBSCAN_PDATA_TBL);

DROP TABLE PAL_DBSCAN_DATA_TBL;

CREATE COLUMN TABLE PAL_DBSCAN_DATA_TBL ( ID integer, ATTRIB1 integer, ATTRIB2 integer, ATTRIB3 integer);

INSERT INTO PAL_DBSCAN_DATA_TBL SELECT * FROM CUSTOMERS_DATA;

DROP TABLE #PAL_CONTROL_TBL;

CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL( NAME varchar(50), INTARGS integer, DOUBLEARGS double, STRINGARGS varchar(100));

INSERT INTO #PAL_CONTROL_TBL VALUES(‘AUTO_PARAM’,null,null,‘true’);

INSERT INTO #PAL_CONTROL_TBL VALUES(‘DISTANCE_METHOD’,1,null,null);

DROP TABLE PAL_DBSCAN_RESULTS_TBL;

CREATE COLUMN TABLE PAL_DBSCAN_RESULTS_TBL( ID integer, RESULT integer);

The result tables can be viewed as follows-

SELECT * FROM PAL_DBSCAN_RESULTS_TBL;

Now create a new project in your workspace in Repositories (I have created a project called vis_test)

• .project
• .xsaccess
• .xsapp

The content of the .project file is –

<?xml version=“1.0” encoding=“UTF-8”?>

<projectDescription>

<name>visual_test</name>

<comment></comment>

<projects>

</projects>

<buildSpec>

<buildCommand>

<name>com.sap.ndb.studio.view.ui.BIModelsBuilder</name>

<arguments>

</arguments>

</buildCommand>

<buildCommand>

<name>com.sap.ndb.studio.dwb.team.HanaIgnoreFileBuilder</name>

<arguments>

</arguments>

</buildCommand>

<buildCommand>

<name>org.eclipse.wst.jsdt.core.javascriptValidator</name>

<arguments>

</arguments>

</buildCommand>

<buildCommand>

<name>com.sap.ndb.studio.xse.editor.xsjsLintBuilder</name>

<arguments>

</arguments>

</buildCommand>

</buildSpec>

<natures>

<nature>com.sap.ndb.studio.view.ui.modelerNature</nature>

<nature>com.sap.ndb.studio.dwb.tools.core.hanaNature</nature>

<nature>com.sap.ndb.studio.xse.editor.xsjsNature</nature>

<nature>com.sap.ndb.studio.xse.editor.xsjsLintNature</nature>

<nature>org.eclipse.wst.jsdt.core.jsNature</nature>

</natures>

</projectDescription>

The content of the .xsaccess file is –

{

“exposed” : true,

“authentication” :

{

“method”: “Form”

},

“cache_control” : “must-revalidate“,

“cors” :

{

“enabled” : false

},

“enable_etags” : false,

“force_ssl” : false,

“prevent_xsrf” : true

}

The .xsapp file may be left empty.

## VISUALIZING THE KMEANS ALGORITHM

Now, we need to create a json object from the result tables which can be passed to the script in the html file for creating the visualization.

Creating the json Object-

For creating a json object –

• I have created a file called kmeans.xsjs inside the project which is called vis_test. The content of the file is-

try {

var conn = \$.db.getConnection();

var jsonObject = [];

// SQL query that should be called

var pstmt = conn.prepareStatement(“SELECT * FROM \”TEST_TWO\”.\”PAL_KMEANS_ASSIGNED_TBL\” ORDER BY Cluster”);

// Execute the SQL statement

var rs = pstmt.executeQuery();

// Set the response type to JSON

\$.response.contentType = “text/json“;

var currentClusterId;

var currentCluster = [];

var parentObject;

// Iterate through the rows and extract data row by row

while (rs.next())

{

currentClusterId = rs.getString(2);

currentCluster.push({“name”: rs.getString(1), “size” : rs.getString(5)});

while(rs.next() && currentClusterId === rs.getString(2))

{

currentCluster.push({“name”: rs.getString(1), “size” : rs.getString(5)});

currentClusterId = rs.getString(2);

}

jsonObject.push( { “name” : currentClusterId, “children” : currentCluster});

currentCluster =[];

}//while

parentObject = ({“name” : “parent”, “children” : jsonObject});

// Set the response body to the JSON object by stringifiying it

\$.response.setBody(JSON.stringify(parentObject));

// Close the connection to the database

rs.close();

pstmt.close();

conn.close();

}

catch (e)

{

\$.response.setBody(e.message);

\$.response.status = \$.net.http.INTERNAL_SERVER_ERROR;

}//catch

In order to see the json object in the Browser, go to the browser and type in the path of your file along with the IP address used.

You will see the json object as –

Creating the HTML file-

Now, I have created an html file called display.html which includes the D3 script for the visualization. This takes in the json object created in the previous step and to create a visual graph for the K Means algorithm. I have used a circle packing layout for this visualization.

The code in the circlepacking.html file is –

<!DOCTYPE html>

<html>

<meta charset=“UTF-8”>

<title>Insert title here</title>

<style>

circle {

fill: rgb(31, 119, 180);

fill-opacity: .25;

stroke: rgb(31, 119, 180);

stroke-width: 1px;

}

.leaf circle {

fill: #ff7f0e;

fill-opacity: 1;

}

.node  {

font: 10px sans-serif;

}

.leaf   {

font: 10px sans-serif;

}

.panel { font: 20px “Century Schoolbook”, Georgia, Times, serif;

color: #333;

line-height: 90%;

margin: .2em 0 .4em 0;

letter-spacing: -2px;

}

div#right_top {

position:absolute;

top:400px;

right:550px;

width:300px;

height:460px;

font: bold 15px “Century Schoolbook”, Georgia, Times, serif

}

</style>

<body>

<script src=https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js></script>

<script>

var diameter = 800,

format = d3.format(“,d”);

var pack = d3.layout.pack()

.size([diameter-4, diameter-4])

.value(function(d) { return d.size; });

var svg = d3.select(“body”).append(“svg”)

.attr(“class”, “panel”)

.attr(“width”, 900)

.attr(“height”, 1000)

.append(“g”)

.attr(“transform”, “translate(100,120)”);

1. svg.append(“svg:text”).text(function(d) {return “Cluster Graph “;})

.attr(“transform”, function(d, i){return “translate(“+ 320 +“,” + -50 + “)”; });

d3.json(“./kmeans.xsjs”, function(error, root) {

if (error) throw error;

var node = svg.datum(root).selectAll(“.node”)

.data(pack.nodes)

.enter().append(“g”)

.attr(“class”, function(d) { return d.children ? “node” : “leaf node”; })

.attr(“transform”, function(d) { return “translate(“ + d.x + “,” + d.y + “)”; });

1. node.append(“title”)

.text(function(d) { return d.name + (d.children ? “” : “: “ + format(d.size)); });

1. node.append(“circle”)

.attr(“r”, function(d) { return d.r; });

1. node.filter(function(d) { return !d.children; }).append(“text”)

.attr(“dy”, “.3em”)

.style(“text-anchor”, “middle”)

.text(function(d) { return d.name.substring(0, d.r / 3); });

});

d3.select(self.frameElement).style(“height”, diameter + “px”);

</script>

<div id=‘right_top’>

This representation shows the clustering of different customers based upon the type of expenditure they do.They have been divided into four clusters here. The number of clusters can be specified in the inputs to the algorithm.

</div>

</body>

</html>

Viewing the result-

After writing this code, we need to see the visualization result. In order to see the circle packing visualization, go to the browser and type in the path of your file along with the IP address used.

You will see the result as –

## VISUALIZING THE DBSCAN ALGOTIRITHM

For the DBSCAN algorithm, I have chosen to visualize the result through a scatter plot graph Matrix which compares all the dimensions of the input against each other and plots them. To visualize this, we need to create a json object from the result tables which can be passed to the script in the html file for creating the visualization.

Creating the json Object-

For creating a json object –

• I have created a file called dbscan.xsjs inside the project which is called vis_test. The content of the file is-

try {

// Get sthe Database connection object

var conn = \$.db.getConnection();

// json objects

var jsonObject=[];

var jsonObjectCluster;

var jsonObjectParams;

var jsonFinal;

// SQL statement that needs to be called to fetch the data

var pstmt = conn.prepareStatement(“SELECT T1.ID, RESULT, ATTRIB1, ATTRIB2, ATTRIB3 from \”DBSCAN\”.\”PAL_DBSCAN_RESULTS_TBL\” AS T1 JOIN \”DBSCAN\”.\”PAL_DBSCAN_DATA_TBL\” AS T2 ON T1.ID = T2.ID”);

// Execute the statement, the response is an iterable object over the rows

var rs = pstmt.executeQuery();

// Setting the response type to JSON

\$.response.contentType = “text/json“;

// Iterating through the rows and extracting the data row by row

while (rs.next())

{

// Pushing data on to the json object

jsonObject.push( { “Fresh” : parseInt(rs.getString(3)), “Milk”: parseInt(rs.getString(4)), “Frozen” : parseInt(rs.getString(5)), “clusters”: rs.getString(2)});

}//while

jsonObjectCluster = ( [“0”, “1”, “-1”]);

jsonObjectParams = ([“Fresh”, “Milk”, “Frozen”]);

jsonFinal = ({“dimensions” : jsonObjectParams , “clusters” : jsonObjectCluster, “values” : jsonObject});

// Setting the response body to the JSON object by stringifiying it

\$.response.setBody(JSON.stringify(jsonFinal));

// Close the connection to the database

rs.close();

pstmt.close();

conn.close();

}

catch (e)

{

\$.response.setBody(e.message);

\$.response.status = \$.net.http.INTERNAL_SERVER_ERROR;

}

In order to see the json object in the Browser, go to the browser and type in the path of your file along with the IP address used.

You will see the result as –

Creating the HTML file-

Now, I will create an html file called scatterPlot.html. It uses the D3 script for the data visualization. This takes in the json object created in the previous step and creates a visual scatter plot for the DBSCAN algorithm.

The code in the scatterPlot.html file is –

<!DOCTYPE html>

<html>

<script src=https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js></script>

<meta charset=“UTF-8”>

<title>Scatter Plot</title>

<style>

.legend circle { fill-opacity: 1; }

.legend { font: bold 15px “Century Schoolbook”, Georgia, Times, serif;

color: grey;

line-height: 90%;

margin: .2em 0 .4em 0;

letter-spacing: -2px;

}

.column { font: bold 15px “Century Schoolbook”, Georgia, Times, serif;

color: #333;

line-height: 90%;

margin: .2em 0 .4em 0;

letter-spacing: -2px;

}

.panel { font: bold 20px “Century Schoolbook”, Georgia, Times, serif;

color: #333;

line-height: 90%;

margin: .2em 0 .4em 0;

letter-spacing: -2px;

}

div#right_top {

position:absolute;

top:500px;

right:620px;

width:300px;

height:460px;

font: bold 15px “Century Schoolbook”, Georgia, Times, serif

}

</style>

<body>

<div id=“chart”>

<script>

d3.json(“./dbscan.xsjs”, function(customer) {

// Size parameters.

var size = 280,

// Color scale.

var color = d3.scale.ordinal().range([

“rgb(50%, 0%, 0%)”,

“rgb(0%, 50%, 0%)”,

“rgb(0%, 0%, 50%)”

]);

// Position scales.

var position = {};

customer.dimensions.forEach(function(dimension) {

function value(d) { return d[dimension]; }

position[dimension] = d3.scale.linear()

.domain([d3.min(customer.values, value), d3.max(customer.values, value)])

});

// Root panel.

var svg = d3.select(“#chart”)

.append(“svg:svg”)

.attr(“class”, “panel”)

.attr(“width”, size * customer.dimensions.length+4*legend_width)

.attr(“height”, size * customer.dimensions.length+legend_width);

svg.append(“svg:text”).text(function(d) {return “Scatter Plot Matrix “;})

.attr(“transform”, function(d, i){return “translate(“+ 450 +“,” + 50 + “)”; });

// One column per dimension.

var column = svg.selectAll(“g”

.data(customer.dimensions)

.enter().append(“svg:g”)

.attr(“class”, “column”)

.attr(“transform”, function(d, i) { return “translate(“ + i * size + “,100)”; });

// One row per dimension.

var row = column.selectAll(“g”)

.data(cross(customer.dimensions))

.enter().append(“svg:g”)

.attr(“transform”, function(d, i) { return “translate(100,” + i * size + “)”; });

// Titles for columns

column.append(“svg:text”).text(function(d) {return d;})

.attr(“transform”, function(d, i){return “translate(120,” + ((customer.dimensions.length * size)+10) + “)”; });

// Titles for rows

column.append(“svg:text”).text(function(d) {return d;})

.attr(“transform”, function(d, i){return “translate(“+ (-(i)*size+100) +“,” + (((i+1) * size)-20) + “)rotate(-90)”; });

// X-ticks. TODO Cross the dimension into the tick data?

row.selectAll(“line.x”)

.data(function(d) { return position[d.x].ticks(5).map(position[d.x]); })

.enter().append(“svg:line”)

.attr(“class”, “x”)

.attr(“x1”, function(d) { return d; })

.attr(“x2”, function(d) { return d; })

.attr(“y2”, size – padding / 2);

// Y-ticks. TODO Cross the dimension into the tick data?

row.selectAll(“line.y”)

.data(function(d) { return position[d.y].ticks(5).map(position[d.y]); })

.enter().append(“svg:line”)

.attr(“class”, “y”)

.attr(“x2”, size – padding / 2)

.attr(“y1”, function(d) { return d; })

.attr(“y2”, function(d) { return d; });

// Frame.

row.append(“svg:rect”)

.style(“fill”, “none”)

.style(“stroke”, “#aaa”)

.style(“stroke-width”, 1.5)

.attr(“pointer-events”, “all”)

.on(“mousedown”, mousedown);

// Dot plot.

var dot = row.selectAll(“circle”)

.data(cross(customer.values))

.enter().append(“svg:circle”)

.attr(“cx”, function(d) { return position[d.x.x](d.y[d.x.x]); })

.attr(“cy”, function(d) { return size – position[d.x.y](d.y[d.x.y]); })

.attr(“r”, 3)

.style(“fill”, function(d) { return color(d.y.clusters); })

.style(“fill-opacity”, .5)

.attr(“pointer-events”, “none”);

d3.select(window)

.on(“mousemove”, mousemove)

.on(“mouseup”, mouseup);

// Legend display

var legend = svg.selectAll(“g.legend”)

.data(customer.clusters)

.enter().append(“svg:g”)

.attr(“class”, “legend”)

.attr(“transform”, function(d, i) {

return “translate(“ + (label_height + size * customer.dimensions.length + padding+100) + “,” + (i*20+10+150) + “)”;

});

legend.append(“svg:circle”)

.style(“fill”, function(d) { return color(d); })

.style(“fill-opacity”, .5)

.attr(“r”, 3);

legend.append(“svg:text”)

.attr(“x”, 12)

.attr(“dy”, “.31em”)

.text(function(d) { return “Cluster Number “+d; });

// Mouse event

var rect, x0, x1, count;

function mousedown() {

x0 = d3.mouse(this);

count = 0;

rect = d3.select(this.parentNode)

.append(“svg:rect”)

.style(“fill”, “#999”)

.style(“fill-opacity”, .5);

d3.event.preventDefault();

}

function mousemove() {

if (!rect) return;

x1 = d3.mouse(rect.node());

var minx = Math.min(x0[0], x1[0]),

maxx = Math.max(x0[0], x1[0]),

miny = Math.min(x0[1], x1[1]),

maxy = Math.max(x0[1], x1[1]);

rect

.attr(“x”, minx – .5)

.attr(“y”, miny – .5)

.attr(“width”, maxx – minx + 1)

.attr(“height”, maxy – miny + 1);

var v = rect.node().__data__,

x = position[v.x],

y = position[v.y],

mins = x.invert(minx),

maxs = x.invert(maxx),

mint = y.invert(size – maxy),

maxt = y.invert(size – miny);

count = 0;

svg.selectAll(“circle”)

.style(“fill”, function(d) {

return mins <= d.y[v.x] && maxs >= d.y[v.x]

&& mint <= d.y[v.y] && maxt >= d.y[v.y]

? (count++, color(d.y.clusters))

: “#ccc”;

});

}

function mouseup() {

if (!rect) return;

rect.remove();

rect = null;

if (!count) svg.selectAll(“circle”)

.style(“fill”, function(d) {

return color(d.y.clusters);

});

}

});

function cross(a) {

return function(d) {

var c = [];

for (var i = 0, n = a.length; i < n; i++) c.push({x: d, y: a[i]});

return c;

};

}

</script>

<div id=‘right_top’>

This scatter Plot matrix contains the plots for 140 customer Ids and the monetary units they spend on Milk, Fresh Food and Frozen food every year. The amounts spent on different items have been compared and plotted. The different clusters can be clearly seen.

</div>

</body>

</html>

Viewing the result-

Now, to see the scatter plot, go to the browser and type in the path of your file along with the IP address used.