Private_Member_9643
Birds of a feather flock together,
But ever wondered why and how
A crow does not flock with the dove
Such is the nature’s law
Which enterprises are using without a flaw


Suppose that we have to allocate a number of automated teller machines (ATMs) in a given region while satisfying a number of constraints. These constraints could involve factors such as:


  • Population density of the region
  • Technical feasibility in terms of last-mile connectivity or access
  • Sound commercial premises for housing the ATM
  • Statistical distribution of the users across commercial establishments and residential setups
  • Topography of the selected region or area, and so on

Households or places of work may be clustered so that typically one ATM is assigned per cluster. The clustering, however, may be constrained by the location of bridges, rivers, and highways, all of which can affect ATM accessibility. Additional constraints may limit the number of ATMs per district within the region. Given such constraints, we can design our clusters and then use them for cluster analysis.


A cluster is a collection of data objects that are similar to one another within the same cluster and are dissimilar to the objects in other clusters. The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering.
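To make "similar within, dissimilar across" concrete, here is a minimal sketch that compares the average distance between points inside one group with the average distance between two groups. Euclidean distance is only one possible similarity measure, and the points are made-up values for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_pairwise(points_a, points_b):
    """Average distance over all pairs drawn from the two groups.

    Pairs made of the very same object are skipped, so a group can be
    compared with itself.
    """
    dists = [euclidean(p, q) for p in points_a for q in points_b if p is not q]
    return sum(dists) / len(dists)

# Two hypothetical groups of 2-D points (illustrative values only).
cluster_1 = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_2 = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

within_1 = mean_pairwise(cluster_1, cluster_1)
between = mean_pairwise(cluster_1, cluster_2)
print(f"within cluster 1: {within_1:.2f}, between clusters: {between:.2f}")
```

A good clustering keeps the within-cluster figure small relative to the between-cluster figure.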


Cluster analysis has wide applications including market or customer segmentation, pattern recognition, biological studies, spatial data analysis, Web document classification, and many others. Cluster analysis can be used as a standard data mining tool to gain insight into the data distribution, or serve as a preprocessing step for other data mining algorithms operating on the detected clusters.



Scenario:

An insurance company might want to identify the potential market for a few policies by segmenting its customer base according to attributes such as income, age, gender, risk category, and policy types held.


Customer_id | Name      | Age | Gender | Income  | Risk Category | Policy Type
Cust001     | Kamaljeet | 25  | M      | 300000  | Medium        | Silver
Cust002     | Craig     | 30  | M      | 1000000 | Low           | Gold
Cust003     | Pooja     | 28  | F      | 20000   | High          | Platinum


To create any data mining model, we have to define the fields and the parameters of that model.


MODEL FIELDS



Content Type defines the kind of data held in the model field. A field can be a key field, discrete, continuous, or ordered.


Parameter Values need to be defined for each model field. The general parameters are weight, default value, and binning intervals.


Values for a model field are defined on the basis of the content type selected. Here we generally define which values to ignore, which values to treat as missing, and the valid ranges of values.
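The field settings above can be pictured as a small record per field plus a clean-up step applied to every customer row. The sketch below is only an illustration in plain Python: `ModelField`, `prepare`, and all the attribute names are hypothetical and are not part of any SAP interface. Binning is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelField:
    """Hypothetical description of one model field."""
    name: str
    content_type: str                      # "key", "discrete", "continuous" or "ordered"
    weight: float = 1.0                    # would scale the field in a distance measure
    default: Optional[object] = None       # used for missing or invalid values
    valid_range: Optional[tuple] = None    # (low, high), for continuous fields
    ignore_values: frozenset = frozenset() # values treated as if they were missing

def prepare(record, fields):
    """Apply each field's value rules to one input record."""
    out = {}
    for f in fields:
        if f.content_type == "key":
            continue  # key fields identify the record; they are not features
        value = record.get(f.name)
        if value is None or value in f.ignore_values:
            value = f.default
        elif f.valid_range is not None:
            low, high = f.valid_range
            if not (low <= value <= high):
                value = f.default  # out-of-range values fall back to the default
        out[f.name] = value
    return out

fields = [
    ModelField("Customer_id", "key"),
    ModelField("Age", "continuous", default=30, valid_range=(18, 100)),
    ModelField("Income", "continuous", weight=2.0, default=0,
               valid_range=(0, 10_000_000)),
    ModelField("Risk Category", "discrete", default="Medium",
               ignore_values=frozenset({"Unknown"})),
]

record = {"Customer_id": "Cust001", "Age": 25, "Income": 300000,
          "Risk Category": "Unknown"}
prepared = prepare(record, fields)
print(prepared)
```

Here the ignored value "Unknown" is replaced by the field's default, while the key field is dropped from the feature set.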



MODEL PARAMETERS



Model parameters are defined for the model as a whole. They include the number of clusters, the maximum number of distinct values allowed for an attribute, and stopping conditions such as the maximum number of iterations and the minimum fraction of inter-cluster loops.
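To show how such parameters steer a clustering run, here is a minimal k-means sketch. It is a generic textbook algorithm, not necessarily the one the data mining workbench uses. The parameter names are hypothetical, and reading the second stopping condition as "the fraction of records that switched cluster in the last pass" is an assumption.

```python
import math

def kmeans(points, k, max_iterations=20, min_change_fraction=0.01):
    """Plain k-means with two stopping conditions.

    Stops after max_iterations passes, or earlier once the share of
    points that changed cluster in a pass drops below min_change_fraction.
    """
    centers = [list(p) for p in points[:k]]  # naive start: the first k points
    labels = [-1] * len(points)
    for iteration in range(1, max_iterations + 1):
        changed = 0
        # Assignment step: attach each point to its nearest center.
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            if nearest != labels[i]:
                labels[i] = nearest
                changed += 1
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if changed / len(points) < min_change_fraction:
            break
    return labels, centers, iteration

# Hypothetical (age, income in thousands) pairs, for illustration only.
points = [(25, 30), (27, 32), (26, 29), (60, 90), (62, 95), (58, 88)]
labels, centers, iterations_used = kmeans(points, k=2)
print(labels, iterations_used)
```

In practice the attributes would be scaled first, since an unscaled income column would otherwise dominate the distance.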


The last step in the clustering process is to use the clustering result and derive strategies from it. We can analyze the clustering output by integrating the data mining model into the Analysis Process Designer (APD). The output can be viewed as an influence chart, as a value distribution chart, or in PMML format.


The influence chart shows the relative importance of every attribute considered in the formation of clusters. The higher the index, the greater the attribute's influence in deciding which cluster an entity is assigned to.
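The exact formula behind the influence index is not given here, so the following is only a stand-in to convey the idea: for a numeric attribute, we measure what share of its total variance is explained by cluster membership. The function name and the sample values are hypothetical.

```python
def influence(values, labels):
    """Share of an attribute's variance explained by the clusters (0 to 1).

    This is an illustrative measure (between-cluster sum of squares divided
    by the total sum of squares), not the index computed by any product.
    """
    n = len(values)
    overall = sum(values) / n
    total = sum((v - overall) ** 2 for v in values)
    if total == 0:
        return 0.0  # a constant attribute cannot separate clusters
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    between = sum(len(g) * (sum(g) / len(g) - overall) ** 2
                  for g in groups.values())
    return between / total

# Hypothetical cluster labels and attribute values for six customers.
labels = [0, 0, 0, 1, 1, 1]
age = [25, 30, 28, 27, 29, 26]
income = [20, 30, 25, 90, 100, 95]

scores = {"Age": influence(age, labels), "Income": influence(income, labels)}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

With these values, income separates the two clusters almost completely while age hardly differs between them, so income would rank far higher on such a chart.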


Using the value distribution chart, we can see how the values of an attribute are distributed within a cluster and across the various clusters.
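The numbers behind such a chart are simple counts of attribute values per cluster. A minimal sketch, assuming we already have one cluster label per record (the records and labels below are invented for illustration):

```python
from collections import Counter, defaultdict

def value_distribution(records, labels, attribute):
    """Count how often each value of `attribute` occurs in each cluster."""
    dist = defaultdict(Counter)
    for rec, lab in zip(records, labels):
        dist[lab][rec[attribute]] += 1
    return {lab: dict(counts) for lab, counts in dist.items()}

# Hypothetical records and cluster assignments.
records = [
    {"Policy Type": "Silver"},
    {"Policy Type": "Gold"},
    {"Policy Type": "Gold"},
    {"Policy Type": "Platinum"},
    {"Policy Type": "Gold"},
]
labels = [0, 1, 1, 0, 1]

distribution = value_distribution(records, labels, "Policy Type")
print(distribution)
```

Reading one cluster's counts shows the distribution within it. Comparing the same value across clusters shows how the clusters differ.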


We can also display the clustering results in the PMML format. Predictive Model Markup Language (PMML) is an XML-based language that enables applications to define statistical and data mining models.
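To give a feel for what a clustering model looks like in PMML, the sketch below builds a small center-based `ClusteringModel` document with Python's standard library. It is a simplified fragment that has not been validated against the PMML schema, the version and the model and field names are assumptions, and the output of the workbench may use a different PMML version and carry more detail.

```python
import xml.etree.ElementTree as ET

def clustering_pmml(field_names, centers):
    """Build a simplified PMML document describing cluster centers."""
    pmml = ET.Element("PMML", version="4.4",
                      xmlns="http://www.dmg.org/PMML-4_4")
    ET.SubElement(pmml, "Header", description="Illustrative clustering model")

    # Data dictionary: every field is treated as a continuous number here.
    dictionary = ET.SubElement(pmml, "DataDictionary",
                               numberOfFields=str(len(field_names)))
    for name in field_names:
        ET.SubElement(dictionary, "DataField", name=name,
                      optype="continuous", dataType="double")

    model = ET.SubElement(pmml, "ClusteringModel",
                          modelName="customer_segments",  # hypothetical name
                          functionName="clustering",
                          modelClass="centerBased",
                          numberOfClusters=str(len(centers)))
    schema = ET.SubElement(model, "MiningSchema")
    for name in field_names:
        ET.SubElement(schema, "MiningField", name=name)

    measure = ET.SubElement(model, "ComparisonMeasure", kind="distance")
    ET.SubElement(measure, "squaredEuclidean")
    for name in field_names:
        ET.SubElement(model, "ClusteringField", field=name)

    # One Cluster element per center, with the coordinates in an Array.
    for i, center in enumerate(centers, start=1):
        cluster = ET.SubElement(model, "Cluster", name=f"cluster_{i}")
        array = ET.SubElement(cluster, "Array", n=str(len(center)), type="real")
        array.text = " ".join(str(v) for v in center)

    return ET.tostring(pmml, encoding="unicode")

# Hypothetical cluster centers over two fields.
xml_text = clustering_pmml(["Age", "Income"], [[26.0, 30.0], [60.0, 91.0]])
print(xml_text)
```

Because the result is plain XML, another application that understands PMML can read the cluster centers back without access to the system that produced them.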



Conclusion

Thus we can see that cluster analysis is a powerful tool for recognizing patterns of usage or customer habits. More and more enterprises today use various forms of cluster analysis to segment their markets, customer bases, or product offerings. Grouping data objects intelligently, so that similar traits fall together and dissimilar ones fall apart, helps companies understand how differentiated their customer base really is.

So the next time you see a McDonald's opening in your neighborhood, which already has some eating joints, don't predict doomsday for the Big Mac… some cluster analysis has probably gone into the decision.


