Cluster Analysis
Essay by 24 • December 10, 2010 • 813 Words (4 Pages) • 1,420 Views
Ref : www.wikipedia.com, www.clustan.com
Cluster analysis is a class of statistical techniques that can be applied to data that exhibit "natural" groupings. It sorts through the raw data and groups the observations into clusters.
A cluster is a group of relatively homogeneous cases or observations. Objects in a cluster are similar to each other. They are also dissimilar to objects outside the cluster, particularly objects in other clusters.
Clustering is the classification of similar objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait - often proximity according to some defined distance measure.
The diagram below illustrates the results of a survey that studied drinkers' perceptions of spirits (alcohol). Each point represents the results from one respondent. The research indicates there are four clusters in this market.
Another example is the vacation travel market. Recent research has identified three clusters or market segments: 1) the demanders, who want exceptional service and expect to be pampered; 2) the escapists, who want to get away and just relax; 3) the educationalists, who want to see new things, go to museums, go on a safari, or experience new cultures.
In marketing, cluster analysis is used for:
* Segmenting the market and determining target markets
* Product positioning and new product development
* Selecting test markets
The basic procedure
1. Formulate the problem - select the variables that you wish to apply the clustering technique to.
2. Select a distance measure - various ways of computing distance:
o Squared Euclidean distance - the sum of the squared differences in value across all variables (its square root is the ordinary Euclidean distance)
o Manhattan distance - the sum of the absolute differences in value across all variables
o Chebyshev distance - the maximum absolute difference in value for any variable
3. Select a clustering procedure.
4. Decide on the number of clusters.
5. Map and interpret clusters - draw conclusions - illustrative techniques like perceptual maps, icicle plots, and dendrograms are useful
6. Assess reliability and validity - various methods:
o repeat the analysis using a different distance measure
o repeat the analysis using a different clustering technique
o split the data randomly into two halves and analyze each half separately
o repeat the analysis several times, deleting one variable each time
o repeat the analysis several times, using a different order of cases each time
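To make the distance measures in step 2 concrete, here is a minimal Python sketch. The function names and the two example respondents are made up for illustration and do not come from any of the surveys mentioned above. The sketch assumes each case is a list of numeric scores, one per variable, measured on comparable scales.

```python
import math

def squared_euclidean(a, b):
    # Sum of the squared differences across all variables.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    # Square root of the squared Euclidean distance.
    return math.sqrt(squared_euclidean(a, b))

def manhattan(a, b):
    # Sum of the absolute differences across all variables.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute difference on any single variable.
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical scores of two respondents on three variables.
respondent_a = [1, 5, 3]
respondent_b = [4, 1, 3]

print(squared_euclidean(respondent_a, respondent_b))  # 25
print(euclidean(respondent_a, respondent_b))          # 5.0
print(manhattan(respondent_a, respondent_b))          # 7
print(chebyshev(respondent_a, respondent_b))          # 4
```

The same pair of cases gets a different distance under each measure, which is one reason step 6 suggests repeating the analysis with a different distance measure.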
Clustering procedures
There are several types of clustering methods:
* Non-hierarchical clustering (often referred to as k-means clustering)
o first determine a cluster center, then group all objects that are within a certain distance of it
o examples:
▪ Sequential Threshold method - first determine a cluster center, then group all objects that are within a predetermined threshold from the center - one cluster is created at a time
▪ Parallel Threshold method - simultaneously several cluster centers are
...
...