Discretization
Essay by 24 • October 28, 2010
1. Discretization is the process of transforming quantitative data into qualitative data. (It is also defined as a process that transforms data containing a quantitative attribute so that the attribute in question is replaced by a qualitative attribute.) Many data mining algorithms operate on qualitative data, hence the need for discretization. For example, the quantitative attribute age, recorded as numeric values, can be represented by discrete descriptive terms such as young and old.
2. Discretization divides the value range of a quantitative attribute into a finite number of intervals. A mapping function associates all of the quantitative values in a single interval with a single qualitative value. A cut point is a value of the quantitative attribute at which the mapping function places an interval boundary. For example, a quantitative attribute recording age might be mapped onto a new qualitative age attribute with three values: pre-teen, teenage and post-teen. The cut points for such a discretization might be 13 and 18. Values of the original age attribute below 13 would be mapped onto the pre-teen value of the new attribute, values from 13 to 18 onto teenage, and values above 18 onto post-teen.
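The age example can be sketched as a small Python mapping function. This is a minimal illustration, assuming the boundary treatment stated in the text (13 is the first teenage value and 18 the last); the function name discretize_age and its parameters are hypothetical.

```python
def discretize_age(age: float, low_cut: float = 13, high_cut: float = 18) -> str:
    """Map a quantitative age onto one of three qualitative values.

    Follows the example in the text: values below low_cut become "pre-teen",
    values from low_cut to high_cut (inclusive) become "teenage", and values
    above high_cut become "post-teen".
    """
    if age < low_cut:
        return "pre-teen"
    if age <= high_cut:
        return "teenage"
    return "post-teen"


ages = [7, 13, 16, 18, 19, 42]
labels = [discretize_age(a) for a in ages]
print(labels)
# ['pre-teen', 'teenage', 'teenage', 'teenage', 'post-teen', 'post-teen']
```

Every value inside an interval collapses to the same label, which is exactly what the mapping function is meant to do; the two cut points are the only information the discretization keeps about the original numeric range.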
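3. Diverse taxonomies exist in the literature to categorize discretization methods. These taxonomies are complementary, each relating to a different dimension along which discretization methods may differ. Typically, a distinction is drawn between primary methods, which accomplish discretization without reference to any other discretization method, and composite methods, which are built on top of one or more primary methods.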
4. Popular primary methods:
Supervised vs. Unsupervised.
Supervised methods refer to the class information when selecting discretization cut points; unsupervised methods do not use the class information. For example, when trying to predict whether a customer will be profitable, the data might be divided into two classes: profitable and unprofitable.
A supervised discretization technique would take into account how useful a candidate cut point is for identifying whether a customer is profitable. An unsupervised technique would not. Supervised methods can be further characterized as error-based, entropy-based or statistics-based.
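The difference between the two families can be sketched in Python. The functions below are hypothetical illustrations rather than any specific published algorithm: equal_width_cut_points is a simple unsupervised method that never looks at the class labels, while best_entropy_cut_point is a simplified supervised, entropy-based step that picks the single cut point whose two intervals are purest with respect to the class. The customer spending figures are invented for illustration.

```python
import math
from collections import Counter


def equal_width_cut_points(values, k):
    """Unsupervised: split [min, max] into k equal-width intervals.

    The class labels are never consulted.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut_point(values, labels):
    """Supervised: choose the single cut point that minimises class entropy.

    Candidates are midpoints between adjacent distinct sorted values; each
    is scored by the size-weighted entropy of the two resulting intervals.
    Returns None if all values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, math.inf
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary can fall between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut


# Hypothetical annual spend per customer and its class.
spend = [100, 150, 200, 250, 300, 700, 800, 900]
classes = ["unprofitable"] * 4 + ["profitable"] * 4

print(equal_width_cut_points(spend, 2))   # [500.0]
print(best_entropy_cut_point(spend, classes))  # 275.0
```

On this data the unsupervised method places its cut in the middle of the value range, whereas the supervised one places it exactly where the profitable customers begin, because that boundary yields two class-pure intervals.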
Hierarchical vs. Non-hierarchical.
Hierarchical discretization uses an incremental process to select cut points, which creates an implicit hierarchy over the value range. Hierarchical discretization can be further characterized as either split or merge. Split discretization starts with a single interval that encompasses the entire value range, then repeatedly splits it into sub-intervals until a stopping criterion is met. Merge discretization starts with each value in a separate interval, then repeatedly merges adjacent intervals until a stopping criterion is met. It is possible to combine both split and merge processes: for example, intervals may first be formed by splitting, and a merge process is then applied to post-process
...
...
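The split and merge strategies described under hierarchical discretization can be sketched in Python. Both the stopping criterion (a target number of intervals) and the rule for choosing which interval to split or which adjacent pair to merge are assumptions made for illustration; practical methods often use statistical or entropy-based criteria instead. The function names and sample values are hypothetical.

```python
def split_discretize(values, max_intervals):
    """Top-down (split): start with one interval covering the whole range
    and split until max_intervals intervals exist.

    Assumed rule: always split the widest interval at its midpoint.
    """
    intervals = [(min(values), max(values))]
    while len(intervals) < max_intervals:
        i = max(range(len(intervals)),
                key=lambda j: intervals[j][1] - intervals[j][0])
        lo, hi = intervals[i]
        mid = (lo + hi) / 2
        intervals[i:i + 1] = [(lo, mid), (mid, hi)]
    return intervals


def merge_discretize(values, max_intervals):
    """Bottom-up (merge): start with each distinct value in its own interval
    and merge adjacent intervals until max_intervals remain.

    Assumed rule: merge the adjacent pair separated by the smallest gap.
    """
    intervals = [(v, v) for v in sorted(set(values))]
    while len(intervals) > max(max_intervals, 1):
        i = min(range(len(intervals) - 1),
                key=lambda j: intervals[j + 1][0] - intervals[j][1])
        intervals[i:i + 2] = [(intervals[i][0], intervals[i + 1][1])]
    return intervals


values = [1, 2, 3, 10, 11, 12, 30]
print(split_discretize(values, 3))  # [(1, 8.25), (8.25, 15.5), (15.5, 30)]
print(merge_discretize(values, 3))  # [(1, 3), (10, 12), (30, 30)]
```

Because each step refines or coarsens the previous set of intervals, the sequence of states forms the implicit hierarchy over the value range; combining the two, as the text mentions, would amount to passing the intervals produced by a split step to a merge step for post-processing.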