
Predictive Modeling Terms


Bagging (Voting, Averaging)

The concept of bagging (voting for classification; averaging for regression-type problems with a continuous dependent variable of interest) applies to the area of predictive data mining. It combines the predicted classifications from multiple models, or from the same type of model trained on different learning data, and it also addresses the inherent instability of results when complex models are applied to relatively small datasets. Suppose your data mining task is to build a model for predictive classification, and the dataset from which to train the model (the learning dataset, which contains observed classifications) is relatively small. You could repeatedly sub-sample (with replacement) from the dataset and apply a tree classifier (e.g., C&RT or CHAID) to the successive samples. In practice, very different trees are often grown for the different samples, illustrating the instability of models that is frequently evident with small datasets. One method of deriving a single prediction (for new observations) is to use all the trees grown for the different samples and apply simple voting: the final classification is the one most often predicted by the different trees. Note that some weighted combination of predictions (weighted vote, weighted average) is also possible and commonly used. A sophisticated (machine learning) algorithm for generating the weights for weighted prediction or voting is the boosting procedure, described below.
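As a concrete illustration, here is a minimal Python sketch of the bag-and-vote idea, using scikit-learn's DecisionTreeClassifier as a stand-in for a tree classifier such as C&RT. The function name, the number of models, and the assumption that the inputs are NumPy arrays are illustrative choices, not part of the original text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_new, n_models=25, seed=0):
    """Grow trees on bootstrap sub-samples and combine them by simple voting."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    trees = []
    for _ in range(n_models):
        # Sub-sample the learning data with replacement (a bootstrap sample).
        idx = rng.integers(0, n, size=n)
        trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))
    # Simple voting: the final classification is the label most often
    # predicted by the different trees.
    votes = np.stack([t.predict(X_new) for t in trees])   # (n_models, n_new)
    final = []
    for column in votes.T:
        labels, counts = np.unique(column, return_counts=True)
        final.append(labels[np.argmax(counts)])
    return np.array(final)
```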

Boosting

Boosting applies to the area of predictive data mining. It generates multiple models or classifiers (for prediction or classification) and derives weights for combining the predictions from those models into a single prediction or predicted classification.

A simple boosting algorithm works like this: start by applying some method (e.g., a tree classifier such as C&RT or CHAID) to the learning data, where each observation is assigned an equal weight. Compute the predicted classifications, and then assign weights to the observations in the learning sample that are inversely proportional to the accuracy of the classification. In other words, assign greater weight to the observations that were difficult to classify (where the misclassification rate was high) and lower weight to those that were easy to classify (where the misclassification rate was low). In the context of C&RT, for example, different misclassification costs (for the different classes) can be applied, inversely proportional to the accuracy of prediction in each class. Then apply the classifier again to the weighted data (or with the different misclassification costs), and continue with the next iteration (application of the analysis method for classification to the re-weighted data).
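A minimal Python sketch of this re-weighting loop follows, again with scikit-learn trees as the base classifier. The specific update rule here (doubling the weight of misclassified observations, then renormalizing) is a simplified assumption for illustration, not the exact scheme described above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, n_rounds=10):
    """Fit a sequence of classifiers on successively re-weighted learning data."""
    n = len(X)
    weights = np.full(n, 1.0 / n)   # each observation starts with equal weight
    classifiers = []
    for _ in range(n_rounds):
        clf = DecisionTreeClassifier(max_depth=1)
        clf.fit(X, y, sample_weight=weights)
        wrong = clf.predict(X) != y
        # Assign greater weight to observations that were difficult to
        # classify (and thus lower relative weight to the easy ones).
        weights[wrong] *= 2.0
        weights /= weights.sum()    # renormalize for the next iteration
        classifiers.append(clf)
    return classifiers
```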

Boosting will generate a sequence of classifiers, where each consecutive classifier in the sequence is an "expert" in classifying observations that were not well classified by those preceding it. During deployment (for prediction or classification of new observations), the predictions from the different classifiers can then be combined, for example via a weighted vote or weighted average, to derive a single prediction or classification.
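For deployment, a weighted vote over the sequence might look like the following sketch. The per-classifier weights are assumed to be supplied (e.g., derived from each classifier's accuracy during boosting), and the helper name is hypothetical.

```python
import numpy as np

def weighted_vote(classifiers, clf_weights, X_new):
    """Combine the sequence's predictions by a weighted vote."""
    labels = np.unique(np.concatenate([c.classes_ for c in classifiers]))
    scores = np.zeros((len(X_new), len(labels)))
    for clf, w in zip(classifiers, clf_weights):
        pred = clf.predict(X_new)
        for j, label in enumerate(labels):
            # Each classifier adds its weight to the labels it predicts.
            scores[:, j] += w * (pred == label)
    return labels[np.argmax(scores, axis=1)]
```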

...
