A Simple Bootstrap Variable Selection Method for Building Database Marketing Models
Bruce Ratner, Ph.D.
Variable selection, that is, determining which independent variables to
include in a model, is a vital part of the model building process. Most
data analysts use well-known variable selection approaches such as
forward selection, which adds variables one at a time, each contributing
to the prediction of the target variable (binary response for logistic
regression; continuous profit for ordinary least squares regression),
until no additional variable contributes a significant improvement in
the model's prediction. Less well known is that variable selection
methods produce suboptimal models: they either omit an important
(necessary) predictor variable, which produces biased predictions, or
include an unnecessary variable, which produces large (unstable)
prediction errors. The purpose of this article is to use the bootstrap
in tandem with variable selection methods to obtain a less biased and
more stable variable selection methodology. Two case studies are
presented using response and profit database marketing models.
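
To make the idea concrete, here is a minimal Python sketch of one common way to pair the bootstrap with forward selection: rerun the selection on many bootstrap resamples and record how often each variable is chosen. This is an illustration under stated assumptions, not necessarily the procedure used in the case studies. The function names (`forward_select`, `bootstrap_selection_frequency`), the F-to-enter threshold of 4.0, the number of resamples, and the synthetic data are all hypothetical choices made for this example, and the sketch covers only the ordinary least squares (profit) case.

```python
import numpy as np


def forward_select(X, y, f_to_enter=4.0):
    """Greedy forward selection for OLS.

    At each step, add the candidate column that lowers the residual sum of
    squares (RSS) the most, and stop when its partial F statistic falls
    below f_to_enter (an assumed, analyst-chosen threshold).
    Returns the selected column indices in order of entry.
    """
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))                  # start from intercept-only model
    rss = float(np.sum((y - y.mean()) ** 2))
    while len(selected) < p:
        best_j, best_rss = None, None
        for j in range(p):
            if j in selected:
                continue
            cand = np.column_stack([design, X[:, j]])
            beta, *_ = np.linalg.lstsq(cand, y, rcond=None)
            rss_j = float(np.sum((y - cand @ beta) ** 2))
            if best_rss is None or rss_j < best_rss:
                best_j, best_rss = j, rss_j
        # Residual degrees of freedom: n minus intercept, selected, candidate.
        df_resid = n - (len(selected) + 2)
        if df_resid <= 0:
            break
        f_stat = (rss - best_rss) / (max(best_rss, 1e-12) / df_resid)
        if f_stat < f_to_enter:
            break
        selected.append(best_j)
        design = np.column_stack([design, X[:, best_j]])
        rss = best_rss
    return selected


def bootstrap_selection_frequency(X, y, n_boot=200, f_to_enter=4.0, seed=0):
    """Share of bootstrap resamples in which each column is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # draw n rows with replacement
        for j in forward_select(X[idx], y[idx], f_to_enter):
            counts[j] += 1
    return counts / n_boot


# Synthetic illustration (assumed data): only columns 0 and 2 drive y.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(size=300)

full_sample_choice = forward_select(X, y)
freq = bootstrap_selection_frequency(X, y)
print("selected on full sample:", full_sample_choice)
print("bootstrap selection frequency:", np.round(freq, 2))
```

A variable that is selected in nearly every resample is a stable candidate for the final model, while one that enters only occasionally is likely an artifact of a particular sample. The frequency cutoff used to keep a variable is left to the analyst in this sketch.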