Model Selection by Means of Natural Selection
Bruce Ratner, Ph.D.
Model selection - the
process of selecting the best subset of predictor variables to define a
model - is made fast and easy, in part, by today's computer power, which
is cheap in terms of crunch time. Running a regression model for every
subset of eighteen candidate predictor variables (2 to the 18th power,
or 262,144 models) takes only minutes. However, deciding on the best
subset among those models, or just collecting a dozen good models, is a
tedious task. The traditional
approach to model selection uses significance testing, a rote
process dulled by its lack of a creative component: constructing new
variables from the original ones. The purpose of this article is to
introduce a new method that automatically and simultaneously selects
important original variables, constructs new important variables
from the original variables by finding patterns within the data (data
mining), and lastly selects a model based on the best subset of
original and constructed variables. The method is based on the
assumption-free, nonparametric genetic paradigm inspired by Natural
Selection - Darwin's Principle of Survival of the Fittest and the
biological operations of reproduction, sexual recombination and
mutation. The new method offers a clear advantage over current
statistical methods, whose performance is dependent upon significance
tests along with theoretical assumptions, predefined model
formulations, and data-type restrictions. A case study is presented to
illustrate the potential of the new method for building database
marketing and CRM models, using the GenIQ software
implementation of the method.
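
To make the genetic paradigm concrete, the following is a minimal sketch of a plain genetic algorithm searching over subsets of eighteen predictors. It is not the GenIQ method: it only selects variables and does not construct new ones. The `fitness` function is a toy stand-in for a real model-quality score, and every name in it (`USEFUL`, `fitness`, `evolve`, and the parameter values) is a hypothetical choice made for illustration.

```python
import random

N_PREDICTORS = 18
N_SUBSETS = 2 ** N_PREDICTORS  # 262144 candidate subsets, empty set included

# Assumption: a toy "truth" in which only these predictors matter.
USEFUL = frozenset({0, 3, 7})


def fitness(mask):
    """Toy score: +2 per useful predictor kept, -1 per useless one kept.

    A real application would fit a model on the chosen predictors and
    return a validation metric instead.
    """
    chosen = {i for i, bit in enumerate(mask) if bit}
    return 2 * len(chosen & USEFUL) - len(chosen - USEFUL)


def evolve(n_vars=N_PREDICTORS, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Search predictor subsets, each encoded as a list of 0/1 flags."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Reproduction: the fitter half survives unchanged.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # Recombination: one-point crossover of two surviving parents.
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_vars)
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each flag with a small probability.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)


best = evolve()
print(N_SUBSETS)
print([i for i, bit in enumerate(best) if bit], fitness(best))
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases from one generation to the next. The search evaluates at most a few thousand subsets rather than all 262,144, which is the appeal of the approach as the number of candidate predictors grows.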