DM Stat-1 Articles

Model Selection by Means of Natural Selection
Bruce Ratner, Ph.D.
Model selection, the process of choosing the best subset of predictor variables to define a model, is made fast and easy in part by today's cheap computing power. Fitting every regression model that can be formed from eighteen candidate predictor variables (2 to the 18th power, or 262,144 models) takes only minutes. However, deciding on the best subset among those hundreds of thousands of models, or even just collecting a dozen good ones, is a tedious task. The traditional approach to model selection relies on significance testing, a rote process that lacks the creative component of constructing new variables from the original ones. The purpose of this article is to introduce a new method that automatically and simultaneously selects important original variables, constructs important new variables from the original ones by finding patterns within the data (data mining), and finally selects a model based on the best subset of original and constructed variables. The method is based on the assumption-free, nonparametric genetic paradigm inspired by natural selection: Darwin's principle of survival of the fittest and the biological operations of reproduction, sexual recombination, and mutation. The new method offers a clear advantage over current statistical methods, whose performance depends on significance tests along with theoretical assumptions, predefined model formulations, and data-type restrictions. A case study illustrates the potential of the new method, as implemented in the GenIQ software, for building database marketing and CRM models.
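
To make the genetic paradigm concrete, the following is a minimal sketch of reproduction, recombination, and mutation applied to subsets of variables. It is an illustration under stated assumptions, not the GenIQ implementation: it only selects among the original variables and does not construct new ones, the data are synthetic, the fitness score is a toy stand-in for a real model-quality measure, and every name in it (`make_data`, `fitness`, `crossover`, `mutate`, `evolve`) is hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def make_data(n_rows=200, n_vars=8, seed=1):
    """Synthetic data (assumption): y depends on the first two columns plus noise."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_vars)]
    y = [columns[0][i] + columns[1][i] + rng.gauss(0, 0.5) for i in range(n_rows)]
    return columns, y


def fitness(mask, columns, y, penalty=0.01):
    """Toy score: |correlation| of y with the sum of the selected columns, minus a size penalty."""
    chosen = [col for bit, col in zip(mask, columns) if bit]
    if not chosen:
        return 0.0
    combined = [sum(vals) for vals in zip(*chosen)]
    return abs(pearson(combined, y)) - penalty * len(chosen)


def crossover(a, b, rng):
    """Single-point recombination of two parent masks."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]


def evolve(columns, y, pop_size=30, generations=40, rate=0.05, seed=0):
    """Evolve 0/1 masks over the columns; return the best mask and the best score per generation."""
    rng = random.Random(seed)
    n = len(columns)

    def score(mask):
        return fitness(mask, columns, y)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        survivors = pop[: pop_size // 2]  # reproduction: the fitter half survives unchanged
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rate, rng)
            for _ in range(pop_size - len(survivors))
        ]
        pop = survivors + children
    return max(pop, key=score), history


columns, y = make_data()
best, history = evolve(columns, y)
print("selected variable indices:", [i for i, bit in enumerate(best) if bit])
```

Because the surviving half of each generation is carried over unchanged, the best score never gets worse from one generation to the next. A realistic application would replace the toy `fitness` with a measure computed from a fitted model on holdout data.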

For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1; or e-mail at