Finding the Best Variables for Direct Marketing Models

Finding the best possible subset of variables to put in a model has been a frustrating exercise. Many methods of variable selection exist, but none of them is perfect. Furthermore, none uses a criterion that addresses the specific needs of direct/database marketing (DM) models. The purpose of this article is to present a new methodology, the GenIQ Model©, that uses the machine learning approach of genetic programming to isolate the variables. Specifically, the GenIQ Model automatically determines the best set of predictor variables (drawn from the original variables and from newly constructed, genetically data-mined variables) based on a virtually unbiased assessment of all variables under consideration, an achievement not possible with statistical methods. Most significantly, genetic modeling is used to address the specific needs of DM models, viz., optimizing the decile table, which has transcended its DM origin and now serves as a universal measure of model performance. Moreover, GenIQ offers exceptional predictions with minimal error variance, and a unique feature that accommodates dirty and incomplete data. GenIQ can handle both classification problems (e.g., a yes-no response target variable) and regression problems (e.g., a continuous sales target variable) with categorical, ordinal, and continuous candidate predictor variables. Case studies are reported showing the potential power and future prominence of GenIQ in the data analyst's toolkit.
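For readers unfamiliar with the decile table mentioned above, the following is a minimal sketch of how such a table is commonly built: individuals are ranked by model score from highest to lowest, cut into ten near-equal groups, and the cumulative response rate of each group is compared with the overall response rate. It is not the GenIQ Model and does not reproduce anything from the article. The function name `decile_table`, the column names, and the toy data are all assumptions made for illustration.

```python
def decile_table(scores, responses, n_groups=10):
    """Build a decile table from model scores and 0/1 responses.

    Hypothetical helper for illustration only; not part of GenIQ.
    Returns one dict per group, ordered from the top-scored group down.
    """
    if len(scores) != len(responses):
        raise ValueError("scores and responses must have the same length")
    n = len(scores)
    if n < n_groups:
        raise ValueError("need at least one individual per group")

    # Rank individuals from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    ranked = [responses[i] for i in order]
    overall_rate = sum(ranked) / n

    rows = []
    cum_n = 0
    cum_responders = 0
    for g in range(n_groups):
        # Integer cut points give near-equal group sizes.
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        cum_n += len(group)
        cum_responders += sum(group)
        cum_rate = cum_responders / cum_n
        rows.append({
            "decile": g + 1,
            "n": len(group),
            "responders": sum(group),
            "response_rate": sum(group) / len(group),
            "cum_response_rate": cum_rate,
            # Cumulative lift as an index: 100 means "no better than average".
            "cum_lift": 100 * cum_rate / overall_rate if overall_rate else float("nan"),
        })
    return rows


# Toy data (made up): 20 individuals, scores already in descending order.
toy_scores = list(range(20, 0, -1))
toy_responses = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for row in decile_table(toy_scores, toy_responses):
    print(row["decile"], row["responders"], round(row["cum_lift"]))
```

A model that ranks responders well concentrates them in the top deciles, so the cumulative lift starts well above 100 and falls to exactly 100 at the bottom decile. Optimizing the decile table, in the sense used above, means seeking the ranking that pushes as many responders as possible into the upper deciles.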


Bruce Ratner, Ph.D.

For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1, or e-mail br@dmstat1.com.

DM STAT-1 CONSULTING / 574 Flanders Drive / North Woodmere, NY 11581 / USA / Fax 1-516-791-5075