DM Stat-1 Articles
Assessing the Importance of Variables in Database Response Models
Bruce Ratner, Ph.D.

The classic approach for assessing the statistical significance of a variable considered for model inclusion is the well-known null hypothesis significance-testing procedure, which is based on the reduction in prediction error (actual response minus predicted response) associated with the variable in question. For logistic regression analysis, the statistical apparatus of the formal testing procedure consists of the log likelihood function (LL), the G statistic, degrees of freedom, and the p-value. The procedure applies this apparatus within a theoretical framework that rests on weighty and untenable assumptions. From a purist point of view, those assumptions could cast doubt on findings that actually have statistical significance. Even if findings of statistical significance are accepted as correct, they may not be of practical importance or have noticeable value to the study at hand. For the data analyst with a pragmatic slant, the limitations and lack of scalability inherent in the classic system cannot be overlooked, especially within big data settings.

In contrast, the data mining approach uses the LL units, the G statistic, and degrees of freedom in an informal, data-guided search for variables that suggest a noticeable reduction in prediction error. One point worth noting is that the informality of the data mining approach calls for a suitable change in terminology: a result is declared not statistically significant but worthy of notice, or noticeably important. In this article I describe the data mining approach of variable assessment for building database response models.
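To make the apparatus concrete, the following sketch computes the G statistic for a candidate variable in a logistic regression: the model is fit with and without the variable, G = 2 × (LL_full − LL_reduced), and the classic procedure refers G to a chi-square distribution with degrees of freedom equal to the number of added variables. The data here are simulated for illustration only, and the maximum-likelihood fit is done directly with scipy rather than a particular modeling package; none of the variable names come from the article.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def neg_log_likelihood(beta, X, y):
    # Negative log likelihood of the logistic model:
    # LL = sum_i [ y_i * z_i - log(1 + exp(z_i)) ],  z = X @ beta
    z = X @ beta
    return np.sum(np.logaddexp(0.0, z) - y * z)

def max_log_likelihood(X, y):
    # Fit by maximum likelihood; return the maximized LL
    res = minimize(neg_log_likelihood, np.zeros(X.shape[1]),
                   args=(X, y), method="BFGS")
    return -res.fun

# Simulated response data (hypothetical, for illustration):
# x2 is the candidate variable under assessment
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x1 + 0.8 * x2)))
y = rng.binomial(1, p_true)

ones = np.ones(n)
X_reduced = np.column_stack([ones, x1])        # model without x2
X_full = np.column_stack([ones, x1, x2])       # model with x2

ll_reduced = max_log_likelihood(X_reduced, y)
ll_full = max_log_likelihood(X_full, y)

G = 2.0 * (ll_full - ll_reduced)   # likelihood-ratio (G) statistic
df = 1                             # one variable added
p_value = chi2.sf(G, df)           # classic p-value from chi-square

print(f"G = {G:.2f}, df = {df}, p-value = {p_value:.4g}")
```

The data mining approach described in the article uses the same quantities (LL, G, degrees of freedom) but reads the drop in LL informally, as evidence of a noticeable reduction in prediction error, rather than against a formal chi-square reference distribution.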

For more information about this article, call Bruce Ratner at 516.791.3544, 1 800 DM STAT-1, or e-mail at