Building a Database Response Model for Categorical Data
Bruce Ratner, Ph.D.
The standard method for building a binary (yes-no) response model is the logistic regression model (LRM). Variable selection procedures for LRM with continuous and categorical data are well established in the literature and widely practiced. When the response model is restricted to categorical predictor variables, the appropriate model specification (structure) is the log-linear model (LLM), which has its own variable selection methods. Unfortunately, LLM has not gained wide usage for building a response model with categorical predictor variables, perhaps because of its demanding wordbook, which includes terms such as odds ratios, marginal and conditional odds, and very high-order interactions. As a result, modelers fall back on LRM for categorical data, where they meet a frustrating variable selection process and end up with a questionably specified model.
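For readers new to the LLM wordbook, the terms "odds", "odds ratio", and "marginal versus conditional" can be made concrete with a few lines of arithmetic. The sketch below uses invented counts for a hypothetical mailing (the names `segment`, `offer`, and all numbers are assumptions for illustration, not data from this article) and computes the odds ratio of response within each segment (conditional) and after collapsing over segments (marginal).

```python
# Hypothetical counts, invented for illustration only.
# Key: (segment, offer) -> (responders, non_responders)
counts = {
    ("A", "yes"): (30, 10),
    ("A", "no"): (20, 20),
    ("B", "yes"): (10, 30),
    ("B", "no"): (5, 35),
}
segments = ("A", "B")


def odds(responders, non_responders):
    """Odds of response: responders per non-responder."""
    return responders / non_responders


def odds_ratio(group, baseline):
    """Ratio of the odds in `group` to the odds in `baseline`."""
    return odds(*group) / odds(*baseline)


def collapse(offer):
    """Sum the counts for one offer level over all segments."""
    responders = sum(counts[(s, offer)][0] for s in segments)
    non_responders = sum(counts[(s, offer)][1] for s in segments)
    return responders, non_responders


# Conditional odds ratios: offer vs. no offer, within each segment.
conditional = {
    s: odds_ratio(counts[(s, "yes")], counts[(s, "no")]) for s in segments
}

# Marginal odds ratio: offer vs. no offer, ignoring segment.
marginal = odds_ratio(collapse("yes"), collapse("no"))

print(conditional)
print(marginal)
```

With these toy counts the conditional odds ratios differ by segment, and neither equals the marginal one. That gap is what an interaction term in a log-linear or logistic model is meant to capture.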
The purpose of this article is to present a new method, the GenIQ Model©, as an alternative technique for modeling a response variable with categorical predictor variables. The GenIQ Model, which is based on the assumption-free, nonparametric genetic paradigm inspired by Darwin's principle of survival of the fittest, offers theoretical and ease-of-use advantages over LRM and LLM. It automatically and simultaneously "evolves" both the response model structure and the variable selection among categorical predictor variables. The open-worked GenIQ Model and its wordbook place few demands on newcomers to genetic modeling. A novel case study using the Titanic dataset (recall the RMS Titanic, which struck an iceberg on its maiden voyage on April 14, 1912, and sank early the next morning) is presented to encourage the use of the new method.
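To give a feel for what "evolving" a variable selection can mean, here is a minimal sketch of a generic genetic algorithm over subsets of categorical predictors. It is not the GenIQ Model, whose internals are not described here. The data, the fitness function (accuracy of a majority-vote rule per cell, minus a small penalty per predictor), and all names and parameter values are assumptions chosen for illustration.

```python
import itertools
import random

LEVELS = "ab"
N_VARS = 5

# Hypothetical data: every combination of five two-level predictors, one row each.
# By construction, the response is 1 only when x0 == "a" and x1 == "a".
ROWS = [
    (x, int(x[0] == "a" and x[1] == "a"))
    for x in itertools.product(LEVELS, repeat=N_VARS)
]


def fitness(mask):
    """Accuracy of a majority-vote-per-cell rule, minus 0.01 per selected predictor."""
    cells = {}
    for x, y in ROWS:
        key = tuple(v for v, keep in zip(x, mask) if keep)
        pos, neg = cells.get(key, (0, 0))
        cells[key] = (pos + y, neg + 1 - y)
    correct = sum(max(pos, neg) for pos, neg in cells.values())
    return correct / len(ROWS) - 0.01 * sum(mask)


def evolve(pop_size=10, generations=30, mutation_rate=0.1, seed=0):
    """Evolve 0/1 masks over the predictors and return the fittest mask found."""
    rng = random.Random(seed)
    # Start from the empty model plus random masks.
    population = [(0,) * N_VARS] + [
        tuple(rng.randint(0, 1) for _ in range(N_VARS))
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        # Elitism: the current best mask always survives.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            # Tournament selection: the fitter of two random masks becomes a parent.
            mom = max(rng.sample(population, 2), key=fitness)
            dad = max(rng.sample(population, 2), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, N_VARS - 1)
            child = mom[:cut] + dad[cut:]
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        population = children
    return max(population, key=fitness)


best_mask = evolve()
print(best_mask, fitness(best_mask))
```

In this toy setup the response depends on an interaction between the first two predictors, so neither one alone improves on the empty model. The search has to select both together, which is the kind of structure that stepwise, one-variable-at-a-time selection can miss.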