A Database Marketing Model for Zero-inflated Data
Bruce Ratner, Ph.D.
The problem of modeling
data with missing values is well known to data analysts. Almost all
standard statistical modeling techniques require complete data and
accordingly discard individuals with missing values, so analysts
make every effort to impute the missing values. A common
approach is to "zero-inflate" the data by replacing missing values with
zeros. For binary variables and dummified categorical variables, for
example those representing participation in lifestyle activities (1 if
an individual participates in a given lifestyle activity, 0 if not),
missing-value individuals would have zeros. The working assumption is
that the missing-value individuals are nonparticipants in the
corresponding lifestyle activities. Similarly, for continuous
variables, say, representing a count activity (e.g., number of visits)
or dollar amount, missing-value individuals would have zeros, implying
they have no activity or a zero dollar value. Zero-inflated data
clearly do not meet the bell-shaped distributional assumption of
the standard statistical modeling techniques. Nevertheless, the
zero-inflated data approach has been justified empirically: it produces
good model results in the majority of cases.
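
The zero-inflation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the records, the field names (plays_golf, num_visits, dollars), and the helper functions zero_inflate and zero_share are all hypothetical, and None is assumed to mark a missing value.

```python
# Hypothetical customer records; None marks a missing value.
# Field names are illustrative and are not taken from the article.
customers = [
    {"plays_golf": 1, "num_visits": 4, "dollars": 120.0},
    {"plays_golf": None, "num_visits": None, "dollars": 35.5},
    {"plays_golf": 0, "num_visits": 2, "dollars": None},
    {"plays_golf": None, "num_visits": 0, "dollars": 0.0},
]


def zero_inflate(records):
    """Return a copy of the records with every missing value (None) replaced by 0."""
    return [
        {field: (0 if value is None else value) for field, value in rec.items()}
        for rec in records
    ]


def zero_share(records, field):
    """Fraction of records whose value for `field` is exactly zero."""
    return sum(1 for rec in records if rec[field] == 0) / len(records)


imputed = zero_inflate(customers)

# Compare how much of each variable sits at zero before and after imputation.
for field in ("plays_golf", "num_visits", "dollars"):
    print(field, zero_share(customers, field), zero_share(imputed, field))
```

On this toy data the share of zeros for plays_golf rises from 0.25 to 0.75 after imputation, which illustrates how the approach piles mass at zero and moves the variable away from a bell-shaped distribution.
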
The purpose of this article is to present a distribution-free
alternative to regression modeling with zero-inflated data, whether the
zeros arise from imputation as discussed above or are actually
observed. The GenIQ Model,
which is based on the machine learning method of genetic programming,
theoretically accepts zero-inflated data, and thus offers optimal model
results. Two case studies are presented using response and profit
database marketing models.