Gain of a Predictive Information Advantage: Data Mining via Evolution

Bruce Ratner, Ph.D.

Statistical predictive techniques have their roots in the small-data setting of 200 years ago. Since then, a large body of literature has formed that 1) deepened the theory, providing a zoom-in view of the inner workings of the traditional techniques, and 2) widened the theory, providing a zoom-out view of new methods, which stem in part from non-theoretical considerations such as data type; e.g., the need to handle categorical data resulted in the log-linear model. These new and better-understood methods still have the small-data setting as their work-ground, and thus have two zones of weakness that prevent them from gaining a predictive information advantage. First, the data analyst must “fit the (data to a) model” under the assumption that the analyst’s chosen pre-specified model did in fact generate the data at hand, an untenable assumption (problematic then, and definitely now). Second, these methods are at best optimal for the small data of yesteryear and are not scalable to today’s big-data setting. Today’s model input process can effortlessly import big data thanks to gargantuan computer memory storage devices. However, the model’s output process is stuck with the paradigm of “fit the model.” The implication of these weaknesses is simply that the models cannot yield a predictive information advantage.

The purpose of this article is to present the evolutionary computational GenIQ Model©, whose paradigm, inspired by Darwinian evolution, is the converse of yesteryear’s: “fitness begets structure,” or equivalently, “the data defines the model.” GenIQ sits well in the work-ground of today’s big-data setting because computers, which are necessary for housing big data, are also necessary for effortlessly performing the evolutionary computation (EC) required to mine big data. Thus, GenIQ offers the gain of a predictive information advantage. EC is the branch of methodologies whereby the computer itself “evolves” predictive information (concurrently, as it evolves a predictive model) by mimicking the natural genetic operators of reproduction, mating, and mutation. Unlike human evolution, which never stops because G-d has no timeline, GenIQ is stopped by the data analyst, who does have a timeline, when she sees indicators that the fittest model has been begotten, which in turn indicates that the best structure has evolved. This offers the gain of a predictive information advantage as a by-product of the best-defined model. In sum, GenIQ is a flexible, self-defining model for data of any size, which thereupon yields a gain in predictive information advantage that was unimaginable two centuries ago.
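To make the three genetic operators and the analyst’s stopping rule concrete, here is a minimal, generic genetic-programming sketch in Python. It is not the GenIQ Model, whose internals are not described here; every name in it (random_tree, crossover, mutate, evolve, the function set, and all parameter values) is a hypothetical choice for illustration only. Candidate models are expression trees, fitness is mean squared error on the data (lower is fitter), and each new generation is bred by reproduction (copying a fit tree), mating (subtree crossover between two fit trees), and mutation (replacing a random subtree). The loop stops when fitness reaches a target or a generation budget runs out, standing in for the analyst’s decision to stop.

```python
import operator
import random

# Hypothetical function and terminal sets; a real GP system would choose its own.
FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ["x", 1.0, 2.0]


def random_tree(rng, levels):
    """Grow a random expression tree: a terminal or a tuple (op, left, right)."""
    if levels == 0 or (levels < 3 and rng.random() < 0.3):
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCS))
    return (op, random_tree(rng, levels - 1), random_tree(rng, levels - 1))


def evaluate(tree, x):
    """Compute the tree's prediction for input x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def depth(tree):
    """Number of operator levels; a bare terminal has depth 0."""
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def mse(tree, data):
    """Fitness: mean squared error on (x, y) pairs; lower is fitter."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)


def subtrees(tree, path=()):
    """Yield (path, node) for every node; a path lists child indices from the root."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)


def tournament(rng, scored, k=3):
    """Selection: return the fittest tree among k random (fitness, tree) pairs."""
    return min(rng.sample(scored, k), key=lambda pair: pair[0])[1]


def crossover(rng, a, b):
    """Mating: graft a random subtree of b onto a random point of a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)


def mutate(rng, a):
    """Mutation: replace a random subtree of a with a fresh random one."""
    path, _ = rng.choice(list(subtrees(a)))
    return replace(a, path, random_tree(rng, 2))


def evolve(data, pop_size=60, generations=40, max_depth=6, target=1e-9, seed=0):
    """Hypothetical GP loop; returns the best (fitness, tree) pair found."""
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = [(mse(t, data), t) for t in pop]
        gen_best = min(scored, key=lambda pair: pair[0])
        if best is None or gen_best[0] < best[0]:
            best = gen_best
        if best[0] <= target:  # stand-in for the analyst deciding to stop
            break
        children = [best[1]]  # elitism: keep the best structure so far
        while len(children) < pop_size:
            r = rng.random()
            parent = tournament(rng, scored)
            if r < 0.1:
                child = parent  # reproduction
            elif r < 0.8:
                child = crossover(rng, parent, tournament(rng, scored))  # mating
            else:
                child = mutate(rng, parent)  # mutation
            # Cap tree size; an oversized child is discarded in favor of its parent.
            children.append(child if depth(child) <= max_depth else parent)
        pop = children
    return best


if __name__ == "__main__":
    # Toy data from y = x^2 + x + 1; no model form is given to evolve().
    data = [(x / 4, (x / 4) ** 2 + x / 4 + 1) for x in range(-8, 9)]
    fitness, tree = evolve(data)
    print(fitness, tree)
```

Because no model form is specified up front, the structure of the returned tree is whatever the fitness pressure selected, which is the “fitness begets structure” idea in miniature. Keeping the best tree unchanged in each generation (elitism) and capping tree depth are common genetic-programming conventions assumed here, not details taken from GenIQ.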