Data Mining Paradigm

Bruce Ratner, Ph.D.

The term data mining emerged from the database marketing community sometime between the late 1970s and early 1980s. Statisticians did not understand the excitement and activity caused by this new technique, since the discovery of patterns and relationships (structure) in the data was not new to them. They had known about data mining for a long time, albeit under various names such as data fishing, snooping, and dredging, and, most disparagingly, “ransacking” the data. Because any discovery process inherently exploits the data, producing spurious findings, statisticians did not view data mining in a positive light. Simply looking for something increases the odds that it will be found; therefore, looking for structure typically results in finding structure. All data have spurious structures, which are formed by the “forces” that make things come together, such as chance. The bigger the data, the greater the odds that spurious structures abound. Thus, an expectation of data mining is that it produces structures, both real and spurious, without distinction between them. In this article I discuss the conditions under which statisticians accept data mining as a bona fide field.
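
The point that looking for structure tends to find it, and that bigger data yield more spurious structure, can be illustrated with a small simulation. What follows is only a minimal sketch under stated assumptions, not a method taken from this article: the helper names pearson and count_spurious_pairs, the 30-row sample size, and the cutoff of 0.36 (roughly the two-sided 5% critical value of a correlation for 30 rows) are all hypothetical choices made for illustration. Every column is pure random noise, so any pair of columns that passes the cutoff is a spurious "discovery" by construction.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_rows, n_cols, threshold, seed=0):
    """Count column pairs of pure noise whose |r| exceeds the threshold."""
    rng = random.Random(seed)
    # Every column is independent Gaussian noise: there is no real structure.
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(n_rows)] for _ in range(n_cols)
    ]
    # Search every pair of columns, as an undirected "mining" pass would.
    return sum(
        1
        for a, b in itertools.combinations(columns, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    # Assumed cutoff: about the two-sided 5% critical value of r for 30 rows.
    for n_cols in (5, 20, 40):
        n_pairs = n_cols * (n_cols - 1) // 2
        hits = count_spurious_pairs(n_rows=30, n_cols=n_cols, threshold=0.36)
        print(f"{n_cols} columns, {n_pairs} pairs searched, {hits} 'discoveries'")
```

Because the number of pairs searched grows roughly with the square of the number of columns, the count of chance "discoveries" should grow with it, even though the data contain no real relationships at all. Nothing in the output distinguishes these findings from real ones, which is the statisticians' objection in miniature.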

Related Articles: Data Mining and Its Applications