|
CHAID for Uncovering Relationships: A Data Mining Tool, by Bruce Ratner, Ph.D. CHAID is a technique that recursively partitions (or splits) a population into separate and distinct segments. These segments, called nodes, are split in such a way that the variation of the dependent variable (categorical or continuous) is minimized within the segments and maximized among the segments. After the initial splitting of the population into two or more nodes (defined by the values of an independent, or predictor, variable), the splitting process is repeated on each of the nodes. Each node is treated as a new sub-population and is itself split into two or more nodes (defined by the values of another predictor variable), again such that the variation of the dependent variable is minimized within the nodes and maximized among the nodes. The splitting process is repeated until stopping rules are met. The output of CHAID is a tree display, where the root is the population and the branches connect the segments, such that the variation of the dependent variable is minimized within all the segments and maximized among all the segments. CHAID was originally developed as a method of finding interaction variables. In database marketing, CHAID is primarily used today as a market segmentation technique. I present CHAID for uncovering relationships, and compare it to a newer data mining technique with solid, resilient muscle: the GenIQ Model©. Related Articles: Data Mining and Its Applications. Contact: 1 800 DM STAT-1, or e-mail br@dmstat1.com. |
|
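The recursive splitting described above can be illustrated with a minimal sketch for a continuous dependent variable. It is not the CHAID algorithm itself: CHAID selects splits with chi-square (or F) significance tests and merges predictor categories, whereas this sketch simply picks the predictor whose split leaves the smallest total within-node sum of squares. The function names (`within_ss`, `best_split`, `grow`), the stopping rules (`min_size`, `max_depth`), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def within_ss(values):
    """Variation within one node: sum of squared deviations from the node mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, target, predictors):
    """Return (predictor, groups) giving the smallest total within-node variation.

    Simplification: real CHAID ranks predictors by adjusted p-values, not raw
    sums of squares.
    """
    best = None
    for p in predictors:
        groups = defaultdict(list)
        for r in rows:
            groups[r[p]].append(r)
        if len(groups) < 2:  # a predictor with one value cannot split the node
            continue
        total = sum(within_ss([r[target] for r in g]) for g in groups.values())
        if best is None or total < best[0]:
            best = (total, p, dict(groups))
    return None if best is None else (best[1], best[2])

def grow(rows, target, predictors, min_size=2, max_depth=3, depth=0):
    """Recursively split a node until a (hypothetical) stopping rule is met."""
    node = {"n": len(rows), "mean": sum(r[target] for r in rows) / len(rows)}
    if depth >= max_depth or len(rows) < 2 * min_size:
        return node
    split = best_split(rows, target, predictors)
    if split is None:
        return node
    predictor, groups = split
    if any(len(g) < min_size for g in groups.values()):
        return node
    remaining = [q for q in predictors if q != predictor]
    node["split_on"] = predictor
    node["children"] = {
        value: grow(g, target, remaining, min_size, max_depth, depth + 1)
        for value, g in groups.items()
    }
    return node

# Toy data (invented): spend differs strongly by region, slightly by channel.
rows = [
    {"region": "north", "channel": "web", "spend": 10.0},
    {"region": "north", "channel": "web", "spend": 11.0},
    {"region": "north", "channel": "mail", "spend": 12.0},
    {"region": "north", "channel": "mail", "spend": 13.0},
    {"region": "south", "channel": "web", "spend": 30.0},
    {"region": "south", "channel": "web", "spend": 31.0},
    {"region": "south", "channel": "mail", "spend": 32.0},
    {"region": "south", "channel": "mail", "spend": 33.0},
]
tree = grow(rows, "spend", ["region", "channel"])
```

On this toy data the root is split on `region` first, because that split leaves far less variation in `spend` within each node than a split on `channel` would; each region node is then split on `channel`, after which the minimum node size stops further splitting.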