Interpreting Model Performance: Use the “Smart” Decile Analysis

Data analysts use the decile analysis, based on the scores of the response model at hand, to create a solicitation list of the individuals most likely to respond and so obtain an advantage over a random selection of individuals. The decile analysis involves a brute (“dumb”) division of a database into ten equal-sized contiguous groups (deciles) without regard for the shape of the distribution of model scores. The assumption of this “dumb” decile analysis is that individuals within a decile have equivalent model scores, which differ from the model scores of the neighboring deciles above and below. This assumption is not always tenable, because the distribution of model scores is not always "smooth"; it is often characterized by "clumps" or "gaps". Deciles with these characteristics harbor extreme response segments, which reflect what the model is doing and indicate how to implement the model to obtain a greater advantage over a random selection. The purpose of this article is to present the correct approach to interpreting model performance: the "smart" decile analysis, which divides a database while taking the clumps and gaps into account. It identifies extreme response segments, which aid in understanding what the response model is doing and how best to implement it.
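As a rough illustration of the dumb division described above, the following Python sketch ranks individuals by model score, cuts the ranking into ten equal-sized groups, and flags every cut that falls between two individuals with the same score. The function name `dumb_deciles`, the output field names, and the toy data are assumptions made for this example; they are not taken from the GenIQ software.

```python
from typing import Dict, List, Tuple


def dumb_deciles(scored: List[Tuple[float, int]], n_groups: int = 10) -> List[Dict]:
    """Split (score, responded) pairs into equal-sized groups by rank alone."""
    n = len(scored)
    total_responders = sum(resp for _, resp in scored)
    if n < n_groups or total_responders == 0:
        raise ValueError("need at least n_groups individuals and one responder")

    # Highest scores first; the shape of the score distribution is ignored.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    overall_rate = total_responders / n

    rows = []
    cum_n = 0
    cum_resp = 0
    for g in range(n_groups):
        start = g * n // n_groups
        stop = (g + 1) * n // n_groups
        group = ranked[start:stop]
        responders = sum(resp for _, resp in group)
        cum_n += len(group)
        cum_resp += responders
        # True when the cut separates two individuals with identical scores.
        splits_tie = stop < n and ranked[stop - 1][0] == ranked[stop][0]
        rows.append({
            "decile": g + 1,
            "n": len(group),
            "response_rate": responders / len(group),
            "cum_lift": round(100 * (cum_resp / cum_n) / overall_rate),
            "splits_tied_scores": splits_tie,
        })
    return rows


# Made-up toy file: 20 individuals whose scores sit in three clumps.
toy = (
    [(0.9, 1)]
    + [(0.6, r) for r in (1, 0, 1, 0, 0)]
    + [(0.3, r) for r in (0,) * 13 + (1,)]
)
rows = dumb_deciles(toy)
for row in rows:
    print(row)
```

A flagged boundary means the decile above and the decile below share a score, so the assumption of equivalent scores within a decile and different scores between deciles already fails at that cut.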

(Point of Note: The smart decile analysis is employed and enjoyed within the GenIQ Model Software.)

Two Illustrations of Dumb and Smart Decile Analyses

Illustration #1 for a Response Model #1

How to Read the Smart Decile Analysis

The quasi N-tile analysis (smart decile analysis) helps show that the dumb decile analysis is misleading in its display of model performance. Although the quasi 10-tile analysis of Illustration #1, below, produces 10 divisions, or tiles (which is not always the case; see Illustration #2, below), its display differs from that of the corresponding decile analysis. Thus, the quasi N-tile analysis shows that the decile analysis assumption (individuals within a given decile have equivalent model scores, reflecting equivalent likelihoods of responding) is not met; therefore, the estimates from the dumb decile analysis are not honest. (Let’s not concern ourselves with rounding off individuals in the deciles for now: Who wants to discuss 919.5 individuals anyway?!)

I use the quasi 10-tile analysis to parse the Top decile to show that the model scores form three clusters of individuals, each with nonequivalent responsiveness. That is, the Top decile consists of individuals of three levels of response rates – 20.00%, 12.40%, and 9.84%. This is inferred as follows:

The 1st tile consists of the 70 most responsive individuals (in the entire data file) with equivalent scores, as identified by the quasi 10-tile analysis. Their “smart” estimated response rate is 20.00%. Although these individuals account for only 0.76% of the data, they are an extreme response segment.
The next most responsive individuals with equivalent scores, as identified by the quasi 10-tile analysis, are in the 2nd tile, which consists of 613 individuals with a smart estimated response rate of 12.40%.
The third most responsive group, as identified by the quasi N-tile analysis, includes the remaining 236 (= 919 - 70 - 613) individuals at the bottom of the Top decile. They are in the 3rd tile, with a smart estimated response rate of 9.84%.
Thus, the Top decile consists of individuals in three clusters with varying levels of response rates: 20.00%, 12.40%, and 9.84%.
The 236 individuals genuinely belong with all of their counterparts in the 2nd decile and with the 685 most responsive individuals in the 3rd decile (7.44% of the data file). Specifically, the 236 individuals from the bottom of the Top decile, all 919 individuals in the 2nd decile, and the 685 individuals from the top of the 3rd decile all fall in the 3rd tile, as indicated by the quasi analysis. These 1840 (= 236 + 919 + 685) individuals have a smart estimated response rate of 9.84%.
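The head counts above can be checked with a few lines of arithmetic. This sketch assumes a file of 9,195 individuals, inferred from the 919.5 individuals per decile mentioned earlier; the function and variable names are illustrative only.

```python
from typing import List

FILE_SIZE = 9195  # assumption: ten deciles of 919.5 individuals each


def cumulative_depths(tile_sizes: List[int], file_size: int) -> List[float]:
    """Cumulative depth-of-file (in percent) after each tile, best tile first."""
    depths = []
    running = 0
    for n in tile_sizes:
        running += n
        depths.append(round(100 * running / file_size, 2))
    return depths


tile_1 = 70
tile_2 = 613
bottom_of_top_decile = 919 - tile_1 - tile_2      # 236 individuals
tile_3 = bottom_of_top_decile + 919 + 685         # 1840 individuals

depths = cumulative_depths([tile_1, tile_2, tile_3], FILE_SIZE)
print(bottom_of_top_decile, tile_3, depths)
```

Under that file-size assumption, the cumulative depths come out at 0.76%, 7.43%, and 27.44%, matching the percentages quoted in this article.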
Accordingly, the quasi 10-tile analysis provides smart and honest estimates of CumLifts for the now-understood dumb/nominal decile analysis:

For the Top decile with CumLift of 142: The smart estimate of CumLift is 151 for the 2nd tile, reflecting 7.43% depth-of-file. Because model scores are slightly clumped about the 10%-neighborhood, an exact 10% depth-of-file smart estimate cannot be obtained.
For the top two deciles, a 20% depth-of-file with a CumLift of 129: Because model scores are heavily clumped (have a very small variance) about the 20%-neighborhood, no smart estimate can be obtained.
For the top three deciles, a 30% depth-of-file with a CumLift of 123: The smart estimate of CumLift is 123 for the 3rd tile, reflecting 27.44% depth-of-file. Because model scores are slightly clumped about the 30%-neighborhood, an exact 30% depth-of-file smart estimate cannot be obtained.
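The smart CumLift figures can be sanity-checked from the tile sizes and response rates quoted above, using the usual definition: cumulative response rate divided by the overall response rate, times 100. The article does not state the overall response rate, so this sketch backs it out from the stated CumLift of 151 at the 2nd tile (roughly 8.7%). That value is an inference for illustration, not a number from the original analysis, and the names are hypothetical.

```python
from typing import List, Tuple


def cum_lifts(tiles: List[Tuple[int, float]], overall_rate: float) -> List[float]:
    """CumLift after each tile; tiles are (n_individuals, response_rate), best first."""
    lifts = []
    cum_n = 0
    cum_responders = 0.0
    for n, rate in tiles:
        cum_n += n
        cum_responders += n * rate
        lifts.append(100 * (cum_responders / cum_n) / overall_rate)
    return lifts


# Tile sizes and smart response rates quoted in the text.
tiles = [(70, 0.2000), (613, 0.1240), (1840, 0.0984)]

# Assumption: overall rate inferred from the stated CumLift of 151 at the 2nd tile.
cum_rate_tile_2 = (70 * 0.2000 + 613 * 0.1240) / (70 + 613)
overall_rate = cum_rate_tile_2 / 1.51

lifts = cum_lifts(tiles, overall_rate)
print([round(x) for x in lifts])
```

With the overall rate inferred this way, the 3rd tile comes out at a CumLift of about 123, which agrees with the smart estimate reported for 27.44% depth-of-file.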

Illustration #2 for a Response Model #2

How to Read the Smart Decile Analysis
The quasi N-tile analysis (smart decile analysis) for Illustration #2 is read similarly to that in Illustration #1. Note, however, that the quasi 10-tile analysis for Illustration #2 shows only six tiles. The four "missing" tiles (2nd, 4th, 6th, and 8th) are suppressed because their model scores are nonobservable "gap" scores: there are no individuals in these tiles. Clearly, this smart decile analysis demonstrates that the estimates from the dumb decile analysis are not honest.
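To make the idea of suppressed tiles concrete, here is one possible way a tiling can end up with fewer than ten tiles. The sketch assumes tiles are defined as ten equal-width intervals of the score range and that any interval containing no scores is dropped. This is only an illustrative assumption; the article does not describe how the GenIQ software forms its quasi N-tiles, and the scores below are made up.

```python
from collections import Counter
from typing import Dict, List


def quasi_tiles(scores: List[float], n_tiles: int = 10) -> Dict[int, int]:
    """Count individuals per equal-width score interval; empty tiles are suppressed."""
    hi, lo = max(scores), min(scores)
    if hi == lo:
        return {1: len(scores)}  # every score is tied: a single tile
    width = (hi - lo) / n_tiles
    counts: Counter = Counter()
    for s in scores:
        # Tile 1 holds the highest scores; the minimum score lands in the last tile.
        idx = min(int((hi - s) / width), n_tiles - 1)
        counts[idx + 1] += 1
    return {t: counts[t] for t in range(1, n_tiles + 1) if counts[t] > 0}


# Made-up scores that sit in six clumps, with gaps in between.
toy_scores = [1.0] * 2 + [0.75] * 3 + [0.55] * 4 + [0.35] * 5 + [0.15] * 4 + [0.0] * 2
shown = quasi_tiles(toy_scores)
missing = [t for t in range(1, 11) if t not in shown]
print("tiles shown:", shown)
print("tiles suppressed:", missing)
```

On this toy data, six tiles are shown and four are suppressed, because no individual has a score in those intervals.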

Using 50-tiles
[Figure: smart 50-tile analysis]