“How Large a Sample is Required to Build a Database Response Model?”

Statistical consultants are frequently asked: “How large a sample is required to build a database response model?” This simply stated question is not easily answered, because the inputs required for the sample size calculation are arbitrary and sometimes unknown. The “right” sample size depends on several concepts and conditions, such as the arbitrarily chosen Type I and Type II error rates, the effect size, the number of predictor variables in the model, and the average correlation among the predictor variables. Because the latter two conditions are virtually never known before building a database response model, the calculated sample size is a guesstimate, and one that is too large for most marketing solicitation budgets. Apart from the effects of its inputs, the traditional sample size calculation paradigm is about testing for statistical significance, not practical importance. Practical importance is more in line with the goal of building a database response model; namely, how useful are the model's predictions of the rank-order likelihood of response? Thus, I raise the appropriate question: “How large a gain (increase in response) is expected from a database response model built with the sample size at hand or the sample size permitted by the budget?” The purpose of this article is to delate the original question, address the newly posited one, and present a methodology for answering the latter.
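To make concrete how much the traditional calculation depends on arbitrary inputs, here is a minimal sketch in Python. It is not the methodology of this article. It assumes the simplest textbook setting: a two-sided test comparing two response rates under the unpooled normal approximation. The function name `n_per_group` and the response rates used are hypothetical, and the sketch ignores the number of predictor variables and their average correlation, which are the very inputs that are virtually never known in advance.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of two
    proportions (unpooled normal approximation).

    p1, p2 : assumed response rates; their difference is the effect size
    alpha  : Type I error rate (an arbitrary choice)
    power  : 1 - Type II error rate (an arbitrary choice)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of binomial variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Hypothetical response rates: 2.0% versus 2.5%.
    for alpha, power in [(0.05, 0.80), (0.05, 0.90), (0.01, 0.90)]:
        n = n_per_group(0.020, 0.025, alpha=alpha, power=power)
        print(f"alpha={alpha}, power={power}: n per group = {n}")
```

Running the loop with the same assumed response rates but different error rates gives noticeably different required sample sizes, which illustrates why the calculated figure is best read as a guesstimate.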


Bruce Ratner, Ph.D.
DM STAT-1 CONSULTING / br@dmstat1.com
574 Flanders Drive / North Woodmere, NY 11581 / USA
Voice 1-516-791-3544 / Fax 1-516-791-5075 / Toll Free 1 800 DM STAT-1