DM Stat-1 Articles

Generating a Random Sample of Alphabet Letters: Why?
Bruce Ratner, Ph.D.

Data preparation can be defined as getting acquainted with the data to the point of understanding what they tell you. You must 1) ensure there are no impossible or improbable values (e.g., an age of 120 years or a boy named Sue, respectively), and 2) audit missing and zero values. The audit may call for imputation of missing values. Importantly, data preparation also means coming face-to-face with the data distribution (shape) and looking for 1) a clump: a mass of data (a spike) at a single value (often zero), or a quantity of data cohering together to form one body of indefinite shape; and 2) a gap: an intervening space between two nonconsecutive adjacent values. Effective data preparation includes spreading out the clumps, closing in the gaps, and reshaping the data into the desirable and reliable bell-shaped curve.
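The audit step can be sketched in SAS as follows. This is a minimal illustration, assuming a small hypothetical data set MYDATA with a numeric variable AGE and an assumed plausibility cutoff of 110 years (none of these come from the article). It counts missing values, zeros, and implausible ages, then lists values by frequency so that a clump (a spike at a single value) stands out.

```sas
/* Hypothetical data set MYDATA with a numeric variable AGE (illustration only) */
data mydata;
   input age @@;
   datalines;
25 0 . 47 0 120 33 . 0 61
;
run;

/* Audit: count missing values, zeros, and ages above an assumed cutoff of 110 */
data _null_;
   set mydata end=last;
   n_missing  + missing(age);   /* sum statement: retained across rows, starts at 0 */
   n_zero     + (age = 0);      /* a missing AGE is not equal to 0, so it adds 0 */
   n_over_110 + (age > 110);    /* missing sorts below every number, so it adds 0 */
   if last then put n_missing= n_zero= n_over_110=;
run;

/* Clump check: most frequent values first; MISSING reports . as its own level */
proc freq data=mydata order=freq;
   tables age / missing;
run;
```

With these ten made-up values the log shows n_missing=2 n_zero=3 n_over_110=1, and the frequency table lists 0 first: the spike-at-zero clump the paragraph warns about.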

The purpose of this article is to provide an unthought-of device for the data preparation tool kit: the SAS code for a random-alphabet function, which generates a uniform distribution of alphabet letters. I provide several illustrations of why this handy implement is a welcome addition to the data analyst's tool kit. If you have an interesting data-prep application of the random-alphabet function (other than random passwords), I would appreciate hearing about it. Please email me. Thanks.


SAS Code for a Random-Alphabet Function

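DATA alpha_letters;
   length random_alphabet $ 1;   /* one character; otherwise it inherits the 26-character length of the SUBSTR source */
   do i=1 to 10000;
      /* ranuni(0) is uniform on (0,1) and seeded from the clock, so ceil(ranuni(0)*26) is an integer from 1 to 26 */
      random_alphabet =
           substr('abcdefghijklmnopqrstuvwxyz',ceil(ranuni(0)*26),1);
      output;
   end;
   drop i;
RUN;
PROC freq data=alpha_letters;
   tables random_alphabet;   /* expect roughly 10000/26, about 385, of each letter */
RUN;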
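To check the claim that the letters are uniformly distributed, PROC FREQ can run a one-way chi-square goodness-of-fit test: with the CHISQ option and no TESTP= values, it tests equal proportions across the 26 letters. The sketch below also swaps RANUNI for the newer RAND('UNIFORM') with CALL STREAMINIT so the sample is reproducible. The seed 2024 and the data set name ALPHA_LETTERS_SEEDED are arbitrary, hypothetical choices, not part of the original code.

```sas
/* Reproducible variant: fixed seed via CALL STREAMINIT and the RAND function */
data alpha_letters_seeded;
   length random_alphabet $ 1;
   call streaminit(2024);   /* arbitrary seed, chosen only for illustration */
   do i = 1 to 10000;
      /* RAND('UNIFORM') returns values strictly between 0 and 1, so the index is 1 to 26 */
      random_alphabet = substr('abcdefghijklmnopqrstuvwxyz', ceil(26*rand('uniform')), 1);
      output;
   end;
   drop i;
run;

/* One-way chi-square goodness-of-fit test against equal proportions (1/26 each) */
proc freq data=alpha_letters_seeded;
   tables random_alphabet / chisq;
run;
```

With 26 letters the test has 25 degrees of freedom. A large p-value is consistent with uniform letters; a small one (say, below 0.05) would suggest that the generator or the index arithmetic is off.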



For more information about this article, call Bruce Ratner at 516.791.3544,
1 800 DM STAT-1, or e-mail at br@dmstat1.com.