Generating a Random Sample of Alphabet Letters: Why?

Bruce Ratner, Ph.D.

Data preparation can be defined as getting acquainted with the data in order to understand what they tell you. You must 1) ensure there are no impossible or improbable values (e.g., age of 120 years, or a boy named Sue, respectively), and 2) audit missing and zero values. The audit may call for imputation of missing values. Importantly, data preparation also includes coming face-to-face with the shape of the data distribution, looking for 1) a clump - a mass of data (spike) at a single value (often at zero), or a quantity of data cohering together so as to make one body of indefinite shape; and 2) a gap - an intervening space between two adjacent but nonconsecutive values. Effective data preparation includes spreading out the clumps, closing the gaps, and reshaping the data into the desirable and reliable bell-shaped curve.
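The audit steps described above can be sketched in SAS roughly as follows. This is only an illustration under stated assumptions: the data set `customers`, its variables `age` and `amount`, and the age cutoff of 110 are all hypothetical and are not taken from any real application.

```sas
/* Hypothetical toy data: age and purchase amount for six customers */
data customers;
   input age amount;
   datalines;
34 0
120 25.5
. 0
51 0
28 112
45 .
;
run;

/* Count missing values, zeros, and implausible ages in one pass.      */
/* The sum statement (variable + expression) accumulates across rows.  */
data audit;
   set customers end=last;
   n_missing_age    + missing(age);
   n_missing_amount + missing(amount);
   n_zero_amount    + (amount = 0);
   n_implausible    + (age > 110);   /* cutoff of 110 is an assumption */
   if last then output;
   keep n_:;
run;

proc print data=audit noobs;
run;

/* A frequency table exposes a clump (spike) at a single value, such as zero */
proc freq data=customers;
   tables amount / missing;
run;
```

On real data with many distinct values, a histogram (for example from PROC UNIVARIATE) would likely show clumps and gaps more readily than a frequency table.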

The purpose of this article is to provide an unthought-of device for the data preparation tool kit - the SAS code for a random-alphabet function, which generates a uniform distribution of alphabet letters. I provide several illustrations as to why this handy implement is a welcome addition to the data analyst's tool kit. If you have an interesting data-prep application (not random passwords) of the random-alphabet function, I would appreciate hearing your thought-of idea. Please email me. Thanks.

SAS-code for a Random-alphabet Function

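/* Generate 10,000 letters drawn uniformly from a-z */
DATA alpha_letters;
   length random_alphabet $ 1;  /* otherwise SUBSTR gives it the length of the 26-character string */
   do i = 1 to 10000;
      /* ranuni(0) is uniform on (0,1); ceil(...*26) maps it to an integer from 1 to 26 */
      random_alphabet = substr('abcdefghijklmnopqrstuvwxyz', ceil(ranuni(0)*26), 1);
      output;
   end;
RUN;

/* Tabulate the letters; each should appear in roughly 1/26 (about 3.8%) of the rows */
PROC freq data=alpha_letters;
   tables random_alphabet;
RUN;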
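
One way to check the claim of uniformity, beyond eyeballing the frequency table, is a chi-square test of equal proportions. The sketch below is an addition to the article's code rather than part of it; the seed 12345 is an arbitrary assumption, chosen only so that the run is repeatable.

```sas
/* Regenerate the sample with a fixed, arbitrary seed so results are repeatable */
data alpha_letters;
   length random_alphabet $ 1;
   do i = 1 to 10000;
      random_alphabet = substr('abcdefghijklmnopqrstuvwxyz', ceil(ranuni(12345)*26), 1);
      output;
   end;
run;

/* For a one-way table, CHISQ requests a chi-square test of equal proportions */
proc freq data=alpha_letters;
   tables random_alphabet / chisq;
run;
```

With 10,000 draws, the expected count for each letter is 10000/26, or about 385. A large p-value from the test is consistent with a uniform distribution of letters, though it does not prove it.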