Generating a Random Sample of Alphabet Letters: Why?

Bruce Ratner, Ph.D.

Data preparation can be defined as getting acquainted with the data in order to understand what they tell you. You must 1) ensure there are no impossible or improbable values (e.g., an age of 120 years, or a boy named Sue, respectively), and 2) audit missing and zero values. The audit may call for imputing missing values.

Importantly, data preparation also includes coming face-to-face with the data distribution (shape), looking for 1) a clump - a mass of data (spike) at a single value (often at zero), or a quantity of data cohering together so as to make one body of indefinite shape; and 2) a gap - an intervening space between two adjacent, nonconsecutive values. Effective data preparation includes spreading out the clumps, closing in the gaps, and reshaping the data into the desirable and reliable bell-shaped curve.

The purpose of this article is to provide an unthought-of device for the data preparation tool kit - the SAS code for a random-alphabet function, which generates a uniform distribution of alphabet letters. I provide several illustrations as to why this handy implement is a welcome addition to the data analyst’s tool kit. If you have an interesting data-prep application (not random passwords) of the random-alphabet function, I would appreciate your thought-of idea. Please email me. Thanks.

SAS-code for a Random-alphabet Function

    /* Draw 10,000 letters, each of the 26 with equal probability. */
    DATA alpha_letters;
      do i=1 to 10000;
        /* ceil(ranuni(0)*26) is a random integer from 1 to 26; */
        /* substr picks the letter at that position.            */
        random_alphabet = substr('abcdefghijklmnopqrstuvwxyz',ceil(ranuni(0)*26),1);
        output;
      end;
    RUN;

    /* Tabulate how often each letter was drawn. */
    PROC freq data=alpha_letters;
      table random_alphabet;
    RUN;

Contact: 1 800 DM STAT-1, or e-mail br@dmstat1.com.
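
The claim that the random-alphabet function generates a uniform distribution of letters can be checked directly. What follows is a minimal sketch, not part of the original code. It assumes a fixed positive seed (12345, an arbitrary illustrative value) so that the sample is repeatable, whereas ranuni(0) takes its seed from the system clock. The data set name alpha_seeded is likewise hypothetical. The CHISQ option on a one-way PROC FREQ table requests a chi-square test of equal proportions, here 1/26 per letter.

```sas
/* Reproducible variant: a fixed positive seed (12345 is an arbitrary, */
/* illustrative choice) yields the same letters on every run.          */
DATA alpha_seeded;
  do i = 1 to 10000;
    random_alphabet = substr('abcdefghijklmnopqrstuvwxyz', ceil(ranuni(12345)*26), 1);
    output;
  end;
RUN;

/* CHISQ on a one-way table tests the null hypothesis that all */
/* 26 letters occur in equal proportions.                      */
PROC freq data=alpha_seeded;
  tables random_alphabet / chisq;
RUN;
```

With 10,000 draws, each letter should appear roughly 385 times (10,000/26). A large p-value from the chi-square test is consistent with a uniform distribution.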