The Correlation Coefficient: Definition
Bruce Ratner, Ph.D.
The correlation coefficient, denoted by r, is a measure of the strength of the straight-line, or linear, relationship between two variables. The correlation coefficient takes on values ranging from -1 to +1. The following points are the accepted guidelines for interpreting the correlation coefficient:
The calculation of the correlation coefficient for two variables, say X and Y, is simple to understand. Let zX and zY be the standardized versions of X and Y, respectively. That is, zX and zY are each re-expressed to have a mean of zero and a standard deviation (std) of one, where std is the sample standard deviation (computed with divisor n-1). The re-expressions used to obtain the standardized scores are given in equations (3.1) and (3.2):
$$z_{X_i} = \frac{X_i - \mathrm{mean}(X)}{\mathrm{std}(X)} \tag{3.1}$$
$$z_{Y_i} = \frac{Y_i - \mathrm{mean}(Y)}{\mathrm{std}(Y)} \tag{3.2}$$
The correlation coefficient is defined as the mean product of the paired standardized scores (zXi, zYi), as expressed in equation (3.3), where n is the sample size:
$$r_{X,Y} = \frac{\sum_{i=1}^{n} z_{X_i}\, z_{Y_i}}{n-1} \tag{3.3}$$
For a simple illustration of the calculation, consider the sample of five observations in Table 1. Columns zX and zY contain the standardized scores of X and Y, respectively. The last column contains the product of each pair of standardized scores. The sum of these products is 1.83. Their mean, using the adjusted divisor n-1 = 4 rather than n = 5, is 1.83/4 ≈ 0.46. Thus, rX,Y = 0.46.
1 800 DM STAT-1, or e-mail at br@dmstat1.com.
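The calculation in equations (3.1) through (3.3) can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the helper names `standardize` and `correlation` are hypothetical, the std is assumed to be the sample standard deviation (divisor n-1, via `statistics.stdev`), and the five-observation sample is made up for illustration. It is NOT the data of Table 1.

```python
import statistics

def standardize(values):
    """Re-express values to mean 0 and std 1, as in equations (3.1)/(3.2).

    Assumes std is the sample standard deviation (divisor n-1),
    which matches the n-1 divisor in equation (3.3).
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std, divisor n-1
    return [(v - mean) / std for v in values]

def correlation(x, y):
    """Correlation as the adjusted mean product of paired z-scores, eq. (3.3)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    zx = standardize(x)
    zy = standardize(y)
    # Product of each pair of standardized scores (the "last column").
    products = [a * b for a, b in zip(zx, zy)]
    # Sum of the products divided by n-1, not n.
    return sum(products) / (n - 1)

# Hypothetical five-observation sample (illustrative only, not Table 1).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = correlation(x, y)
print(round(r, 2))  # → 0.8
```

Here the products of the paired z-scores sum to 3.2, and dividing by n-1 = 4 gives r = 0.8. On Python 3.10 or later, `statistics.correlation(x, y)` should return the same value, which makes a convenient cross-check of the hand calculation.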