Pearson's r


Editor-In-Chief: C. Michael Gibson, M.S., M.D.


In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the MCV or PMCC) (r) is a common measure of the correlation between two variables X and Y. When measured in a population, the Pearson product-moment correlation is designated by the Greek letter rho (ρ). When computed in a sample, it is designated by the letter r and is sometimes called "Pearson's r." Pearson's correlation reflects the degree of linear relationship between two variables. It ranges from −1 to +1. A correlation of +1 means that there is a perfect positive linear relationship between the variables. A correlation of −1 means that there is a perfect negative linear relationship between the variables. A correlation of 0 means there is no linear relationship between the two variables. In real data, correlations are rarely if ever exactly 0, 1, or −1. The sign of the coefficient indicates whether the relationship is positive or negative.[1]

The statistic is defined as the sum of the products of the standard scores of the two measures divided by the degrees of freedom.[1] If the data come from a sample, then

<math>r = \frac {1}{n - 1} \sum ^n _{i=1} \left( \frac{X_i - \bar{X}}{s_X} \right) \left( \frac{Y_i - \bar{Y}}{s_Y} \right)</math>

where

<math>\frac{X_i - \bar{X}}{s_X}, \bar{X}, \text{ and } s_X</math>

are the standard score, sample mean, and sample standard deviation (calculated using n − 1 in the denominator).[1]
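The sample formula can be sketched directly in code. The following is a minimal illustration, not a reference implementation: the function name `pearson_r` and the example data are assumptions made for this sketch, and the standard deviations use n − 1 in the denominator as described above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson r: sum of products of standard scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations, with n - 1 in the denominator.
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    # Multiply the paired standard scores, sum them, and divide by n - 1.
    products = (((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Hypothetical example data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)  # approximately 0.7746
```

For these illustrative values the sum of cross-deviations is 6 and the sums of squared deviations are 10 and 6, so r = 6 / √60 ≈ 0.7746.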

If the data comes from a population, then

<math>\rho = \frac {1}{n} \sum ^n _{i=1} \left( \frac{X_i - \mu_X}{\sigma_X} \right) \left( \frac{Y_i - \mu_Y}{\sigma_Y} \right)</math>

where

<math>\frac{X_i - \mu_X}{\sigma_X}, \mu_X, \text{ and } \sigma_X</math>

are the standard score, population mean, and population standard deviation (calculated using n in the denominator).

The result obtained is equivalent to dividing the covariance between the two variables by the product of their standard deviations.
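As a rough check of this equivalence, the sketch below computes the coefficient as covariance divided by the product of the standard deviations. The function name `r_from_covariance` and the `ddof` parameter (1 for the sample convention, 0 for the population convention) are assumptions introduced for this example. Because the same denominator appears in the covariance and in both standard deviations, it cancels, and the two conventions yield the same value.

```python
from math import sqrt


def r_from_covariance(xs, ys, ddof=1):
    """Correlation as covariance / (sd_x * sd_y).

    ddof=1 uses n - 1 (sample convention); ddof=0 uses n (population convention).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    denom = n - ddof
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / denom
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / denom)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / denom)
    return cov / (sd_x * sd_y)


# Hypothetical example data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_sample = r_from_covariance(xs, ys, ddof=1)
r_population = r_from_covariance(xs, ys, ddof=0)
```

Both calls return the same number up to floating-point rounding.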

The coefficient ranges from −1 to 1. A value of 1 shows that a linear equation describes the relationship perfectly and positively, with all data points lying on the same line and with Y increasing with X. A value of −1 shows that all data points lie on a single line but that Y decreases as X increases. A value of 0 shows that there is no linear relationship between the variables, so a linear model is inappropriate.[1]
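The three boundary cases can be illustrated with small constructed data sets. This is a sketch under stated assumptions: the helper `pearson_r` and the data are hypothetical, and the helper uses the algebraically equivalent form in which the n − 1 factors cancel. The third case shows that a value of 0 rules out only a linear relationship, since Y there is completely determined by X.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from sums of squares and cross-products (the n - 1 terms cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])   # points on a rising line
r_down = pearson_r(xs, [-3 * x for x in xs])    # points on a falling line
r_curve = pearson_r(xs, [x * x for x in xs])    # symmetric parabola, no linear trend
```

Here `r_up` is 1, `r_down` is −1, and `r_curve` is 0, even though every point in the last case lies exactly on the curve Y = X².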

The Pearson coefficient is a statistic which estimates the correlation of the two given random variables.

The linear equation that best describes the relationship between X and Y can be found by linear regression. This equation can be used to "predict" the value of one measurement from knowledge of the other. That is, for each value of X, the equation calculates a value that is the best estimate of the values of Y corresponding to that specific value of X. We denote this predicted variable by Y′.
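A minimal sketch of this prediction step, assuming ordinary least squares is the regression method meant here, is shown below. The function name `fit_line` and the data are hypothetical and serve only to illustrate the idea.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical example data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
# Predicted values Y' for each observed X.
predicted = [intercept + slope * x for x in xs]
```

For these values the slope is 0.6 and the intercept is 2.2, so the predicted value at X = 3 is 4.0, which equals the mean of Y.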

Any value of Y can therefore be defined as the sum of Y′ and the difference between Y and Y′:

<math>Y = Y^\prime + (Y - Y^\prime).</math>

The variance of Y is equal to the sum of the variances of the two components of Y:

<math>s_y^2 = s_{y^\prime}^2 + s^2_{y.x}.</math>

Since the coefficient of determination implies that <math>s_{y.x}^2 = s_y^2(1 - r^2)</math>, we can derive the identity

<math>r^2 = {s_{y^\prime}^2 \over s_y^2}.</math>

The square of r is conventionally used as a measure of the association between X and Y. For example, if <math>r^2</math> is 0.90, then 90% of the variance of Y can be "accounted for" by changes in X and the linear relationship between X and Y.[1]
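The variance decomposition and the resulting identity for the squared coefficient can be checked numerically. The sketch below is illustrative only: the helper `variance` and the data are assumptions, the line is fitted by ordinary least squares, and every variance (including that of the residuals Y − Y′) uses n − 1 in the denominator so that the terms are comparable. Some texts divide the residual sum of squares by n − 2 instead, in which case the decomposition would not hold exactly as written.

```python
def variance(values):
    """Sample variance with n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Hypothetical example data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line for predicting Y from X.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in xs]      # Y'
residuals = [y - p for y, p in zip(ys, predicted)]   # Y - Y'

var_y = variance(ys)             # variance of Y
var_pred = variance(predicted)   # variance of Y'
var_resid = variance(residuals)  # variance of Y - Y'
r_squared = var_pred / var_y     # squared correlation as a ratio of variances
```

With these values the variance of Y is 1.5, which splits into 0.9 for the predicted values and 0.6 for the residuals, so the squared coefficient is 0.9 / 1.5 = 0.6, and 60% of the variance of Y is accounted for by the linear relationship with X.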


References

  1. Moore, David (August 2006). "4". Basic Practice of Statistics (4 ed.). WH Freeman Company. pp. 90–114. ISBN 0-7167-7463-1.
