Pearson product-moment correlation coefficient


In statistics, the Pearson product-moment correlation coefficient (r), sometimes known as the PMCC, is a measure of the correlation of two variables X and Y measured on the same object or organism; that is, a measure of the tendency of the variables to increase or decrease together. It is defined as the sum of the products of the standard scores of the two measures divided by the degrees of freedom:

 r = \frac {\sum z_x z_y}{n - 1}.

Note that this formula assumes the z-scores are computed from sample standard deviations, that is, standard deviations calculated with n − 1 in the denominator.

The result obtained is equivalent to dividing the covariance between the two variables by the product of their standard deviations.
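The equivalence of the two definitions can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`sample_std`, `pearson_from_z_scores`, `pearson_from_covariance`) and the data are made up for illustration and are not part of any package mentioned in this article.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation, using n - 1 in the denominator."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_from_z_scores(xs, ys):
    """r as the sum of products of standard scores divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_std(xs), sample_std(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


def pearson_from_covariance(xs, ys):
    """r as the sample covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sample_std(xs) * sample_std(ys))


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r_z = pearson_from_z_scores(xs, ys)
r_cov = pearson_from_covariance(xs, ys)
print(r_z, r_cov)  # both are 0.8 up to floating-point rounding
```

Both functions use n − 1 throughout; mixing an n-denominator standard deviation with the n − 1 divisor in the formula would give a different value.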

The coefficient ranges from −1 to 1. A value of 1 shows that a linear equation describes the relationship perfectly and positively, with all data points lying on the same line and with Y increasing with X. A value of −1 shows that all data points lie on a single line but that Y increases as X decreases. A value of 0 shows that there is no linear relationship between the variables, so a linear model is inappropriate; it does not rule out a nonlinear relationship.
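The three reference values can be reproduced with small synthetic data sets. This is a sketch under the assumption that a hand-written helper (here called `pearson_r`, a name chosen for illustration) is acceptable in place of a library routine; the data are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Points on a rising line: r is 1.
r_up = pearson_r(xs, [2 * x + 1 for x in xs])

# Points on a falling line: r is -1.
r_down = pearson_r(xs, [-3 * x + 4 for x in xs])

# Points on a parabola symmetric about x = 0: Y is fully determined by X,
# yet the linear correlation is 0.
r_curve = pearson_r(xs, [x * x for x in xs])

print(r_up, r_down, r_curve)
```

The parabola case illustrates that a coefficient of 0 describes the absence of a linear relationship only.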

The Pearson coefficient is a statistic which estimates the correlation of the two given random variables.

The linear equation that best describes the relationship between X and Y can be found by linear regression. This equation can be used to "predict" the value of one measurement from knowledge of the other. That is, for each value of X the equation calculates a value which is the best estimate of the values of Y corresponding to that specific value of X. We denote this predicted variable by Y′.

Any value of Y can therefore be defined as the sum of Y′ and the difference between Y and Y′:

Y = Y^\prime + (Y - Y^\prime).

The variance of Y is equal to the sum of the variances of the two components of Y:

s_y^2 = s_{y^\prime}^2 + s^2_{y.x}.

Since the coefficient of determination implies that s_{y.x}^2 = s_y^2 (1 - r^2), we can derive the identity

r^2 = {s_{y^\prime}^2 \over s_y^2}.
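The intermediate step, written out using only the two relations stated above, is a substitution of the residual variance into the variance decomposition:

s_{y^\prime}^2 = s_y^2 - s_{y.x}^2 = s_y^2 - s_y^2 (1 - r^2) = r^2 s_y^2.

Dividing both sides by s_y^2 gives the identity.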

The square of r is conventionally used as a measure of the association between X and Y. For example, if the coefficient is 0.90, then 81% of the variance of Y can be "accounted for" by changes in X and the linear relationship between X and Y.
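The decomposition of the variance of Y and the resulting value of r² can be checked on a small example. The sketch below fits the least-squares line by hand; the variable names and data are illustrative, and all variances, including that of the residuals, are assumed to use n − 1 in the denominator so that the decomposition holds exactly.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def sample_var(values):
    """Sample variance, using n - 1 in the denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

# Ordinary least-squares line Y' = intercept + slope * X.
mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
slope = sxy / sxx
intercept = my - slope * mx

predicted = [intercept + slope * x for x in xs]      # Y'
residuals = [y - p for y, p in zip(ys, predicted)]   # Y - Y'

var_y = sample_var(ys)
var_pred = sample_var(predicted)
var_resid = sample_var(residuals)

# Share of the variance of Y accounted for by the fitted line.
r_squared = var_pred / var_y
print(var_y, var_pred + var_resid, r_squared)
```

For these data r is 0.8, so the ratio of the variance of Y′ to the variance of Y comes out as 0.64, up to floating-point rounding.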

In computer software

  • The CORREL() function in many major spreadsheet packages, such as Microsoft Excel, OpenOffice.org Calc and Gnumeric, calculates Pearson's correlation coefficient. Note that versions of Excel prior to 2003 exhibited rounding errors in this function and others [1].
  • The PEARSON() function in Microsoft Excel also calculates Pearson's correlation coefficient.
  • In MATLAB and Minitab, corr(X) calculates Pearson's correlation coefficient along with the p-value.
  • In MATLAB, Scilab, and GNU Octave, corrcoef calculates Pearson's correlation coefficient. R = corrcoef(X) returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are observations and whose columns are variables.
  • In S-Plus and R, cor.test(X,Y) calculates Pearson's correlation coefficient.
  • In IDL, the CORRELATE() function computes the PMCC.
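To make the matrix convention described for corrcoef concrete (rows as observations, columns as variables), here is a rough pure-Python sketch. The function name `correlation_matrix` and the data are hypothetical; this is not the implementation used by any of the packages listed above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(rows):
    """Correlation matrix of a data matrix whose rows are observations
    and whose columns are variables."""
    columns = list(zip(*rows))  # transpose: one tuple per variable
    return [[pearson_r(a, b) for b in columns] for a in columns]


# Five observations of three variables; the third column is twice the first.
data = [
    [1.0, 2.0, 2.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 3.0, 8.0],
    [5.0, 5.0, 10.0],
]

R = correlation_matrix(data)
for row in R:
    print(row)
```

The result is symmetric with ones on the diagonal; entry R[i][j] is the coefficient between variables i and j.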

