Errors and residuals in statistics
In statistics and optimization, the concepts of statistical error and residual are easily confused with each other.
A statistical error is the amount by which an observation differs from its expected value, which is based on the whole population from which the statistical unit was randomly chosen. The expected value, for instance the mean of the entire population, is typically unobservable. If the mean height in a population of 21-year-old men is 1.75 meters, and one randomly chosen man is 1.80 meters tall, then the "error" is 0.05 meters; if the randomly chosen man is 1.70 meters tall, then the "error" is −0.05 meters. The nomenclature arose from random measurement errors in astronomy: it is as if the measurement of the man's height were an attempt to measure the population mean, so that any difference between the man's height and the mean would be a measurement error.
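The arithmetic of the height example can be written out as a minimal Python sketch. It assumes the population mean of 1.75 m is known, which is exactly what is not possible in practice; the function name `statistical_error` is hypothetical and chosen only for illustration.

```python
# Hypothetical known population mean height of 21-year-old men (meters).
# In practice this value is unobservable; it is assumed here for illustration.
population_mean = 1.75


def statistical_error(observation: float, expected_value: float) -> float:
    """Statistical error: the observation minus its expected (population) value."""
    return observation - expected_value


for height in (1.80, 1.70):
    error = statistical_error(height, population_mean)
    print(f"height {height:.2f} m -> error {error:+.2f} m")
```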
A residual (or fitting error), on the other hand, is an observable estimate of the unobservable statistical error. The simplest case involves a random sample of n men whose heights are measured. The sample mean is used as an estimate of the population mean. Then we have:
- The difference between the height of each man in the sample and the unobservable population mean is a statistical error, and
- The difference between the height of each man in the sample and the observable sample mean is a residual.
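The two definitions can be compared on simulated data. The Python sketch below assumes a normal population with hypothetical parameters MU and SIGMA (not from the text) so that the errors can be computed at all. It also shows that every residual equals the corresponding error shifted by one common amount, the sample mean minus the population mean.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for this illustration;
# in a real study MU would be unknown, so the errors could not be computed.
MU, SIGMA, N = 1.75, 0.07, 10

rng = random.Random(42)  # fixed seed so the run is reproducible
heights = [rng.gauss(MU, SIGMA) for _ in range(N)]

sample_mean = statistics.fmean(heights)
errors = [h - MU for h in heights]              # deviations from the population mean
residuals = [h - sample_mean for h in heights]  # deviations from the sample mean

# Each residual is the corresponding error shifted by the same amount:
# error - residual = sample_mean - MU for every observation.
shift = sample_mean - MU
for e, r in zip(errors, residuals):
    print(f"error {e:+.4f}  residual {r:+.4f}  error - residual {e - r:+.4f}")
print(f"sample mean - MU = {shift:+.4f}")
```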
Note that the sum of the residuals within a random sample is necessarily zero, so the residuals are necessarily not independent: any n − 1 of them determine the remaining one. The sum of the statistical errors within a random sample need not be zero; the statistical errors are independent random variables if the individuals are chosen from the population independently.
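The zero-sum property follows in one line from the definition of the sample mean. In the derivation below, X_i is the i-th measured height, X̄ the sample mean, μ the population mean, and ε_i = X_i − μ and ε̂_i = X_i − X̄ the errors and residuals (this notation is the one used in the following section):

$$\sum_{i=1}^{n}\hat{\varepsilon}_i = \sum_{i=1}^{n}(X_i - \bar{X}) = \sum_{i=1}^{n}X_i - n\bar{X} = 0, \qquad\text{so}\qquad \hat{\varepsilon}_n = -\sum_{i=1}^{n-1}\hat{\varepsilon}_i,$$

$$\sum_{i=1}^{n}\varepsilon_i = \sum_{i=1}^{n}(X_i - \mu) = n(\bar{X} - \mu),$$

which is zero only when the sample mean happens to equal the population mean exactly.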
In sum:
- Residuals are observable; statistical errors are not.
- Statistical errors are often independent of each other; residuals are not (at least in the simple situation described above, and in most others).
Example with some mathematical theory
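If we assume a normally distributed population with mean μ and standard deviation σ, and choose individuals independently, then we have
$$X_1, \ldots, X_n \sim N(\mu, \sigma^2),$$
and the sample mean
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}$$
is a random variable distributed thus:
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).$$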
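The distribution of the sample mean can be checked by simulation. The Python sketch below draws many independent samples from a normal population with hypothetical parameters and compares the mean and variance of the resulting sample means with μ and σ²/n. It only checks these two moments; normality itself follows from the fact that a sum of independent normal variables is normal.

```python
import random
import statistics

# Hypothetical parameters for illustration; the check does not depend on them.
MU, SIGMA, N = 1.75, 0.07, 25
TRIALS = 20_000

rng = random.Random(0)  # fixed seed for reproducibility

# Draw many independent samples of size N and record each sample mean.
sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(TRIALS)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)

print(f"mean of sample means     {mean_of_means:.5f}  (theory: {MU})")
print(f"variance of sample means {var_of_means:.3e}  (theory: {SIGMA**2 / N:.3e})")
```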
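The statistical errors are then
$$\varepsilon_i = X_i - \mu, \qquad i = 1, \ldots, n,$$
whereas the residuals are
$$\hat{\varepsilon}_i = X_i - \bar{X}, \qquad i = 1, \ldots, n.$$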
(As is often done, the "hat" over the letter ε indicates an observable estimate of an unobservable quantity called ε.)
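The sum of squares of the statistical errors, divided by σ², has a chi-square distribution with n degrees of freedom:
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}\varepsilon_i^2 \sim \chi^2_n.$$
This quantity, however, is not observable. The sum of squares of the residuals, on the other hand, is observable. The quotient of that sum by σ² has a chi-square distribution with only n − 1 degrees of freedom (one degree of freedom is lost because the residuals are constrained to sum to zero):
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}\hat{\varepsilon}_i^2 \sim \chi^2_{n-1}.$$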
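The two chi-square claims can be checked by Monte Carlo simulation. The Python sketch below uses hypothetical values for μ, σ and n and relies on the standard fact that a chi-square variable with k degrees of freedom has mean k and variance 2k. Matching these two moments is consistent with the stated distributions but does not by itself prove them.

```python
import random
import statistics

# Hypothetical parameters for the simulation; any mu, sigma > 0 and n >= 2 work.
MU, SIGMA, N = 0.0, 2.0, 5
TRIALS = 40_000

rng = random.Random(1)  # fixed seed for reproducibility
scaled_sse = []  # sum of squared errors / sigma^2     (claimed chi-square, N df)
scaled_ssr = []  # sum of squared residuals / sigma^2  (claimed chi-square, N - 1 df)

for _ in range(TRIALS):
    xs = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(xs)
    scaled_sse.append(sum((x - MU) ** 2 for x in xs) / SIGMA**2)
    scaled_ssr.append(sum((x - xbar) ** 2 for x in xs) / SIGMA**2)

# A chi-square variable with k degrees of freedom has mean k and variance 2k.
for label, values, k in (("errors", scaled_sse, N), ("residuals", scaled_ssr, N - 1)):
    print(f"{label:9s}: mean {statistics.fmean(values):.3f} (k = {k}), "
          f"variance {statistics.variance(values):.3f} (2k = {2 * k})")
```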
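It is remarkable that the sum of squares of the residuals and the sample mean can be shown to be independent of each other. That fact and the normal and chi-square distributions given above form the basis of confidence interval calculations relying on Student's t-distribution. In those calculations one encounters the quotient
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)\big/(\sigma/\sqrt{n})}{\sqrt{\dfrac{1}{n-1}\cdot\dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n}\hat{\varepsilon}_i^2}}, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\hat{\varepsilon}_i^2,$$
in which σ appears in both the numerator and the denominator of the right-hand side and cancels. That is fortunate because in practice one would not know the value of σ². The numerator is standard normal and the denominator is the square root of an independent chi-square variable divided by its n − 1 degrees of freedom, so the quotient has Student's t-distribution with n − 1 degrees of freedom.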
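The cancellation of σ can be checked numerically. The Python sketch below computes the quotient directly from the sample standard deviation, then again in the right-hand form (a standardized mean divided by the square root of a scaled chi-square quantity) for several arbitrary values of σ. The sample heights, the hypothesised mean and the function names are hypothetical illustration choices, not data from the text.

```python
import math
import statistics


def t_statistic(xs, mu):
    """(xbar - mu) / (S / sqrt(n)), with S the sample standard deviation (n - 1 divisor)."""
    n = len(xs)
    return (statistics.fmean(xs) - mu) / (statistics.stdev(xs) / math.sqrt(n))


def t_via_sigma(xs, mu, sigma):
    """Same quotient written as Z / sqrt(chi2 / (n - 1)); sigma is any positive value."""
    n = len(xs)
    xbar = statistics.fmean(xs)
    z = (xbar - mu) / (sigma / math.sqrt(n))            # uses sigma
    chi2 = sum((x - xbar) ** 2 for x in xs) / sigma**2  # also uses sigma
    return z / math.sqrt(chi2 / (n - 1))                # sigma cancels here


sample = [1.78, 1.71, 1.80, 1.69, 1.76, 1.74]  # hypothetical heights in meters
mu0 = 1.75                                      # hypothesised population mean

print(f"direct:           {t_statistic(sample, mu0):.6f}")
for sigma in (0.01, 0.07, 5.0):
    print(f"sigma = {sigma:<5}: {t_via_sigma(sample, mu0, sigma):.6f}")
```

Whatever value is plugged in for σ, the second function returns the same number as the first, which is why the t-quotient can be computed without knowing σ.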
See also
- Margin of error
- Mean absolute error
- Propagation of error
- Root mean square deviation
- Sampling error
- Studentized residual
External links
- VIAS Science Cartoons: Residuals from the humorous perspective.
Acknowledgement and Attribution Regarding Sources of Content
Some of the initial content on this page may be incorporated in part from copyleft sources in the public domain including wikis such as Wikipedia and AskDrWiki.

