Censoring (statistics)

In statistics, censoring occurs when the value of an observation is only partially known.

For example, suppose a study is conducted to measure the impact of a drug on mortality. In such a study, it may be known only that an individual's age at death is at least 75 years. This could occur if the individual withdrew from the study at age 75, or if the individual is still alive at age 75 when the data are analyzed.

Censoring also occurs when a value falls outside the range of a measuring instrument. For example, a bathroom scale might only measure up to 300 lb. If a 350 lb individual is weighed using the scale, the observer would only know that the individual's weight is greater than 300 lb.

Types of Censoring

  • Left Censoring - a data point is below a certain value but it is unknown by how much
  • Interval Censoring - a data point is somewhere on an interval between two values
  • Right Censoring - a data point is above a certain value but it is unknown by how much
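The three types above can all be written as an interval of possible values. The following is a minimal Python sketch of that idea, not a standard API: the `Observation` class and the `read_scale` helper are hypothetical names introduced here, and the sketch assumes that an unknown bound is written as plus or minus infinity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A value known only to lie between `lower` and `upper` (hypothetical type)."""
    lower: float
    upper: float

    def kind(self) -> str:
        if self.lower == self.upper:
            return "exact"
        if math.isinf(self.upper) and not math.isinf(self.lower):
            return "right-censored"   # above `lower`, unknown by how much
        if math.isinf(self.lower) and not math.isinf(self.upper):
            return "left-censored"    # below `upper`, unknown by how much
        return "interval-censored"    # somewhere between the two bounds


def read_scale(true_weight_lb: float, max_lb: float = 300.0) -> Observation:
    """Simulate a scale that cannot report values above `max_lb`."""
    if true_weight_lb > max_lb:
        return Observation(max_lb, math.inf)
    return Observation(true_weight_lb, true_weight_lb)


# The 350 lb individual from the scale example is recorded as "more than 300 lb".
print(read_scale(350.0).kind())
```

For quantities that cannot be negative, a left-censored value is often stored with a lower bound of 0 instead; under the convention assumed here, such a record would be labeled interval-censored.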

Epidemiology

One of the earliest attempts to analyse a statistical problem involving censored data was Daniel Bernoulli's 1766 analysis of smallpox morbidity and mortality data to demonstrate the efficacy of inoculation.[1]

Life testing

Figure: Example of five replicate tests resulting in four failures and one suspended time.

Reliability testing often consists of conducting a test on an item to determine the time it takes for a failure to occur.

  • Sometimes a failure is planned and expected but does not occur because of operator error, equipment malfunction, a test anomaly, etc. The test result is then not the desired time-to-failure, but it can (and should) be used as a time-to-termination. The use of censored data is unintentional but necessary.
  • Sometimes engineers plan a test program so that, after a certain time limit or number of failures, all other tests will be terminated. These suspended times are treated as right-censored data. The use of censored data is intentional.

An analysis of the data from replicate tests includes both the times-to-failure for the items which failed and the time-of-test-termination for those which did not fail.
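As a rough illustration of how such a data set might be recorded, the sketch below stores each test as a (time, failed) pair. The function name `run_until_limit`, the lifetimes, and the 100-hour limit are invented for illustration; they are not data from an actual test program.

```python
def run_until_limit(lifetimes, time_limit):
    """Return (time, failed) pairs for tests that are all stopped at `time_limit`.

    A test whose item fails before the limit contributes its time-to-failure;
    any other test contributes the time-of-test-termination (a suspended,
    right-censored time).
    """
    records = []
    for t in lifetimes:
        if t <= time_limit:
            records.append((t, True))            # observed failure
        else:
            records.append((time_limit, False))  # suspended at the limit
    return records


# Five hypothetical replicate tests (hours), terminated at 100 hours.
records = run_until_limit([35.0, 62.0, 140.0, 81.0, 97.0], time_limit=100.0)
failures = [t for t, failed in records if failed]
suspensions = [t for t, failed in records if not failed]
print(len(failures), "failures;", len(suspensions), "suspended")
```

Keeping the suspended time in the data set, flagged as a non-failure, is what allows a later analysis to use it.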

Analysis

Special techniques may be used to handle censored data. Tests with specific failure times are coded as actual failures; censored data are coded for the type of censoring and the known interval or limit. Special software programs (often reliability oriented) can conduct a maximum likelihood estimation for summary statistics, confidence intervals, etc.
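To give a sense of what a maximum likelihood estimate looks like with right-censored data, the sketch below assumes an exponential lifetime model; that model is an assumption made here for simplicity, not something the text prescribes. Under it, each failure at time t contributes the density λ·exp(−λt) to the likelihood and each suspended time contributes the survival probability exp(−λt), so the estimated failure rate is the number of failures divided by the total time on test. The function name and the data are hypothetical.

```python
def exponential_mle(records):
    """Estimate an exponential failure rate from (time, failed) pairs.

    Assumes exponentially distributed lifetimes and right censoring only.
    Returns (rate, mean_life), where rate = failures / total time on test.
    """
    records = list(records)
    n_failures = sum(1 for _, failed in records if failed)
    total_time = sum(t for t, _ in records)  # failed and suspended tests alike
    if n_failures == 0 or total_time <= 0:
        raise ValueError("need at least one failure and positive test time")
    return n_failures / total_time, total_time / n_failures


# Hypothetical data: four failures and one test suspended at 100 hours.
records = [(35.0, True), (62.0, True), (100.0, False), (81.0, True), (97.0, True)]
rate, mean_life = exponential_mle(records)
print(mean_life)  # 375 hours on test / 4 failures = 93.75
```

Averaging only the four failure times would give 68.75 hours, which understates the mean life because it discards the information that one item survived at least 100 hours.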

The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown.

References

  1. Bernoulli, D. (1766) "Essai d’une nouvelle analyse de la mortalité causée par la petite vérole". Mem. Math. Phy. Acad. Roy. Sci. Paris, reprinted in Bradley (1971) 21 and Blower (2004)

Bibliography

  • Blower, S. (2004), D. Bernoulli's "Essai d’une nouvelle analyse de la mortalité causée par la petite vérole", Reviews in Medical Virology, 14: 275–288
  • Bradley, L. (1971) Smallpox Inoculation: An Eighteenth Century Mathematical Controversy, Nottingham
  • Mann, N. R.; et al. (1975). Methods for Statistical Analysis of Reliability and Life Data. New York: Wiley. ISBN 047156737X.

External links

  • "Engineering Statistics Handbook", NIST/SEMATECH
