Missing data: Difference between revisions




Revision as of 16:00, 31 May 2013

Editor-In-Chief: C. Michael Gibson, M.S., M.D. [1]; Gonzalo Romero, M.D. [2]

Overview

In statistics, missing data refers to the absence of recorded values for a given variable. Missing data is common in clinical research. It can be an important source of bias and, because it shrinks the effective sample size, it also reduces the precision of the study's estimates. It can therefore have an important effect on the conclusions of a study, potentially leading to invalid results being drawn from the data.

Classification of missing data

Missing data can be classified into 3 categories, depending on how the missingness relates to the independent variables and to the dependent (outcome) variable:

  1. Missing completely at random (MCAR)
  2. Missing at random (MAR)
  3. Missing not at random (MNAR)

Missing completely at random (MCAR)

The probability that a value is missing is independent of both the observed and the unobserved data; it is therefore related neither to the independent variables nor to the outcome.

Missing at random (MAR)

The probability that a value is missing is not related to the outcome but is related to the independent variables (for example age, race, gender). The name can be misleading: MAR does not correspond to the general notion of 'random', because the probability of a value being missing generally depends on the observed values (independent variables), just not on the missing values themselves. MAR data may still influence the results if the independent variable that drives the missingness is also related to the outcome.

Example: Older patients may drop out of an intervention because of their physical condition (for example, difficulty walking to the center for follow-up). Among older patients, the likelihood of dropping out is not related to the outcome.
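The dropout example above can be illustrated with a small simulation. This is only a sketch under assumed numbers: the outcome means (60 for older and 50 for younger patients), the dropout probabilities (0.6 and 0.1) and the function names `simulate_mar_dropout` and `mean_outcome` are hypothetical and chosen for illustration, not taken from any study.

```python
import random
import statistics


def simulate_mar_dropout(n=20000, seed=0):
    """Simulate age-dependent dropout (MAR). All numbers are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        old = rng.random() < 0.5  # observed independent variable
        # Assumption: the outcome differs by age group (illustration only).
        outcome = rng.gauss(60.0 if old else 50.0, 5.0)
        # Dropout depends only on the observed age group, never on the outcome.
        p_drop = 0.6 if old else 0.1
        observed = rng.random() >= p_drop
        rows.append((old, outcome, observed))
    return rows


def mean_outcome(rows, old=None, observed_only=False):
    """Mean outcome, optionally within one age group and/or among completers."""
    values = [
        y for (is_old, y, seen) in rows
        if (old is None or is_old == old) and (seen or not observed_only)
    ]
    return statistics.fmean(values)


rows = simulate_mar_dropout()
print("all patients:          ", round(mean_outcome(rows), 2))
print("completers only:       ", round(mean_outcome(rows, observed_only=True), 2))
print("older, all patients:   ", round(mean_outcome(rows, old=True), 2))
print("older, completers only:", round(mean_outcome(rows, old=True, observed_only=True), 2))
```

In this sketch the mean among completers drifts away from the mean among all patients, because older patients (who have higher simulated outcomes) are under-represented. Within the older group, completers and all patients have nearly the same mean, which is the sense in which dropping out "is not related to the outcome" under MAR.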

Missing not at random (MNAR)

The missingness is related to the outcome. MNAR is present when the pattern of missing data is related to unobserved data; the missing values therefore cannot be predicted from the other values in the dataset.

"Missing not at random" is the most problematic type of missing data, since it would indicate that the dropouts were related to the therapy under study.
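The practical difference between the three categories can be sketched by applying each mechanism to the same simulated population and comparing completers with all patients. The population, the dropout probabilities, the cut-off of 55 and the names `drop_probability` and `complete_case_bias` are all hypothetical assumptions made for this illustration.

```python
import random
import statistics


def drop_probability(mechanism, old, outcome):
    """Hypothetical dropout probabilities for each missing-data mechanism."""
    if mechanism == "MCAR":
        return 0.3  # the same for every patient
    if mechanism == "MAR":
        return 0.6 if old else 0.1  # depends on the observed age group only
    if mechanism == "MNAR":
        return 0.6 if outcome > 55.0 else 0.1  # depends on the value that goes missing
    raise ValueError(f"unknown mechanism: {mechanism}")


def complete_case_bias(mechanism, n=20000, seed=1):
    """Return (overall bias, bias within older patients) of the completers' mean."""
    rng = random.Random(seed)
    all_y, seen_y, old_all, old_seen = [], [], [], []
    for _ in range(n):
        old = rng.random() < 0.5
        # Assumption: older patients have a higher mean outcome.
        y = rng.gauss(60.0 if old else 50.0, 5.0)
        seen = rng.random() >= drop_probability(mechanism, old, y)
        all_y.append(y)
        if old:
            old_all.append(y)
        if seen:
            seen_y.append(y)
            if old:
                old_seen.append(y)
    overall = statistics.fmean(seen_y) - statistics.fmean(all_y)
    within_old = statistics.fmean(old_seen) - statistics.fmean(old_all)
    return overall, within_old


for mechanism in ("MCAR", "MAR", "MNAR"):
    overall, within_old = complete_case_bias(mechanism)
    print(f"{mechanism}: overall bias {overall:+.2f}, within older patients {within_old:+.2f}")
```

Under these assumptions, MCAR leaves both comparisons close to zero, MAR biases the overall mean but not the mean within an age group, and MNAR biases both. The last case is why MNAR cannot be corrected using the observed variables alone.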

Handling missing data