Missing data

Editor-In-Chief: C. Michael Gibson, M.S., M.D.; Gonzalo Romero, M.D.

Overview

In statistics, missing data refers to the absence of a recorded value for a given variable. Missing data is common in clinical research. It is an important source of bias and, because fewer observations are available for analysis, it also reduces the precision of the study's estimates. Both effects can alter the conclusions of a study and may lead to invalid results.

Classification of missing data

Missing data can be classified into 3 categories, depending on how the missingness relates to the independent variables and the dependent (outcome) variable:

Missing completely at random (MCAR)
Missing at random (MAR)
Missing not at random (MNAR)

Missing completely at random (MCAR)

Data are MCAR when the probability that a value is missing is independent of both observed and unobserved data; missingness is therefore related neither to the independent variables nor to the outcome.
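The definition above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the outcome (a blood-pressure-like value), its distribution, and the 30% missingness rate are hypothetical choices made for illustration, and `simulate_mcar` is not taken from any library.

```python
import random
import statistics

def simulate_mcar(n=10_000, p_missing=0.3, seed=0):
    """Simulate an outcome and delete values completely at random (hypothetical setup)."""
    rng = random.Random(seed)
    # Hypothetical outcome: a blood-pressure-like measurement
    outcome = [rng.gauss(120, 15) for _ in range(n)]
    # Every value has the same chance of being missing, whatever its size
    observed = [y for y in outcome if rng.random() >= p_missing]
    return outcome, observed

outcome, observed = simulate_mcar()
full_mean = statistics.mean(outcome)
observed_mean = statistics.mean(observed)

print(f"values kept: {len(observed)} of {len(outcome)}")
print(f"mean of all values:      {full_mean:.2f}")
print(f"mean of observed values: {observed_mean:.2f}")
```

Under these assumptions the observed values behave like a random subsample, so their mean stays close to the mean of the full data; the main cost of MCAR is the smaller sample size.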

Missing at random (MAR)

Data are MAR when missingness is not related to the outcome itself but is related to observed independent variables (for example age, race, or gender). Despite its name, MAR does not correspond to the everyday notion of 'random': the probability of a value being missing depends on the observed values (the independent variables), not on the missing values themselves. MAR can still bias the results if the independent variable that drives missingness is also related to the outcome and is not accounted for in the analysis.

Example: Elderly patients drop out of an intervention because their physical condition makes it hard to walk to the center for follow-up, a reason that is not related to the outcome.
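A small simulation may help show when MAR matters. It is a sketch under assumptions that are not in the source: the age groups, outcome distributions, and dropout rates are hypothetical, and `simulate_mar` is an illustrative helper. Dropout depends only on the observed age group, while the outcome is also related to age.

```python
import random
import statistics

def simulate_mar(n=20_000, seed=1):
    """Simulate dropout that depends only on an observed covariate (hypothetical setup)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        elderly = rng.random() < 0.5  # observed independent variable
        # Assumption: the outcome is higher on average in elderly patients
        outcome = rng.gauss(140 if elderly else 120, 10)
        # Dropout probability depends on age group only, not on the outcome value
        p_dropout = 0.6 if elderly else 0.1
        missing = rng.random() < p_dropout
        rows.append((elderly, outcome, missing))
    return rows

rows = simulate_mar()

# Overall means: all patients versus patients who completed follow-up
full_mean = statistics.mean([y for e, y, m in rows])
complete_case_mean = statistics.mean([y for e, y, m in rows if not m])

# Means within the elderly group: all versus completers
elderly_full = statistics.mean([y for e, y, m in rows if e])
elderly_observed = statistics.mean([y for e, y, m in rows if e and not m])

print(f"overall mean, all patients:    {full_mean:.2f}")
print(f"overall mean, completers only: {complete_case_mean:.2f}")
print(f"elderly mean, all patients:    {elderly_full:.2f}")
print(f"elderly mean, completers only: {elderly_observed:.2f}")
```

In this setup the overall mean among completers is pulled toward the younger group because elderly patients are under-represented, whereas within the elderly group the completers remain representative. This is why analyses that condition on the observed variable can recover valid estimates under MAR.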

Missing not at random (MNAR)

Data are MNAR when missingness is related to the outcome. It is considered the worst type of missing data because the dropouts are related to the therapy or intervention under investigation. The pattern of missingness depends on unobserved data, which makes it impossible to use the other values in the dataset to predict the missing values.
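The following sketch illustrates the MNAR mechanism with simulated data. All numbers are hypothetical assumptions made for illustration (the outcome distribution, the threshold of 130, and the dropout rates), and `simulate_mnar` is not a library function. Here the chance of dropping out depends on the outcome value itself.

```python
import random
import statistics

def simulate_mnar(n=10_000, seed=2):
    """Simulate dropout that depends on the unobserved outcome value (hypothetical setup)."""
    rng = random.Random(seed)
    outcome = [rng.gauss(120, 15) for _ in range(n)]
    observed = []
    for y in outcome:
        # Assumption: patients with a high outcome value are far more likely to drop out
        p_dropout = 0.7 if y > 130 else 0.1
        if rng.random() >= p_dropout:
            observed.append(y)
    return outcome, observed

outcome, observed = simulate_mnar()
full_mean = statistics.mean(outcome)
observed_mean = statistics.mean(observed)

print(f"values kept: {len(observed)} of {len(outcome)}")
print(f"mean of all values:      {full_mean:.2f}")
print(f"mean of observed values: {observed_mean:.2f}")
```

Under these assumptions the mean of the observed values underestimates the true mean, and no observed variable in the dataset explains the dropout, so the bias cannot be corrected by adjusting for other recorded values.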