Missing data

Editor-In-Chief: C. Michael Gibson, M.S., M.D. [1] Gonzalo Romero, M.D.[2]

Overview

In statistics, missing data refers to the absence of a recorded value for a given variable. It is common in clinical research and is an important source of bias. It also reduces the precision and reproducibility of a study, so it can have a significant effect on the conclusions drawn from the data and may lead to invalid results.

Classification of missing data

Missing data can be classified into three categories, depending on how the missingness relates to the independent variables or the dependent (outcome) variable:

Missing completely at random (MCAR)
Missing at random (MAR)
Missing not at random (MNAR)
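
The three categories are often stated compactly in probabilistic notation. The symbols below are introduced here for illustration and do not appear in the original text: R is the indicator of which values are missing, Y_obs is the observed part of the data, and Y_mis is the part that is missing. This is a sketch of the conventional formulation, not a complete formal treatment.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ depends on } Y_{\text{mis}}
\end{align*}
```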

Missing completely at random (MCAR)

The probability that a value is missing is independent of both observed and unobserved data, so it is related neither to the independent variables nor to the outcome.

Missing at random (MAR)

The probability that a value is missing is not related to the outcome but is related to the independent variables (for example age, race, or gender). Because this probability generally depends on observed values (not on the missing values themselves), MAR does not correspond to the intuitive notion of 'random'. It may still influence the results if the independent variable is itself related to the outcome.

For example, older subjects might drop out of a treatment because walking difficulties prevent them from reaching the clinic. Among older subjects, however, the likelihood of dropping out is not related to the outcome.
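
The age example can be illustrated with a small simulation. This is a minimal sketch under assumed numbers: the cohort size, the outcome model, and the dropout probabilities are all hypothetical and chosen only to make the effect visible. It compares the mean outcome among completers with the mean in the full cohort, first pooled and then within the older group.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical cohort: (is_older, outcome, dropped_out) for each subject.
subjects = []
for _ in range(20000):
    older = random.random() < 0.5
    # Assumed outcome model: older subjects score higher on average.
    outcome = random.gauss(60.0 if older else 50.0, 5.0)
    # MAR: dropout probability depends only on the observed age group.
    dropped = random.random() < (0.6 if older else 0.1)
    subjects.append((older, outcome, dropped))


def mean_outcome(rows):
    """Mean of the outcome column for the given subjects."""
    return statistics.fmean(outcome for _, outcome, _ in rows)


completers = [s for s in subjects if not s[2]]

# Pooled over age groups, completers under-represent older subjects.
pooled_bias = mean_outcome(completers) - mean_outcome(subjects)

# Within the older group, dropout is unrelated to the outcome.
older_all = [s for s in subjects if s[0]]
older_completers = [s for s in completers if s[0]]
older_bias = mean_outcome(older_completers) - mean_outcome(older_all)

print(f"pooled bias: {pooled_bias:.2f}")
print(f"bias within older group: {older_bias:.2f}")
```

Under these assumptions the pooled completer mean is pulled toward the younger group, while the completer mean within the older group stays close to the full older-group mean.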

Missing not at random (MNAR)

The missingness is related to the outcome. MNAR is present when the pattern of missing data is related to unobserved data, so the missing values cannot be predicted from other values in the dataset.

MNAR is the most problematic type of missing data, since it would indicate that the dropouts were related to the therapy under study.
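
The practical difference between the three mechanisms can be sketched with a simulation. Everything here is an illustrative assumption rather than part of the original text: the covariate x, the outcome model y = 2x + noise, and the missingness probabilities. For each mechanism the sketch reports the bias of the complete-case mean of y, both overall and within the stratum x > 0, that is, after conditioning on the observed covariate.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical data: observed covariate x and an outcome y correlated with it.
data = []
for _ in range(20000):
    x = random.gauss(0.0, 1.0)
    y = 2.0 * x + random.gauss(0.0, 1.0)
    data.append((x, y))


def biases(p_missing):
    """Return (overall bias, bias within the x > 0 stratum) of the
    complete-case mean of y, where p_missing(x, y) is the probability
    that a record's y is missing."""
    kept = [(x, y) for x, y in data if random.random() >= p_missing(x, y)]
    overall = (statistics.fmean(y for _, y in kept)
               - statistics.fmean(y for _, y in data))
    stratum = (statistics.fmean(y for x, y in kept if x > 0)
               - statistics.fmean(y for x, y in data if x > 0))
    return overall, stratum


results = {
    # MCAR: the same probability for every record.
    "MCAR": biases(lambda x, y: 0.3),
    # MAR: probability depends only on the observed covariate x.
    "MAR": biases(lambda x, y: 0.6 if x > 0 else 0.1),
    # MNAR: probability depends on the (possibly missing) outcome y itself.
    "MNAR": biases(lambda x, y: 0.6 if y > 0 else 0.1),
}

for mechanism, (overall, stratum) in results.items():
    print(f"{mechanism}: overall bias {overall:+.2f}, within x > 0 {stratum:+.2f}")
```

In this sketch the MCAR estimate is close to unbiased throughout, the MAR bias largely disappears once the observed covariate is taken into account, and the MNAR bias persists even within the stratum, which is why it cannot be corrected from the observed values alone.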