Missing data

Revision as of 16:04, 31 May 2013

Editor-In-Chief: C. Michael Gibson, M.S., M.D. [1] Gonzalo Romero, M.D. [2]

Overview

In statistics, missing data refers to the absence of recorded values for a given variable. Missing data are frequent in clinical research. They are an important source of bias and reduce the consistency (precision or reproducibility) of a study. They can substantially affect a study's conclusions and may lead to invalid results being drawn from the data.

Classification of missing data

Missing data can be classified into three categories, depending on how the missingness relates to the independent variables or the dependent (outcome) variable:

  1. Missing completely at random (MCAR)
  2. Missing at random (MAR)
  3. Missing not at random (MNAR)

Missing completely at random (MCAR)

The missingness is independent of both observed and unobserved data; it is therefore unrelated to the independent variables and to the outcome.

Missing at random (MAR)

The missingness is not related to the outcome but is related to the independent variables (for example age, race, gender). Note that this does not correspond to the everyday notion of 'random': the probability of a value being missing depends on the observed values (independent variables), not on the missing values themselves. It may still influence the results if the independent variable that drives the missingness is also related to the outcome.

Example: Older patients drop out of an intervention because of their physical condition (for example, difficulty walking to the center for follow-up), a reason that does not relate to the outcome.

Missing not at random (MNAR)

The missingness is related to the outcome. It is considered the worst type of missing data because the dropouts are related to the therapy or intervention under investigation. The pattern of missingness depends on unobserved data, which makes it impossible to use other values from the dataset to predict the missing values.
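
The three mechanisms above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not a method taken from this article: the variable names, the dropout probabilities, and the linear relationship between age and outcome are all hypothetical. It drops outcome values under each mechanism, then compares the mean of the remaining outcomes ("raw") with an estimate that uses the fully observed age through a simple linear regression ("adjusted").

```python
import math
import random
import statistics


def sigmoid(x):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate(n=20000, seed=0):
    """Drop outcomes under MCAR, MAR and MNAR and summarise what remains."""
    rng = random.Random(seed)
    # Hypothetical data: age is always observed; the outcome rises with age.
    ages = [rng.uniform(20, 80) for _ in range(n)]
    outcomes = [0.5 * age + rng.gauss(0, 5) for age in ages]

    # Assumed probability that the outcome is missing under each mechanism.
    drop_prob = {
        "MCAR": lambda age, y: 0.3,                      # ignores all data
        "MAR": lambda age, y: sigmoid((age - 50) / 10),  # observed age only
        "MNAR": lambda age, y: sigmoid((y - 25) / 5),    # the outcome itself
    }

    mean_age = statistics.fmean(ages)
    result = {"full": statistics.fmean(outcomes)}
    for name, prob in drop_prob.items():
        kept = [(a, y) for a, y in zip(ages, outcomes)
                if rng.random() >= prob(a, y)]
        kept_ages = [a for a, _ in kept]
        kept_y = [y for _, y in kept]
        # Regress the observed outcomes on age, then predict at the mean age
        # of ALL participants (possible because age is never missing).
        slope, intercept = fit_line(kept_ages, kept_y)
        result[name] = {
            "raw": statistics.fmean(kept_y),
            "adjusted": intercept + slope * mean_age,
        }
    return result


if __name__ == "__main__":
    summary = simulate()
    print("full-data mean:", round(summary["full"], 2))
    for mechanism in ("MCAR", "MAR", "MNAR"):
        print(mechanism, summary[mechanism])
```

Under these assumptions, the raw mean should stay close to the full-data mean only for MCAR. Adjusting for age should roughly recover the full-data mean under MAR, whereas under MNAR the estimate should remain biased even after adjustment, because the dropout depends on values that were never observed.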

Handling missing data