Definition

Numerical data values that deviate markedly from the other values when two or more variables are analyzed jointly.

Explanation

Identifying outliers in the joint distribution of two or more measurement variables rests on the assumption that a value is implausible given the observed value of at least one other variable.

Example

We may assume that weight is positively associated with height. Observing a very high weight (150 kg) in a short person (150 cm) may then be regarded as an unlikely event that merits further investigation, even though neither value is implausible on its own.
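The text does not prescribe a detection method. One common choice is the squared Mahalanobis distance of each observation from the sample mean, which accounts for the correlation between the variables. The sketch below uses that technique with the classical sample mean and covariance. The function name `mahalanobis_sq` and the height/weight values are hypothetical and serve only to illustrate the example above.

```python
import numpy as np


def mahalanobis_sq(x: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row of x (n observations x p variables)
    from the sample mean, using the classical sample covariance (divisor n - 1)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)      # p x p sample covariance matrix
    cov_inv = np.linalg.inv(cov)
    # Row-wise quadratic form: (x_i - mean)^T S^{-1} (x_i - mean)
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


# Hypothetical height (cm) / weight (kg) pairs; the last row is the 150 cm / 150 kg case.
data = np.array([
    [155, 52], [160, 57], [165, 63], [170, 68], [175, 74],
    [180, 79], [185, 85], [190, 90], [150, 150],
])
d2 = mahalanobis_sq(data)
most_unusual = int(np.argmax(d2))
print(most_unusual, data[most_unusual])
```

In such a small sample, the outlier also inflates the covariance estimate it is measured against. The sketch therefore only ranks the observations instead of applying a cut-off. Robust estimates of location and covariance are a common alternative when several outliers might mask each other.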

Guidance

Multivariate outliers may strongly distort statistical analyses and must therefore be checked for in numerical data elements before any statistical analysis is conducted.
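To illustrate how strongly a single case can distort an analysis, the sketch below computes a Pearson correlation with and without the 150 cm / 150 kg case. The data are hypothetical, and the correlation coefficient is only one example of an affected statistic. In these values, the one case turns a near-perfect positive correlation into a weak negative one.

```python
import numpy as np

# Hypothetical height (cm) and weight (kg) values; the last pair is the 150 cm / 150 kg case.
height = np.array([155, 160, 165, 170, 175, 180, 185, 190, 150], dtype=float)
weight = np.array([52, 57, 63, 68, 74, 79, 85, 90, 150], dtype=float)

# Pearson correlation on all observations vs. with the last observation removed.
r_with = np.corrcoef(height, weight)[0, 1]
r_without = np.corrcoef(height[:-1], weight[:-1])[0, 1]
print(f"Pearson r with outlier:    {r_with:.2f}")
print(f"Pearson r without outlier: {r_without:.2f}")
```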

Unlike indicators of the consistency dimension, an observed outlier does not necessarily indicate a data error. It may instead reflect a genuine data property that is of considerable substantive interest (Aguinis et al. 2013).

In addition to a numerical cut-off, a graphical depiction of the joint distribution of the variables (e.g., a scatter plot) is strongly recommended, because the graph helps interpret the observed outlier against the entire distribution.
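As a sketch of a numerical cut-off, the code below assumes that squared Mahalanobis distances have already been computed for two variables. It also assumes that the variables are approximately bivariate normal, so that the distances roughly follow a chi-square distribution with 2 degrees of freedom. The 97.5% quantile is a common convention, not a value given in this text. The helper names `chi2_cutoff_2df` and `flag_outliers` are hypothetical.

```python
import math
from typing import List, Sequence


def chi2_cutoff_2df(q: float = 0.975) -> float:
    """q-quantile of the chi-square distribution with 2 degrees of freedom.

    With 2 df the chi-square distribution is exponential with mean 2, so the
    quantile has the closed form -2 * ln(1 - q). For other degrees of freedom,
    a general routine such as scipy.stats.chi2.ppf(q, df) would be needed.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return -2.0 * math.log(1.0 - q)


def flag_outliers(d2: Sequence[float], cutoff: float) -> List[bool]:
    """Flag observations whose squared distance exceeds the cut-off."""
    return [value > cutoff for value in d2]


cutoff = chi2_cutoff_2df(0.975)   # roughly 7.38
flags = flag_outliers([1.2, 9.0, 7.0], cutoff)
print(round(cutoff, 2), flags)
```

A fixed cut-off like this complements the graphical check but does not replace it. Inspecting the flagged points in a scatter plot shows whether they lie far from the bulk of the data or only just beyond the threshold.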

Even if the data values prove correct, the presence of outliers may require adapting the statistical analysis approach.

Interpretation

Within variables:

The higher the count or percentage of observations affected by multivariate outliers, the higher the probability of low data quality.

Across variables:

The higher the number or percentage of variables affected by multivariate outliers, the higher the probability of low data quality.
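The two interpretation levels can be sketched as simple summaries of outlier flags. The code makes two assumptions that this text does not specify. First, outliers have already been flagged per group of jointly analyzed variables. Second, a variable counts as "affected" if it belongs to at least one group that contains a flagged observation. The function names and variable names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def within_summary(flags: Sequence[bool]) -> Tuple[int, float]:
    """Count and percentage of observations flagged as multivariate outliers."""
    n = len(flags)
    count = sum(1 for f in flags if f)
    pct = 100.0 * count / n if n else 0.0
    return count, pct


def across_summary(groups: Dict[Tuple[str, ...], Sequence[bool]]) -> Tuple[int, float]:
    """Count and percentage of variables that belong to at least one variable
    group containing a flagged observation (assumed definition of 'affected')."""
    all_vars = {v for group in groups for v in group}
    affected = {v for group, flags in groups.items() if any(flags) for v in group}
    pct = 100.0 * len(affected) / len(all_vars) if all_vars else 0.0
    return len(affected), pct


# Hypothetical flags for two variable pairs with nine observations each.
height_weight_flags = [False] * 8 + [True]
groups = {
    ("height", "weight"): height_weight_flags,
    ("sbp", "dbp"): [False] * 9,
}
print(within_summary(height_weight_flags))
print(across_summary(groups))
```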

Descriptors

Literature

Aguinis, H., R. K. Gottfredson and H. Joo (2013). “Best-Practice Recommendations for Defining, Identifying, and Handling Outliers.” Organizational Research Methods 16(2): 270-301.

Indicator “Multivariate outliers”