Numerical data values that deviate markedly from others in a multivariate analysis.
Identifying outliers in the joint distribution of two or more measurement variables rests on the implausibility of a value given the realization of at least one other value.
For example, we may assume that weight is positively associated with height. Observing a very large weight (150 kg) in a small person (150 cm) may then be regarded as an unlikely event that merits further investigation.
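One common way to quantify such joint implausibility is the squared Mahalanobis distance of each record from the sample mean, compared against a chi-square quantile. The following is a minimal sketch under stated assumptions, not a prescribed method: the data values are made up, the function names `mahalanobis_sq_2d` and `flag_outliers_2d` are hypothetical, the cut-off probability of 0.975 is an arbitrary illustrative choice, and the classical mean and covariance are used even though they are themselves sensitive to outliers (robust estimators are often preferred in practice).

```python
import math

def mahalanobis_sq_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample mean."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sample variances and covariance (denominator n - 1).
    vxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vyy = sum((y - my) ** 2 for y in ys) / (n - 1)
    vxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = vxx * vyy - vxy ** 2  # determinant of the 2x2 covariance matrix
    return [
        (vyy * (x - mx) ** 2
         - 2 * vxy * (x - mx) * (y - my)
         + vxx * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]

def flag_outliers_2d(xs, ys, p=0.975):
    """Indices of records whose squared distance exceeds the chi-square(2) quantile."""
    # With 2 degrees of freedom the chi-square quantile has the
    # closed form -2 * ln(1 - p), so no statistics library is needed.
    cutoff = -2.0 * math.log(1.0 - p)
    d2 = mahalanobis_sq_2d(xs, ys)
    return [i for i, d in enumerate(d2) if d > cutoff]

# Made-up illustration data; the last record is the 150 cm / 150 kg person.
heights_cm = [155, 160, 162, 165, 168, 170, 172, 175, 178, 180, 185, 190, 150]
weights_kg = [52, 58, 60, 63, 66, 70, 72, 76, 80, 83, 90, 94, 150]

flagged = flag_outliers_2d(heights_cm, weights_kg)
print(flagged)
```

With these illustrative values, only the last record exceeds the cut-off. A flagged record is a candidate for further investigation, not a confirmed error.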
Multivariate outliers may strongly distort statistical analyses, so numerical data elements must be checked for them before any analysis is conducted.
Unlike indicators belonging to the consistency dimension, an observed outlier does not necessarily indicate a data error. It may instead reflect a property of the data that is of considerable interest from a substantive point of view (Aguinis 2013).
It is strongly recommended to consider a graphical depiction of the distribution in addition to a numerical cut-off, because the graph helps to interpret the observed outlier against the entire distribution.
The presence of outliers, even if the data values prove correct, may require adaptations of the statistical analysis approach.
Within variables:
The higher the count or percentage of observations affected by multivariate outliers, the higher the probability of low data quality.
Across variables:
The higher the number or percentage of variables affected by multivariate outliers, the higher the probability of low data quality.
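The two summaries above can be sketched as simple counts and percentages. This is only an illustration under assumptions: it takes the within-variables figure to be the share of flagged observations in one set of jointly analysed variables, and the across-variables figure to be the share of such sets with at least one flagged observation. The function name `outlier_summary`, the set labels, and the flag values are all hypothetical.

```python
def outlier_summary(flags_by_set):
    """Summarise outlier flags.

    flags_by_set maps a label for a set of jointly analysed variables
    to a list of booleans, one per observation (True = flagged).
    """
    within = {}
    for label, flags in flags_by_set.items():
        n_flagged = sum(flags)
        # Count and percentage of flagged observations in this set.
        within[label] = (n_flagged, 100.0 * n_flagged / len(flags))
    # A set counts as affected if at least one observation is flagged.
    n_affected = sum(1 for count, _ in within.values() if count > 0)
    across = (n_affected, 100.0 * n_affected / len(within))
    return within, across

# Hypothetical flags for two variable sets with 13 observations each.
example_flags = {
    "height_weight": [False] * 12 + [True],
    "sbp_dbp": [False] * 13,
}
within, across = outlier_summary(example_flags)
print(within)
print(across)
```

Neither figure has a fixed threshold in this sketch. Higher values only suggest that a closer look at the data is warranted.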