
DQI-4007

Definition

The observed strength of an association deviates from the expected strength of the association.

Explanation

Associations between data elements may have an expected strength. For continuous data elements, the observed Pearson correlation coefficient \(\rho\) may differ from its expected value; for a mixture of categorical and continuous data elements, odds ratios may be unexpected; and for categorical variables, \(\chi^2\)-based measures of association (\(\phi\)-coefficient, \(\lambda\)-coefficient) may differ.
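As a rough illustration of two of the measures named above, the following Python sketch computes the Pearson correlation coefficient for two continuous data elements and the \(\phi\)-coefficient for a 2x2 table of two binary data elements. The function names and the input layout are assumptions made for this example and are not part of the indicator definition.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither sequence is constant (otherwise the denominator is zero).
    return sxy / math.sqrt(sxx * syy)


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]] of cell counts."""
    # Assumes all four marginal totals are greater than zero.
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
```

Both functions return values between -1 and 1, so an observed value can be set directly against an expected value taken from prior knowledge.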

Example

In the first table, reading results for symptom severity are shown. As expected, there is almost no difference between readers, because images were assigned to readers almost at random.

The corresponding Cramér's V coefficient is 0, with a 95% CI of [0; 0.13].


Another reading shows a different picture:

Accordingly, the Cramér's V coefficient is much higher: 0.44, with a 95% CI of [0.28; 0.61].
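To make the calculation behind such values concrete, here is a minimal Python sketch of Cramér's V for a reader-by-severity contingency table, using \(V = \sqrt{\chi^2 / (n \cdot (\min(r, c) - 1))}\). The counts below are hypothetical and are not the data behind the two tables of this example; the confidence intervals reported above are not reproduced here.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence; assumes no empty row or column.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical counts: rows are readers, columns are severity grades.
balanced = [[20, 30, 10], [20, 30, 10]]  # identical grade distribution per reader
skewed = [[40, 15, 5], [10, 20, 30]]     # readers grade very differently

print(round(cramers_v(balanced), 2))
print(round(cramers_v(skewed), 2))
```

In the balanced table every observed count equals its expected count, so \(\chi^2\) and therefore V are zero; the skewed table yields a clearly larger value.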

Guidance

Differences in the strength of associations should be examined on the basis of sufficient a priori knowledge, i.e. the expected strength of an association should be known from large and unbiased samples. In this case, observed associations can be compared against expected associations. Depending on the data type of the data elements, a broad range of tests is available. Please see the overview of association measures in Table 6.2 of this overview. Some of these tests are available in base R or in the R package DescTools (Signorell et al. 2016).
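One possible way to compare an observed against an expected association, sketched here in Python for the Pearson correlation, is to check whether the expected value lies inside a confidence interval around the observed value. The sketch uses the Fisher z-transformation for the interval; this choice, the function names, and the decision rule are assumptions for illustration and are not prescribed by this indicator.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, conf_level=0.95):
    """Approximate confidence interval for a Pearson correlation (Fisher z)."""
    if n <= 3:
        raise ValueError("at least 4 observations are required")
    z = math.atanh(r)              # Fisher z-transformation; requires -1 < r < 1
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error on the z scale
    q = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.tanh(z - q * se), math.tanh(z + q * se)


def deviates_from_expected(r_observed, n, r_expected, conf_level=0.95):
    """True if the expected correlation lies outside the CI of the observed one."""
    lower, upper = fisher_ci(r_observed, n, conf_level)
    return not (lower <= r_expected <= upper)


# Hypothetical values: observed r = 0.5 from 103 observations, expected r = 0.8.
print(deviates_from_expected(0.5, 103, 0.8))
```

The rule treats the expected strength as a fixed, known value, which matches the requirement that it be established from large and unbiased samples.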

Besides contingency-table-based tests, regression models can also be used to examine the association of data elements. See, for example, Durrleman and Simon 1989 and Jinyuan et al. 2016.

Interpretation

Deviations of the observed from the expected strength of an association indicate errors in the data. Such deviations can have many causes, such as incorrect transformation of measurements or incorrect use of instruments and/or devices.

Descriptors

Literature

  • Durrleman, S., and Simon, R. (1989). Flexible regression models with cubic splines. Statistics in Medicine 8, 551–561.

  • Jinyuan, L., Wan, T., Guanqin, C., Yin, L., Changyong, F., et al. (2016). Correlation and agreement: Overview and clarification of competing concepts and measures. Shanghai Archives of Psychiatry 28, 115.

  • Signorell, A., Aho, K., Alfons, A., Anderegg, N., Aragon, T., et al. (2016). DescTools: Tools for descriptive statistics. R package version 0.99.18. R Foundation for Statistical Computing, Vienna, Austria.