The list below contains selected literature references relevant to our work.
Paper |
---|
Struckmann et al. dataquieR 2: An updated R package for FAIR data quality assessments in observational studies and electronic health record data, 2024, Journal of Open Source Software, 9(98), 6581, https://doi.org/10.21105/joss.06581 |
Saleem et al. A review and empirical comparison of univariate outlier detection methods, 2021, Pakistan Journal of Statistics, 37(4) |
Aguinis et al. Best-practice recommendations for defining, identifying, and handling outliers, 2013, Organizational Research Methods, 16(2), 270–301 |
American Association for Public Opinion Research Standard definitions: Final dispositions of case codes and outcome rates for surveys, 2011 |
Altman & Bland Assessing agreement between methods of measurement, 2017, Clin Chem, https://doi.org/10.1373/clinchem.2016.268870 |
Assenov et al. Comprehensive analysis of DNA methylation data with RnBeads, 2014, Nature Methods, 11(11), 1138 |
Bach The Freiburg Visual Acuity Test – automatic measurement of visual acuity, 1996, Optom Vis Sci, 73(1), 49–53, https://www.ncbi.nlm.nih.gov/pubmed/8867682 |
Bamberg et al. Whole-body MR imaging in the German National Cohort: Rationale, design, and technical background, 2015, Radiology, 277(1), 206–220 |
Boehmke Data wrangling with R, 2016 |
Bakar et al. A comparative study for outlier detection techniques in data mining, 2006, 2006 IEEE Conference on Cybernetics and Intelligent Systems, 1–6 |
Bangia Dictionary of information technology, 2010 |
Bargaje Good documentation practice in clinical research, 2011, Perspectives in Clinical Research, 2(2), 59 |
Barnett & Lewis Outliers in statistical data, 1994 |
Begley & Ellis Drug development: Raise standards for preclinical cancer research, 2012, Nature, 483(7391), 531–533 |
Bennett How can I deal with missing data in my study?, 2001, Australian and New Zealand Journal of Public Health, 25(5), 464–469 |
Bretherton Reference model for metadata: A strawman, 1994, Whitepaper, University of Wisconsin, https://pdfs.semanticscholar.org/f941/4454ef0e25ef102831ed8c7a4b6e9c094b00.pdf |
Brown & Forsythe Robust tests for the equality of variances, 1974, Journal of the American Statistical Association, 69(346), 364–367 |
Callahan et al. A comparison of data quality assessment checks in six data sharing networks, 2017, eGEMs (Generating Evidence & Methods to Improve Patient Outcomes), 5(1) |
Chalmers & Glasziou Avoidable waste in the production and reporting of research evidence, 2009, Obstetrics & Gynecology, 114(6), 1341–1345 |
Chen et al. A review of data quality assessment methods for public health information systems, 2014, International Journal of Environmental Research and Public Health, 11(5), 5170–5207 |
Chang et al. Shiny: Web application framework for R, 2015, 2018, R Package Version, 1(0), 14 |
Callegaro et al. Web survey methodology, 2015 |
Cleveland et al. Regression by local fitting: Methods, properties, and computational algorithms, 1988, Journal of Econometrics, 37(1), 87–114 |
Cleveland & Devlin Locally weighted regression: An approach to regression analysis by local fitting, 1988, Journal of the American Statistical Association, 83(403), 596–610 |
Couchoud et al. Renal replacement therapy registries—time for a structured data quality evaluation programme, 2013, Nephrology Dialysis Transplantation, 28(9), 2215–2220 |
Das et al. A new method to evaluate the completeness of case ascertainment by a cancer registry, 2008, Cancer Causes & Control, 19(5), 515–525 |
Dasu & Johnson Exploratory data mining and data cleaning, 2003 |
Dong & Peng Principled missing data methods for researchers, 2013, SpringerPlus, 2(1), 222 |
Drion et al. Some distribution-free tests for the difference between two empirical cumulative distribution functions, 1952, The Annals of Mathematical Statistics, 23(4), 563–574 |
Durrleman & Simon Flexible regression models with cubic splines, 1989, Statistics in Medicine, 8(5), 551–561 |
Ebrahim & Davey Smith Commentary: Should we always deliberately be non-representative?, 2013, International Journal of Epidemiology, 42(4), 1022–1026 |
Edwards et al. Science friction: Data, metadata, and collaboration, 2011, Social Studies of Science, 41(5), 667–690 |
Fasano & Franceschini A multidimensional version of the Kolmogorov–Smirnov test, 1987, Monthly Notices of the Royal Astronomical Society, 225(1), 155–170 |
Feinstein & Cicchetti High agreement but low kappa: I. The problems of two paradoxes, 1990, Journal of Clinical Epidemiology, 43(6), 543–549 |
Filzmoser A multivariate outlier detection method, 2004 |
Finnie et al. EpiJSON: A unified data-format for epidemiology, 2016, Epidemics, 15, 20–26 |
Fletcher et al. Clinical epidemiology: The essentials, 2012 |
Freedman & Diaconis On the histogram as a density estimator: L2 theory, 1981, Probability Theory and Related Fields, 57(4), 453–476 |
Golub & Van Loan Matrix computations, 1996, Johns Hopkins University Press, Baltimore and London |
Gonzalez-Chica et al. Test of association: Which one is the most appropriate for my study?, 2015, Anais Brasileiros de Dermatologia, 90(4), 523–528 |
Grant Data visualization: Charts, maps, and interactive graphics, 2018 |
Hahsler et al. Introduction to arules – a computational environment for mining association rules and frequent item sets, 2010, 2018 |
Hallgren Computing inter-rater reliability for observational data: An overview and tutorial, 2012, Tutorials in Quantitative Methods for Psychology, 8(1), 23 |
Hansen et al. Enabling longitudinal data comparison using DDI, 2011 |
Harrell Jr Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis, 2015 |
Harris et al. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support, 2009, Journal of Biomedical Informatics, 42(2), 377–381 |
Hartge A dictionary of epidemiology, sixth edition, 2015, Am J Epidemiol, https://doi.org/10.1093/aje/kwv031 |
Hawkins Introduction, 1980, In Identification of outliers (pp. 1–12), https://doi.org/10.1007/978-94-015-3994-4_1 |
Hayat et al. Statistical methods used in the public health literature and implications for training of public health professionals, 2017, PLoS One, 12(6), e0179032 |
Horton & Kleinman Using R and RStudio for data management, statistical analysis, and graphics, 2015 |
Hoyle et al. Metadata for the longitudinal data life cycle: The role and benefit of metadata management and reuse, 2010, DDI Working Paper Series: Longitudinal Data Best Practices, https://doi.org/10.3886/DDILongitudinal03 |
Hubert & Vandervieren An adjusted boxplot for skewed distributions, 2008, Computational Statistics & Data Analysis, 52(12), 5186–5201 |
Hu & Sung Detecting pattern-based outliers, 2003, Pattern Recognition Letters, 24(16), 3059–3068 |
Huebner et al. A contemporary conceptual framework for initial data analysis, 2018, Observational Studies, 4, 171–192, https://obsstudies.org/wp-content/uploads/2018/04/idarev2.pdf |
Huser et al. Methods for examining data quality in healthcare integrated data repositories, 2017 |
Ioannidis Why most published research findings are false, 2005, PLoS Medicine, 2(8), e124 |
Ioannidis Discussion: Why an estimate of the science-wise false discovery rate and application to the top medical literature is false, 2013, Biostatistics, 15(1), 28–36 |
Ioannidis et al. Increasing value and reducing waste in research design, conduct, and analysis, 2014, The Lancet, 383(9912), 166–175 |
Jager & Leek An estimate of the science-wise false discovery rate and application to the top medical literature, 2013, Biostatistics, 15(1), 1–12 |
Jager & Leek Rejoinder: An estimate of the science-wise false discovery rate and application to the top medical literature, 2013, Biostatistics, 15(1), 39–45 |
Joshi et al. Likert scale: Explored and explained, 2015, British Journal of Applied Science & Technology, 7(4), 396 |
Jinyuan et al. Correlation and agreement: Overview and clarification of competing concepts and measures, 2016, Shanghai Archives of Psychiatry, 28(2), 115 |
Kahn et al. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data, 2016, eGEMs, 4(1) |
Kalton The treatment of missing survey data, 1986, Survey Methodology, 12, 1–16 |
Kao & Green Analysis of variance: Is there a difference in means and what does it mean?, 2008, Journal of Surgical Research, 144(1), 158–170 |
Kahn et al. Quantifying clinical data quality using relative gold standards, 2010, AMIA Annual Symposium Proceedings, 2010, 356 |
Karr et al. Data quality: A statistical perspective, 2006, Statistical Methodology, 3(2), 137–173 |
Kalton & Kasprzyk The treatment of missing survey data, 1986, Survey Methodology, 12(1), 1–16 |
Keller et al. The evolution of data quality: Understanding the transdisciplinary origins of data quality concepts and approaches, 2017 |
Kleiber & Zeileis Visualizing count data regressions using rootograms, 2016, The American Statistician, 70(3), 296–303 |
Koo & Li A guideline of selecting and reporting intraclass correlation coefficients for reliability research, 2016, Journal of Chiropractic Medicine, 15(2), 155–163 |
Kullback & Leibler On information and sufficiency, 1951, The Annals of Mathematical Statistics, 22(1), 79–86 |
Kullback Information theory and statistics, 1997 |
Levene Robust tests for equality of variances, 1961, Contributions to Probability and Statistics. Essays in Honor of Harold Hotelling, 279–292 |
De Lusignan et al. Key concepts to assess the readiness of data for international research: Data quality, lineage and provenance, extraction and processing errors, traceability, and curation, 2011, Yearb Med Inform, 6(1), 112–120 |
Lang & Little Principled missing data treatments, 2016, Prevention Science, https://doi.org/10.1007/s11121-016-0644-5 |
Langeheine et al. Consequences of an extended recruitment on participation in the follow-up of a child study: Results from the German IDEFICS cohort, 2017, Paediatric and Perinatal Epidemiology, 31(1), 76–86 |
Lee et al. A framework for data quality assessment in clinical research datasets, 2017, AMIA Annual Symposium Proceedings, 2017, 1080 |
Lehmann & Casella Theory of point estimation, 2006 |
Lenth et al. Least-squares means: The R package lsmeans, 2016, Journal of Statistical Software, 69(1), 1–33 |
Liaw et al. Towards an ontology for data quality in integrated chronic disease management: A realist review of the literature, 2013, International Journal of Medical Informatics, 82(1), 10–24 |
Lindsey Comparison of probability distributions, 1974, Journal of the Royal Statistical Society. Series B (Methodological), 38–47 |
Lindsey & Mersch Fitting and comparing probability distributions with log linear models, 1992, Computational Statistics & Data Analysis, 13(4), 373–384 |
Little & Rubin Statistical analysis with missing data, 2014 |
Mayr et al. A permutation test to analyse systematic bias and random measurement errors of medical devices via boosting location and scale models, 2017, Statistical Methods in Medical Research, 26(3), 1443–1460 |
Mahalanobis On the generalized distance in statistics, 1936 |
Seo A review and comparison of methods for detecting outliers in univariate data sets, 2006 |
McMahon & Denaxas A novel framework for assessing metadata quality in epidemiological and public health research settings, 2016, AMIA Summits on Translational Science Proceedings, 2016, 199 |
Meyer et al. Efficient data management in a large-scale epidemiology research project, 2012, Computer Methods and Programs in Biomedicine, 107(3), 425–435 |
Mitchell et al. Data management using Stata: A practical handbook, 2010 |
Morgenthaler A survey of robust statistics, 2007, Statistical Methods and Applications, 15(3), 271–293 |
Müller & Büttner A critical discussion of intraclass correlation coefficients, 1994, Statistics in Medicine, 13(23-24), 2465–2476 |
Nadkarni Metadata-driven software systems in biomedicine: Designing systems that can adapt to changing knowledge, 2011, https://doi.org/10.1007/978-0-85729-510-1 |
German National Cohort (GNC) Consortium The German National Cohort: Aims, study design and organization, 2014, European Journal of Epidemiology, 29, 371–382 |
Newsom Longitudinal structural equation modeling: A comprehensive introduction, 2015 |
Nohr & Olsen Commentary: Epidemiologists have debated representativeness for more than 40 years—has the time come to move on?, 2013, International Journal of Epidemiology, 42(4), 1016–1017 |
Nonnemacher et al. Datenqualität in der medizinischen Forschung, 2014 |
Potter et al. Web application teaching tools for statistics using R and Shiny, 2016, Technology Innovations in Statistics Education, 9(1) |
Plantier et al. Biomedical engineering systems and technologies: 7th international joint conference, BIOSTEC 2014, Angers, France, 3-6, 2014, revised selected papers, 2016 |
Porta A dictionary of epidemiology, 2014 |
Press & Teukolsky Kolmogorov-Smirnov test for two-dimensional data: How to tell whether a set of (x, y) data points are consistent with a particular probability distribution, or with another data set, 1988, Computers in Physics, 2(4), 74–77 |
Prinz et al. Believe it or not: How much can we rely on published data on potential drug targets?, 2011, Nature Reviews Drug Discovery, 10(9), 712 |
Priyadarshana & Sofronov Multiple break-points detection in array CGH data via the cross-entropy method, 2014, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(2), 487–498 |
Ranganathan et al. Common pitfalls in statistical analysis: Measures of agreement, 2017, Perspectives in Clinical Research, 8(4), 187 |
Rasmussen & Blank The data documentation initiative: A preservation standard for research, 2007, Archival Science, 7(1), 55–71 |
Rossini et al. Simple parallel statistical computing in R, 2007, Journal of Computational and Graphical Statistics, 16(2), 399–420 |
Reineke et al. Modys – ein modulares Steuerungs- und Dokumentationssystem für epidemiologische Studien, 2006, Medizinische Dokumentation – wichtig oder nichtig |
A. Richter et al. Data quality monitoring in clinical and observational epidemiologic studies: The role of metadata and process information, 2019, GMS Med Inform Biom Epidemiol, 15(1), https://doi.org/10.3205/mibe000202 |
R. Rigby et al. Distributions for modelling location, scale, and shape: Using GAMLSS in R, 2017, www.gamlss.org (last accessed 5 March 2018) |
R. A. Rigby & Stasinopoulos Generalized additive models for location, scale and shape, 2005, Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(3), 507–554 |
Risch Searching for genetic determinants in the new millennium, 2000, Nature, 405(6788), 847 |
Rothman et al. Why representativeness should be avoided, 2013, International Journal of Epidemiology, 42(4), 1012–1014 |
Rothman et al. Modern epidemiology, 2008 |
Rothwell External validity of randomised controlled trials: “To whom do the results of this trial apply?”, 2005, The Lancet, 365(9453), 82–93 |
R Core Team R: A language and environment for statistical computing, 2020, https://www.R-project.org/ |
Ryssevik The data documentation initiative (DDI) metadata specification, 2001, Ann Arbor, MI: Data Documentation Alliance, retrieved from http://www.ddialliance.org/sites/default/files/ryssevik_0.pdf |
Schafer & Graham Missing data: Our view of the state of the art, 2002, Psychol Methods, 7(2), 147–177, https://www.ncbi.nlm.nih.gov/pubmed/12090408 |
C. Schmidt et al. Square2 – a web application for data monitoring in epidemiological and clinical studies, 2017, Studies in Health Technology and Informatics, 235, 549–553 |
C. O. Schmidt et al. Assessment of a data quality guideline by representatives of German epidemiologic cohort studies, 2019, MIBE, 15(1), https://doi.org/10.3205/mibe000203 |
Schmidberger et al. State-of-the-art in parallel computing with R, 2009, Journal of Statistical Software, 31(1) |
Signorell et al. DescTools: Tools for descriptive statistics. R package version 0.99.18, 2016, R Foundation for Statistical Computing, Vienna, Austria |
Sison & Glaz Simultaneous confidence intervals and sample size determination for multinomial proportions, 1995, Journal of the American Statistical Association, 90(429), 366–369 |
Snijders & Bosker Multilevel analysis: An introduction to basic and advanced multilevel modeling, 1999 |
Stang & Jöckel Avoidance of representativeness in presence of effect modification, 2014, International Journal of Epidemiology, 43(2), 630–631 |
Stausberg et al. Indicators of data quality: Review and requirements from the perspective of networked medical research / Indikatoren zur Datenqualität: Stand und Anforderungen aus Sicht der vernetzten medizinischen Forschung, 2019, GMS Med Inform Biom Epidemiol, 15(1), https://doi.org/10.3205/mibe000199 |
Sterne & Smith Sifting the evidence—what’s wrong with significance tests?, 2001, Physical Therapy, 81(8), 1464–1469 |
Sturges The choice of a class interval, 1926, Journal of the American Statistical Association, 21(153), 65–66 |
Teppo et al. Data quality and quality control of a population-based cancer registry: Experience in finland, 1994, Acta Oncologica, 33(4), 365–369 |
Thygesen & Ersbøll When the entire population is the sample: Strengths and limitations in register-based epidemiology, 2014, European Journal of Epidemiology, 29(8), 551–558 |
Tukey Exploratory data analysis, 1977 |
Van der Loo The stringdist package for approximate string matching, 2014, The R Journal, 6(1), 111–122 |
Vardaki et al. A statistical metadata model for clinical trials’ data management, 2009, Computer Methods and Programs in Biomedicine, 95(2), 129–145 |
Vardigan et al. Data documentation initiative: Toward a standard for the social sciences, 2008, International Journal of Digital Curation, 3(1), 107–113 |
Wager et al. Model selection for penalized spline smoothing using Akaike information criteria, 2007, Australian & New Zealand Journal of Statistics, 49(2), 173–190 |
Wang & Strong Beyond accuracy: What data quality means to data consumers, 1996, Journal of Management Information Systems, 12(4), 5–33 |
Watts et al. Data quality assessment in context: A cognitive perspective, 2009, Decision Support Systems, 48(1), 202–211 |
Nicole G. Weiskopf et al. A data quality assessment guideline for electronic health record data reuse, 2017, eGEMs (Generating Evidence & Methods to Improve Patient Outcomes), 5(1) |
Nicole G. Weiskopf et al. Defining and measuring completeness of electronic health records for secondary use, 2013, Journal of Biomedical Informatics, 46(5), 830–836 |
Nicole Gray Weiskopf & Weng Methods and dimensions of electronic health record data quality assessment: Enabling reuse for clinical research, 2013, Journal of the American Medical Informatics Association, 20(1), 144–151 |
World Health Organization International statistical classification of diseases and related health problems, 2004 |
Wilson Toward releasing the metadata bottleneck, 2011, Library Resources & Technical Services, 51(1), 16–28 |
Wickham Advanced R, 2014 |
Wickham R packages: Organize, test, document, and share your code, 2015 |
De Leeuw et al. Prevention and treatment of item nonresponse, 2003, Journal of Official Statistics, 19, 153–176 |
Carsten Oliver Schmidt et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R, 2021, BMC Medical Research Methodology, 21(1), 1–15, https://doi.org/10.1186/s12874-021-01252-7 |
Völzke et al. Cohort profile: The study of health in Pomerania, 2010, International Journal of Epidemiology, 40(2), 294–307, https://doi.org/10.1093/ije/dyp394 |
Adrian Richter et al. dataquieR: Assessment of data quality in epidemiological research, 2021, Journal of Open Source Software, 6(61), 3093, https://doi.org/10.21105/joss.03093 |
American Association for Public Opinion Research Standard definitions: Final dispositions of case codes and outcome rates for surveys, 2016 |
ISO ISO 8000-1:2022 data quality part 1: overview, 2022, https://www.iso.org/obp/ui/#iso:std:iso:8000:-1:ed-1:v1:en |
Stevens On the theory of scales of measurement, 1946, Science, 103, 677–680 |