The quality of datasets is a critical factor in determining the stability and accuracy of present-day analytics systems. In most data-intensive contexts, data quality problems, including missing values, duplicate records, schema inconsistency, and distribution drift, can substantially affect the results of analytical processes and lead to unreliable insights. Early work on data quality assessment pointed to the need for systematic mechanisms that measure the reliability, completeness, and consistency of datasets before they are used in analytical decision-making, and to the significance of structured data quality assessment mechanisms in complex information systems (Christopher S. Carson, 2001). Despite these developments, many existing analytics pipelines lack automated processes that can measure dataset reliability in a unified and scalable way. This paper presents an automated dataset trust scoring framework that assesses analytics readiness by fusing data profiling signals with rule-based validation procedures. The proposed solution combines profiling indicators, namely missingness, duplicate records, distribution drift, and invalid categorical values, with validation rules implemented as automated data integrity checks. Recent automated data profiling and quality scoring tools have confirmed the efficacy of algorithmic assessment techniques in identifying data anomalies and improving the integrity of predictive analytics (Hugo Moura et al., 2024). Building on these advances, the proposed framework includes a structured scoring model that combines multiple data quality indicators into a single dataset trust score. The framework also assesses whether dataset trust scores can serve as predictors of downstream analytics stability and of errors in analytical processes.
Adaptive data quality scoring models have demonstrated their potential in industrial contexts, where drift-sensitive monitoring is used to keep data-driven systems stable over the long term (Fatih Bayram et al., 2024). Extending these concepts, the proposed framework provides a scalable, automated system for evaluating data readiness before analytical processing. The research also contributes to the growing area of automated data governance by introducing an effective model for continuous dataset quality assessment, which would help organizations increase the reliability of analytics, limit the propagation of errors, and increase confidence in the decision-making systems that depend on those analytics.
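As an illustration of how the four profiling indicators named above might be combined into a single score, the following is a minimal sketch and not the scoring model of the paper. The function names, the weights in `WEIGHTS`, the weighted-penalty formula, and the use of total variation distance as the drift measure are all assumptions made for this example; the abstract does not specify any of them.

```python
from collections import Counter

# Hypothetical weights (sum to 1); the abstract does not state how indicators are weighted.
WEIGHTS = {"missing": 0.3, "duplicate": 0.2, "invalid": 0.2, "drift": 0.3}


def missing_rate(rows, columns):
    """Fraction of cells whose value is None."""
    total = len(rows) * len(columns)
    missing = sum(1 for r in rows for c in columns if r.get(c) is None)
    return missing / total if total else 0.0


def duplicate_rate(rows, columns):
    """Fraction of rows that exactly repeat an earlier row."""
    if not rows:
        return 0.0
    seen = Counter(tuple(r.get(c) for c in columns) for r in rows)
    return sum(n - 1 for n in seen.values()) / len(rows)


def invalid_rate(rows, column, allowed):
    """Fraction of non-missing values that fall outside the allowed categories."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if v not in allowed) / len(values)


def drift(rows, reference_rows, column):
    """Total variation distance between two categorical distributions (0 to 1)."""
    def dist(rs):
        vals = [r.get(column) for r in rs if r.get(column) is not None]
        counts = Counter(vals)
        return {k: n / len(vals) for k, n in counts.items()} if vals else {}

    p, q = dist(rows), dist(reference_rows)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def trust_score(rows, reference_rows, columns, cat_column, allowed, weights=WEIGHTS):
    """Return (score, penalties); score is 1 minus the weighted sum of penalties."""
    penalties = {
        "missing": missing_rate(rows, columns),
        "duplicate": duplicate_rate(rows, columns),
        "invalid": invalid_rate(rows, cat_column, allowed),
        "drift": drift(rows, reference_rows, cat_column),
    }
    penalty = sum(weights[k] * penalties[k] for k in weights)
    return 1.0 - penalty, penalties


# Toy data, invented for illustration only.
reference = [{"id": 1, "tier": "gold"}, {"id": 2, "tier": "silver"}]
current = [
    {"id": 1, "tier": "gold"},
    {"id": 1, "tier": "gold"},    # exact duplicate
    {"id": 2, "tier": None},      # missing value
    {"id": 3, "tier": "bronze"},  # category outside the allowed set
]
score, detail = trust_score(current, reference, ["id", "tier"], "tier", {"gold", "silver"})
print(round(score, 3), detail)
```

Because each penalty lies between 0 and 1 and the assumed weights sum to 1, the resulting score also lies between 0 and 1, with 1 indicating that none of the four checks found a problem. A production system would likely replace the drift measure with one suited to numeric columns and calibrate the weights against observed downstream errors.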
Automated Data Quality Scoring; Dataset Trust Score; Data Profiling and Validation; Analytics Readiness Assessment; Data Drift and Integrity Monitoring; Data Quality Automation; AI-Driven Data Governance.
IRE Journals:
Sai Lalitesh Pothukuchi "Automated Data Quality Scoring for Analytics Readiness Using Integrated Profiling and Validation Frameworks" Iconic Research And Engineering Journals Volume 7 Issue 6 2023 Page 622-636 https://doi.org/10.64388/IREV7I6-1716725
IEEE:
Sai Lalitesh Pothukuchi "Automated Data Quality Scoring for Analytics Readiness Using Integrated Profiling and Validation Frameworks" Iconic Research And Engineering Journals, 7(6) https://doi.org/10.64388/IREV7I6-1716725