Reliability of LLM-Assisted Data Cleaning in Pandas Pipelines: An Empirical Evaluation Framework for Detecting Silent Data Corruption

Sai Lalitesh Pothukuchi

doi:10.64388/IREV8I8-1716726

Author(s): Sai Lalitesh Pothukuchi
Paper ID: 1716726
Page: 1172-1182
Published Date: 28-02-2025
Published In: Iconic Research And Engineering Journals
Publisher: IRE Journals
e-ISSN: 2456-8880
Volume/Issue: Volume 8 Issue 8 February-2025

Abstract

Large Language Models (LLMs) are increasingly used in data science pipelines to automate the preprocessing of tabular data in Pandas. However, current evaluation standards focus mostly on syntactic correctness and unit-test pass rates, and pay little attention to the semantic accuracy of the generated data transformations. Data cleaning operations such as type casting, missing value imputation, outlier handling, encoding, and normalisation may silently corrupt statistical distributions and undermine the validity of downstream analytics without raising execution errors. This paper presents a systematic cross-domain empirical assessment of LLM-based data cleaning on healthcare, financial, e-commerce, and sensor data. Our multi-dimensional evaluation rubric scores structural correctness, logical validity, statistical soundness, preservation of data integrity, and reproducibility on a scale of 0 to 3. Across 5,150 cleaning operations, LLM-generated transformations were highly structurally correct (>90%), but their semantic reliability varied by task category. Missing value handling and outlier detection had a high harm rate (10-15) and a silent error rate as high as 7%. To address these risks, we propose an automated validation system that combines schema validation, distribution shift detection (Kolmogorov-Smirnov testing and variance analysis), null propagation tracking, and constraint-based integrity checks. The framework reduced silent errors by about 60 per cent, with a precision of 0.91 and a recall of 0.88. These results indicate that syntax-based metrics alone are insufficient for assessing AI-aided preprocessing, and they point to the need for semantic stability metrics and automated safeguards for the responsible use of LLMs in production data pipelines.
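The four validation checks named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function name `validate_cleaning`, its parameters, the 0.05 significance level, and the variance-ratio bounds are all assumptions chosen for illustration, and the example data is synthetic. It compares a DataFrame before and after a cleaning step and reports issues that would not raise an execution error.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def validate_cleaning(before, after, expected_dtypes, constraints,
                      alpha=0.05, var_ratio_bounds=(0.5, 2.0)):
    """Hypothetical helper: compare a frame before/after cleaning, return issues."""
    issues = []

    # 1. Schema validation: expected columns exist with the expected dtype.
    for col, dtype in expected_dtypes.items():
        if col not in after.columns:
            issues.append(f"schema: column '{col}' is missing")
        elif str(after[col].dtype) != dtype:
            issues.append(f"schema: '{col}' is {after[col].dtype}, expected {dtype}")

    # 2. Distribution shift on columns that are numeric both before and after.
    low, high = var_ratio_bounds  # illustrative thresholds, not from the paper
    for col in before.select_dtypes("number").columns:
        if col not in after.columns or not pd.api.types.is_numeric_dtype(after[col]):
            continue
        old, new = before[col].dropna(), after[col].dropna()
        if len(old) < 2 or len(new) < 2:
            continue
        # Two-sample Kolmogorov-Smirnov test on the observed values.
        result = ks_2samp(old, new)
        if result.pvalue < alpha:
            issues.append(f"distribution: '{col}' shifted (KS={result.statistic:.2f})")
        # Variance analysis: flag a large change in spread.
        if old.var() > 0:
            ratio = new.var() / old.var()
            if not low <= ratio <= high:
                issues.append(f"variance: '{col}' ratio after/before = {ratio:.2f}")

    # 3. Null propagation: a cleaning step should not create new nulls.
    for col in before.columns.intersection(after.columns):
        added = int(after[col].isna().sum()) - int(before[col].isna().sum())
        if added > 0:
            issues.append(f"nulls: '{col}' gained {added} null values")

    # 4. Constraint-based integrity: each check returns a boolean Series.
    for name, check in constraints.items():
        violations = int((~check(after)).sum())
        if violations:
            issues.append(f"constraint: '{name}' violated by {violations} rows")

    return issues


# Synthetic example: two cleaning steps that run without error but damage the data.
rng = np.random.default_rng(0)
income = rng.normal(50_000, 10_000, 500)
income[rng.random(500) < 0.3] = np.nan          # about 30% missing
age = rng.integers(18, 80, 500).astype(str)
age[::50] = "unknown"                           # non-numeric tokens
before = pd.DataFrame({"income": income, "age": age})

after = before.copy()
after["income"] = after["income"].fillna(0)                  # constant imputation
after["age"] = pd.to_numeric(after["age"], errors="coerce")  # silent coercion to NaN

issues = validate_cleaning(
    before, after,
    expected_dtypes={"income": "float64", "age": "float64"},
    constraints={"income_positive": lambda df: df["income"] > 0},
)
for issue in issues:
    print(issue)
```

Both cleaning steps here pass a schema check, which is why the sketch layers the statistical and integrity checks on top: filling missing incomes with 0 distorts the distribution and violates a domain constraint, while `errors="coerce"` converts unparseable ages into new nulls.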

Keywords

Large Language Models; LLM-assisted Programming; Data Cleaning; Pandas Pipelines; Silent Errors; Data Integrity; AI Reliability; Semantic Code Evaluation; Automated Validation; Data Preprocessing

Citations

IRE Journals:
Sai Lalitesh Pothukuchi "Reliability of LLM-Assisted Data Cleaning in Pandas Pipelines: An Empirical Evaluation Framework for Detecting Silent Data Corruption" Iconic Research And Engineering Journals Volume 8 Issue 8 2025 Page 1172-1182 https://doi.org/10.64388/IREV8I8-1716726

IEEE:
Sai Lalitesh Pothukuchi "Reliability of LLM-Assisted Data Cleaning in Pandas Pipelines: An Empirical Evaluation Framework for Detecting Silent Data Corruption" Iconic Research And Engineering Journals, 8(8) https://doi.org/10.64388/IREV8I8-1716726
