The emergence of ever-larger volumes, higher velocities, and an increasingly diverse range of big data has driven the development of more sophisticated techniques for big data processing and archiving. ETL has evolved from extract, transform, load processes that were executed centrally and limited in scalability to distributed architectures built on cloud computing and big data technologies. These modern distributed ETL architectures process tasks by distributing them across nodes or clusters, thereby promoting scalability, performance, and fault tolerance. By exploiting parallelism in ETL processes through distributed systems, large datasets can be handled more easily and quickly, and real-time data processing can be reconciled with data integration and transformation. In this paper, we discuss a distributed ETL architecture suited to large-scale data processing, covering its key components of data extraction, data transformation, and loading, all of which are designed for distributed high-performance environments. Through a case study of the described architecture and an analysis of its performance, we explain how this design reduces processing time and improves system scalability. The findings also show that distributed ETL frameworks play a crucial role in a wide range of big data handling and analysis applications, and demonstrate how effectively they can meet today's growing data management needs.
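To make the idea of distributing extract, transform, and load work across parallel workers concrete, the following is a minimal, self-contained Python sketch, not the paper's implementation: it partitions a synthetic dataset and runs the extract and transform stages in parallel across a process pool, standing in for nodes in a cluster. All function names, partition counts, and the list-based sink are illustrative assumptions.

```python
from concurrent.futures import ProcessPoolExecutor


def extract(partition_id, partition_size=1000):
    """Extract one partition of raw records (synthetic rows; a real source
    would be a database, message queue, or object store)."""
    start = partition_id * partition_size
    return [{"id": i, "value": i * 0.5} for i in range(start, start + partition_size)]


def transform(records):
    """Apply a simple transformation (filter plus a derived column) to a partition."""
    return [
        {**r, "value_squared": r["value"] ** 2}
        for r in records
        if r["value"] >= 0
    ]


def load(records, sink):
    """Load a transformed partition into the sink (a list standing in for a data store)."""
    sink.extend(records)
    return len(records)


def etl_partition(partition_id):
    """Run extract -> transform for one partition on a worker process."""
    return transform(extract(partition_id))


if __name__ == "__main__":
    sink = []
    # Distribute partitions across worker processes; each worker extracts and
    # transforms its partition independently, mirroring how a distributed ETL
    # cluster parallelizes work across nodes.
    with ProcessPoolExecutor(max_workers=4) as pool:
        for transformed in pool.map(etl_partition, range(8)):
            load(transformed, sink)  # loading is serialized here for simplicity
    print(f"Loaded {len(sink)} records from 8 partitions")
```

In a production distributed ETL framework, partitioning, scheduling, and fault tolerance would be handled by the cluster engine rather than a local process pool, but the division of work per partition follows the same pattern.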
Distributed ETL Architecture, Big Data Analytics, Scalable Data Processing, Cloud-Based Data Management, High-Performance Computing, Fault-Tolerant Systems, Parallel Data Integration.