Evaluating Programming Tools for Scalable and Efficient Data Science Applications
  • Author(s): Jyothi Swaroop Myneni
  • Paper ID: 1710319
  • Page: 1131-1145
  • Published Date: 31-08-2024
  • Published In: Iconic Research And Engineering Journals
  • Publisher: IRE Journals
  • e-ISSN: 2456-8880
  • Volume/Issue: Volume 8 Issue 2 August-2024
Abstract

The explosion in the number of information-centric applications has markedly affected research and industrial practice in many areas, and the power system domain is among the most strongly affected. Smart grids, smart power meters, and vast numbers of sensors generate enormous volumes of data, and any system tasked with processing that information requires highly scalable and efficient computation. Conventional programming solutions are well suited to small-scale problems, but they cannot always meet the computational and scalability demands of large-scale power system analytics. The choice of programming tools therefore matters for obtaining timely insights, using resources effectively, and maintaining reliable system performance. This paper compares four common programming platforms, Python, R, Julia, and C++, in terms of their suitability for large-scale data science applications in power system technology. The study uses benchmarking approaches that combine synthetic and real-world smart grid datasets. The main benchmarks, namely execution speed, scalability on distributed architectures, memory consumption, and adaptability to power system workflows, were analysed on a high-performance computing cluster. Dataset sizes ranged from small scale (10 GB) to large scale (1 TB) to represent varied operating conditions. The simulated workloads were short-term load forecasting, anomaly detection, and real-time monitoring, which are cornerstones of current power system analytics. The empirical results indicate that Python, particularly when used with distributed frameworks such as Apache Spark or Dask, offers a good balance of scalability, usability, and interoperability with machine learning packages, making it a suitable choice for forecasting and real-time system monitoring.
Julia shows impressive efficiency: although slightly slower than C++, it offers a high-level syntax that makes it well suited to time-sensitive applications such as fault detection and predictive maintenance. C++ remains the clear leader in raw computing speed and is particularly popular in latency-sensitive and simulation-intensive applications, although its steep learning curve and the cost of maintaining the required expertise are high.
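Two of the benchmark metrics named above, execution time and memory consumption, can be illustrated on a single machine. The following is a minimal sketch, assuming Python's standard library only: it times a stand-in workload with `time.perf_counter` and records peak allocation with `tracemalloc`. The workload (a global z-score anomaly detector on a synthetic load series with injected spikes), the function names `synthetic_load`, `zscore_anomalies` and `benchmark`, and the 4-sigma threshold are hypothetical illustrations; they are not the paper's benchmark suite, datasets, or detection method, and the sketch does not cover distributed execution with Spark or Dask.

```python
import random
import statistics
import time
import tracemalloc
from typing import Callable, Dict, List


def synthetic_load(n: int, seed: int = 0) -> List[float]:
    """Hypothetical smart-meter load series (kW): Gaussian noise around 50 kW plus injected spikes."""
    rng = random.Random(seed)
    series = [50.0 + rng.gauss(0.0, 2.0) for _ in range(n)]
    step = max(n // 4, 1)
    # Add a +40 kW spike every `step` samples so the detector has known targets.
    for i in range(step, n, step):
        series[i] += 40.0
    return series


def zscore_anomalies(series: List[float], threshold: float = 4.0) -> List[int]:
    """Return indices whose value lies more than `threshold` population std devs from the mean."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    return [i for i, x in enumerate(series) if abs(x - mean) > threshold * std]


def _elapsed(workload: Callable[[], object]) -> float:
    """Wall-clock seconds for a single call of `workload`."""
    start = time.perf_counter()
    workload()
    return time.perf_counter() - start


def benchmark(workload: Callable[[], object], repeats: int = 3) -> Dict[str, float]:
    """Best-of-`repeats` wall-clock time and peak traced memory for one workload."""
    # Timing runs happen with tracing off, because tracemalloc slows allocation-heavy code.
    seconds = min(_elapsed(workload) for _ in range(repeats))
    # A separate traced run records the peak memory allocated through Python's allocators.
    tracemalloc.start()
    try:
        workload()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {"seconds": seconds, "peak_bytes": float(peak)}


if __name__ == "__main__":
    series = synthetic_load(100_000)
    result = benchmark(lambda: zscore_anomalies(series))
    print(f"{result['seconds']:.4f} s, peak {result['peak_bytes'] / 1e6:.2f} MB")
```

Taking the minimum of several timed runs limits interference from background load. Extending the comparison to distributed settings would require measuring across workers, which this sketch does not attempt.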

Keywords

Programming tools, scalability, efficiency, power systems, data science, Python, Julia, R, C++.

Citations

IRE Journals:
Jyothi Swaroop Myneni "Evaluating Programming Tools for Scalable and Efficient Data Science Applications" Iconic Research And Engineering Journals Volume 8 Issue 2 2024 Page 1131-1145

IEEE:
J. S. Myneni, "Evaluating Programming Tools for Scalable and Efficient Data Science Applications," Iconic Research And Engineering Journals, vol. 8, no. 2, pp. 1131-1145, 2024.