Natural Language Querying of DUCKDB using LLM
  • Author(s): Poojitha K D; Chandana M Pallegar; G K Harshitha; Rakshitha B M; Padmapriya H N
  • Paper ID: 1716470
  • Page: 2596-2622
  • Published Date: 24-04-2026
  • Published In: Iconic Research And Engineering Journals
  • Publisher: IRE Journals
  • e-ISSN: 2456-8880
  • Volume/Issue: Volume 9 Issue 10 April-2026
Abstract

Many users store data in CSV files, but analyzing that data usually requires SQL, which can be a barrier for non-technical users. This project presents a web-based application that lets users query their data in plain English instead of writing SQL commands. The system converts natural language input into SQL queries, so data analysis requires no programming skills. It runs completely offline and is built on the Flask web framework with a locally hosted, pre-trained Large Language Model, which keeps data private and avoids dependence on cloud-based services. When a CSV file is uploaded, Pandas is used to clean column names and identify data types, and a matching table is created dynamically in DuckDB, an in-process analytical database used here in memory. The user’s query is combined with the dataset schema and passed to the language model, which generates the corresponding SQL statement. Generated queries are validated to block unsafe operations and then executed on the dataset, with results returned as structured JSON. The user interface, built with HTML, CSS, and JavaScript, supports drag-and-drop file uploads, data preview, and real-time query responses. The project demonstrates an efficient and secure approach to querying structured data in natural language, suited to academic, business, and inventory-related datasets. Future improvements may include support for multiple datasets, graphical data visualization, and model optimization for more complex queries.
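
The abstract names three steps that can be illustrated without any database or model: cleaning column names, combining the schema with the user's question, and validating the generated SQL. The following is a minimal sketch of those steps using only the Python standard library. The function names, the prompt wording, and the keyword blocklist are assumptions made for illustration; the paper does not publish its implementation, and the actual system may differ.

```python
import re

# All names below are hypothetical; they are not taken from the paper.


def clean_column_name(name: str) -> str:
    """Turn a raw CSV header into a lowercase, SQL-friendly identifier."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    if not cleaned:
        cleaned = "column"
    # Identifiers that start with a digit would need quoting, so prefix them.
    if cleaned[0].isdigit():
        cleaned = "c_" + cleaned
    return cleaned


def build_prompt(table: str, schema: dict[str, str], question: str) -> str:
    """Combine the table schema with the user's question (assumed wording)."""
    columns = ", ".join(f"{col} {dtype}" for col, dtype in schema.items())
    return (
        f"Table {table} has columns: {columns}.\n"
        f"Write one DuckDB SQL SELECT statement that answers: {question}\n"
        "Return only the SQL."
    )


# Assumed blocklist of keywords that modify data or reach outside the table.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|pragma|install|load)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Accept a single read-only statement; reject everything else."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # a second statement is chained on
        return False
    if not re.match(r"(?i)^(select|with)\b", statement):
        return False
    return FORBIDDEN.search(statement) is None
```

This check is deliberately conservative: it also rejects harmless queries whose string literals contain a semicolon or a blocked keyword. A stricter system would parse the SQL instead of pattern-matching it.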

Citations

IRE Journals:
Poojitha K D, Chandana M Pallegar, G K Harshitha, Rakshitha B M, Padmapriya H N, "Natural Language Querying of DUCKDB using LLM," Iconic Research And Engineering Journals, Volume 9, Issue 10, 2026, Page 2596-2622. https://doi.org/10.64388/IREV9I10-1716470

IEEE:
Poojitha K D, Chandana M Pallegar, G K Harshitha, Rakshitha B M, Padmapriya H N, "Natural Language Querying of DUCKDB using LLM," Iconic Research And Engineering Journals, 9(10). https://doi.org/10.64388/IREV9I10-1716470