Data Lakehouse Architecture: Bridging the Gap Between Data Lakes and Data Warehouses
  • Author(s): Maya Thomas; Lavanya Gonsalez; Ribin Jacob; Tincy Mathew
  • Paper ID: 1711258
  • Page: 593-600
  • Published Date: 13-10-2025
  • Published In: Iconic Research And Engineering Journals
  • Publisher: IRE Journals
  • e-ISSN: 2456-8880
  • Volume/Issue: Volume 9 Issue 4 October-2025
Abstract

The exponential growth of enterprise and scientific data has challenged longstanding assumptions in data management. Traditional data warehouses deliver mature relational semantics and predictable performance, but struggle with semi-structured modalities, iterative data science, and real-time signals. Data lakes, by contrast, scale elastically on commodity object stores and support diverse data types through schema-on-read, yet historically lacked transactional guarantees, strong governance, and consistent query performance. The Data Lakehouse architecture reconciles these trade-offs by layering warehouse-like ACID transactions, versioned metadata, and query optimization over open file formats in a cloud-native design. This paper provides a deep, holistic treatment of Lakehouse principles and practice. We (i) trace the intellectual lineage from MapReduce, Dremel, and Hive to modern log-structured table formats; (ii) formalize a reference architecture encompassing storage, transaction/metadata, and processing layers with a cross-cutting governance plane; (iii) present a comparative analysis of Delta Lake, Apache Iceberg, and Apache Hudi; (iv) synthesize performance considerations for vectorized execution, small-file mitigation, and streaming upserts; (v) examine governance and interoperability patterns for multi-cloud deployments; and (vi) explore emerging directions, including vector/tensor extensions for AI, zero-ETL pipelines, semantic integration, and carbon-aware optimization. Throughout, we anchor discussion in peer-reviewed literature and production learnings, retaining resolvable DOIs for all referenced works. The result is a practitioner-ready, research-grounded blueprint for building resilient, interoperable, and AI-native data platforms.

Keywords

Data Lakehouse, Delta Lake, Apache Iceberg, Apache Hudi, Parquet, ORC, ACID Transactions, Metadata Governance, Big Data Architecture, Cloud Analytics, Machine Learning, Vector Databases

Citations

IRE Journals:
Maya Thomas, Lavanya Gonsalez, Ribin Jacob, Tincy Mathew "Data Lakehouse Architecture: Bridging the Gap Between Data Lakes and Data Warehouses" Iconic Research And Engineering Journals Volume 9 Issue 4 2025 Page 593-600

IEEE:
Maya Thomas, Lavanya Gonsalez, Ribin Jacob, Tincy Mathew "Data Lakehouse Architecture: Bridging the Gap Between Data Lakes and Data Warehouses" Iconic Research And Engineering Journals, 9(4)