The exponential growth of enterprise and scientific data has challenged longstanding assumptions in data management. Traditional data warehouses deliver mature relational semantics and predictable performance, but struggle with semi-structured modalities, iterative data science, and real-time signals. Data lakes, by contrast, scale elastically on commodity object stores and support diverse data types through schema-on-read, yet historically lacked transactional guarantees, strong governance, and consistent query performance. The Data Lakehouse architecture reconciles these trade-offs by layering warehouse-like ACID transactions, versioned metadata, and query optimization over open file formats in a cloud-native design. This paper provides a deep, holistic treatment of Lakehouse principles and practice. We (i) trace the intellectual lineage from MapReduce, Dremel, and Hive to modern log-structured table formats; (ii) formalize a reference architecture encompassing storage, transaction/metadata, and processing layers with a cross-cutting governance plane; (iii) present a comparative analysis of Delta Lake, Apache Iceberg, and Apache Hudi; (iv) synthesize performance considerations for vectorized execution, small-file mitigation, and streaming upserts; (v) examine governance and interoperability patterns for multi-cloud deployments; and (vi) explore emerging directions, including vector/tensor extensions for AI, zero-ETL pipelines, semantic integration, and carbon-aware optimization. Throughout, we anchor discussion in peer-reviewed literature and production learnings, retaining resolvable DOIs for all referenced works. The result is a practitioner-ready, research-grounded blueprint for building resilient, interoperable, and AI-native data platforms.
Data Lakehouse, Delta Lake, Apache Iceberg, Apache Hudi, Parquet, ORC, ACID Transactions, Metadata Governance, Big Data Architecture, Cloud Analytics, Machine Learning, Vector Databases
IRE Journals:
Maya Thomas, Lavanya Gonsalez, Ribin Jacob, Tincy Mathew
"Data Lakehouse Architecture: Bridging the Gap Between Data Lakes and Data Warehouses" Iconic Research And Engineering Journals Volume 9 Issue 4 2025 Page 593-600
IEEE:
Maya Thomas, Lavanya Gonsalez, Ribin Jacob, Tincy Mathew
"Data Lakehouse Architecture: Bridging the Gap Between Data Lakes and Data Warehouses" Iconic Research And Engineering Journals, 9(4)