World Journal of Advanced Research and Reviews

eISSN: 2581-9615 || CODEN: WJARAI || Impact Factor 8.2 ||  CrossRef DOI

Data science pipelines in lakehouse architectures: A scalable approach to big data analytics

Praveen Kumar Reddy Gujjala *

Independent Researcher, Cloud Computing, Columbus OH, USA.
 
Research Article
World Journal of Advanced Research and Reviews, 2022, 16(03), 1412-1425
Article DOI: 10.30574/wjarr.2022.16.3.1305
DOI url: https://doi.org/10.30574/wjarr.2022.16.3.1305
 
Received on 18 October 2022; revised on 19 December 2022; accepted on 26 December 2022
 
The exponential growth of data generation across industries has necessitated the development of sophisticated architectures capable of handling diverse data types while maintaining analytical agility. This paper presents a comprehensive framework for implementing end-to-end data science pipelines within lakehouse architectures, bridging the gap between traditional data warehouses and data lakes. The proposed methodology leverages the unified storage and processing capabilities of lakehouse systems to create scalable, reproducible, and maintainable data science workflows that support both exploratory analytics and production machine learning deployments.
Our research introduces a novel modular pipeline framework that integrates data engineering and data science operations through a containerized microservices architecture. The framework incorporates metadata management for end-to-end data lineage tracking and implements cloud-native automation layers that scale computational resources dynamically according to workload demand. Through systematic evaluation of performance metrics and real-world case studies, we demonstrate significant improvements in pipeline execution time, resource utilization efficiency, and model deployment velocity compared with traditional architectures.
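The modular, lineage-tracked pipeline described above can be illustrated with a small sketch. This is not the author's framework and does not use Spark, Databricks, or MLflow: it assumes in-memory lists of dictionaries as datasets, and the names `Pipeline`, `stage`, `LineageEvent`, and `fingerprint` are hypothetical. Each stage is registered as an independent function, and every run appends a lineage record that links a content hash of the stage's input to a content hash of its output.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable

# Hypothetical dataset type for this sketch: an in-memory list of records.
Records = list[dict[str, Any]]


def fingerprint(records: Records) -> str:
    """Content hash of a dataset, used here as a lightweight version identifier."""
    payload = json.dumps(records, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class LineageEvent:
    stage: str
    input_hash: str
    output_hash: str
    rows_in: int
    rows_out: int
    finished_at: str


@dataclass
class Pipeline:
    stages: list[tuple[str, Callable[[Records], Records]]] = field(default_factory=list)
    lineage: list[LineageEvent] = field(default_factory=list)

    def stage(self, name: str):
        """Decorator that registers a function as a named, independent pipeline stage."""
        def register(fn: Callable[[Records], Records]) -> Callable[[Records], Records]:
            self.stages.append((name, fn))
            return fn
        return register

    def run(self, records: Records) -> Records:
        """Run stages in registration order, recording one lineage event per stage."""
        for name, fn in self.stages:
            # Capture input identity before the stage runs, in case it mutates its input.
            input_hash = fingerprint(records)
            rows_in = len(records)
            output = fn(records)
            self.lineage.append(LineageEvent(
                stage=name,
                input_hash=input_hash,
                output_hash=fingerprint(output),
                rows_in=rows_in,
                rows_out=len(output),
                finished_at=datetime.now(timezone.utc).isoformat(),
            ))
            records = output
        return records


pipeline = Pipeline()


@pipeline.stage("drop_nulls")
def drop_nulls(rows: Records) -> Records:
    return [r for r in rows if r.get("amount") is not None]


@pipeline.stage("add_tax")
def add_tax(rows: Records) -> Records:
    # Hypothetical 10% tax, purely for illustration.
    return [{**r, "amount_with_tax": round(r["amount"] * 1.1, 2)} for r in rows]


raw = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}, {"id": 3, "amount": 5.0}]
result = pipeline.run(raw)
for event in pipeline.lineage:
    print(event.stage, event.rows_in, "->", event.rows_out, event.output_hash)
```

Because each stage's output hash equals the next stage's input hash, the recorded events form a chain that can be audited after the fact. A production system would persist these events to a metadata store rather than keep them in memory.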
The lakehouse paradigm enables data scientists to perform complex analytics on raw, semi-structured, and structured data without the extract-transform-load (ETL) bottlenecks that characterize conventional data warehouse approaches. By combining Apache Spark's distributed processing, Databricks' collaborative analytics platform, and MLflow's model lifecycle management, the framework provides a comprehensive solution for enterprise-scale data science operations. Experimental results indicate up to a 60% reduction in time-to-insight and a 40% improvement in computational resource efficiency compared with legacy pipeline architectures.
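Running analytics directly on raw and semi-structured data relies on schema-on-read: structure is applied when the data is queried rather than enforced by an upfront ETL step. The following standard-library sketch illustrates only that idea, under stated assumptions. It is not Spark or Databricks code, and the two inline sources and their field names are hypothetical. A fixed-schema CSV extract and a JSON-lines event log with varying fields are combined in one aggregation, with each reader projecting just the fields the query needs.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical structured source: a CSV extract with a fixed schema.
ORDERS_CSV = """order_id,customer,amount
1,alice,30.0
2,bob,12.5
"""

# Hypothetical semi-structured source: raw JSON-lines events whose fields vary per record.
EVENTS_JSONL = """{"customer": "alice", "type": "refund", "amount": 5.0}
{"customer": "bob", "type": "click"}
{"customer": "alice", "type": "refund", "amount": 2.5, "meta": {"channel": "web"}}
"""


def read_structured(text):
    """Yield (customer, amount) records from the fixed-schema CSV."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"customer": row["customer"], "amount": float(row["amount"])}


def read_semi_structured(text):
    """Schema-on-read: parse raw JSON at query time and project only refund amounts."""
    for line in text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Records lacking the fields this query needs are skipped, not rejected at load time.
        if event.get("type") == "refund" and "amount" in event:
            yield {"customer": event["customer"], "amount": -float(event["amount"])}


def net_revenue_by_customer(*sources):
    """Aggregate amounts per customer across any number of record streams."""
    totals = defaultdict(float)
    for source in sources:
        for record in source:
            totals[record["customer"]] += record["amount"]
    return dict(totals)


print(net_revenue_by_customer(read_structured(ORDERS_CSV), read_semi_structured(EVENTS_JSONL)))
```

The `click` event carries no amount, so it is ignored when the query reads it. This is the trade-off schema-on-read makes: ingestion stays cheap, and validation moves to query time.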
 
Keywords: Lakehouse Architecture; Data Science Pipelines; Apache Spark; MLflow; Metadata Management; Cloud Computing; Machine Learning Operations
 
https://wjarr.com/sites/default/files/fulltext_pdf/WJARR-2022-1305.pdf

Praveen Kumar Reddy Gujjala. Data science pipelines in lakehouse architectures: A scalable approach to big data analytics. World Journal of Advanced Research and Reviews, 2022, 16(3), 1412-1425. Article DOI: https://doi.org/10.30574/wjarr.2022.16.3.1305

Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.


All statements, opinions, and data contained in this publication are solely those of the individual author(s) and contributor(s). The journal, editors, reviewers, and publisher disclaim any responsibility or liability for the content, including accuracy, completeness, or any consequences arising from its use.
