IT Spin Inc, USA.
World Journal of Advanced Research and Reviews, 2025, 26(02), 1071-1079
Article DOI: 10.30574/wjarr.2025.26.2.1667
Received on 27 March 2025; revised on 06 May 2025; accepted on 09 May 2025
Generative AI technologies offer transformative potential for addressing fundamental challenges in data pipeline management across enterprise environments. This article details how artificial intelligence can create self-optimizing, autonomous data pipelines capable of adapting to evolving data ecosystems without human intervention. The integration of machine learning techniques, including anomaly detection, reinforcement learning, and large language models, enables new capabilities in pipeline orchestration, from predictive failure prevention to dynamic resource allocation. These intelligent systems demonstrate advances along several dimensions: reducing processing times, preventing failures before they occur, optimizing resource utilization, automating schema evolution, and lowering operational costs. By building on established platforms such as Apache Airflow, Apache Spark, and Kubernetes while introducing AI-powered middleware and Databricks' Generative AI capabilities (including Lakehouse IQ, Foundation Models, RAG pipelines, Custom AI Agents, and Auto-Documentation tools), the proposed architecture supports incremental adoption pathways suited to organizations at different maturity levels. Despite this progress, several considerations remain, including initial training requirements, integration with legacy infrastructure, explainability concerns in regulated sectors, and governance frameworks for autonomous systems. Future directions point toward streaming data optimization, privacy-preserving federated learning approaches, specialized language models for intuitive pipeline management, and hardware-aware optimizations for specialized computing environments. The convergence of data engineering with artificial intelligence represents a fundamental shift toward adaptive data infrastructure that minimizes operational burden while maximizing business value.
Keywords: Generative AI; Autonomous data pipelines; Failure prediction; Resource optimization; Schema evolution
Lingareddy Alva. Generative AI for self-optimizing and autonomous data pipelines. World Journal of Advanced Research and Reviews, 2025, 26(2), 1071-1079. Article DOI: https://doi.org/10.30574/wjarr.2025.26.2.1667