Independent Researcher, USA.
World Journal of Advanced Research and Reviews, 2026, 30(01), 2413-2422
Article DOI: 10.30574/wjarr.2026.30.1.1141
Received on 20 March 2026; revised on 26 April 2026; accepted on 28 April 2026
The exponential growth of data streaming from edge devices, IoT sensors, and transactional systems has rendered traditional Extract, Transform, Load (ETL) pipelines obsolete for real-time analytics. These conventional, rule-based pipelines struggle with dynamic workloads, unpredictable system failures, and the semantic heterogeneity of cross-industry data. This paper proposes a unified, AI-driven framework for optimizing real-time ETL pipelines deployed on cloud infrastructure. The framework integrates three core components: (1) a predictive auto-scaling engine that uses Long Short-Term Memory (LSTM) networks to forecast workload volatility and dynamically allocate cloud resources, minimizing latency and cost; (2) a self-healing fault-tolerance mechanism, powered by reinforcement learning (RL), that proactively identifies anomalous node behaviour and orchestrates failover strategies without human intervention; and (3) a semantic interoperability layer that employs graph neural networks (GNNs) and large language models (LLMs) to map disparate data schemas (e.g., HL7 for healthcare, X12 for finance, and OPC-UA for manufacturing) into a unified, queryable format. We evaluate the proposed framework in a simulated multi-cloud environment with mixed-industry streaming datasets. Results demonstrate a 42% reduction in ETL latency during demand spikes, 99.99% uptime through predictive self-healing, and a 95% reduction in manual schema-mapping effort. This unified approach establishes a new paradigm for resilient, efficient, and interoperable real-time data processing across industry verticals.
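To make the first component more concrete, the following is a minimal Python sketch of forecast-driven scaling: recent throughput is observed, a forecast for the next interval is produced, and the forecast is converted into a worker count. It is an illustration under stated assumptions, not the framework's implementation. The class `PredictiveScaler`, its parameters (`per_worker_capacity`, `window`, `headroom`, `min_workers`), and the simple trend extrapolation used in place of the LSTM forecaster are all hypothetical and do not come from the paper.

```python
import math
from collections import deque


class PredictiveScaler:
    """Toy forecast-driven scaler. All names and parameters are hypothetical."""

    def __init__(self, per_worker_capacity, window=3, headroom=1.2, min_workers=1):
        self.per_worker_capacity = per_worker_capacity  # events/sec one worker handles (assumed)
        self.headroom = headroom                        # safety margin over the forecast (assumed)
        self.min_workers = min_workers                  # never scale below this many workers
        self.history = deque(maxlen=window)             # most recent observed rates

    def observe(self, events_per_sec):
        """Record the latest observed ingest rate."""
        self.history.append(events_per_sec)

    def forecast(self):
        """Stand-in for the LSTM forecaster: last value plus the average recent change."""
        if not self.history:
            return 0.0
        if len(self.history) == 1:
            return float(self.history[-1])
        trend = (self.history[-1] - self.history[0]) / (len(self.history) - 1)
        return max(0.0, self.history[-1] + trend)

    def target_workers(self):
        """Convert the forecast rate into a worker count, with headroom."""
        needed = math.ceil(self.forecast() * self.headroom / self.per_worker_capacity)
        return max(self.min_workers, needed)


if __name__ == "__main__":
    scaler = PredictiveScaler(per_worker_capacity=100)
    for rate in (100, 200, 300):  # a rising load, in events/sec
        scaler.observe(rate)
    print(scaler.forecast(), scaler.target_workers())
```

In a real deployment the `forecast` method would be replaced by a trained sequence model, and the returned worker count would be passed to the cloud provider's scaling interface; the point here is only that scaling decisions are driven by the predicted load and not by the current one.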
Real-Time ETL; AI-Driven Optimization; Predictive Scaling; Fault Tolerance; Data Interoperability; Cloud Infrastructure; Reinforcement Learning; Graph Neural Networks; Streaming Analytics; Self-Healing Systems
Naresh Reddy Telukutla. AI-driven optimization of real-time ETL pipelines in cloud infrastructure: A unified framework for predictive scaling, fault tolerance and cross-industry data interoperability. World Journal of Advanced Research and Reviews, 2026, 30(01), 2413-2422. Article DOI: https://doi.org/10.30574/wjarr.2026.30.1.1141.