World Journal of Advanced Research and Reviews

eISSN: 2581-9615 | CODEN: WJARAI | Impact Factor 8.2

Automating large-scale data pipelines using Spark, Python and workflow orchestration

JAGADEESWAR ALAMPALLY *

Software Development Manager – USA.

Review Article

World Journal of Advanced Research and Reviews, 2020, 06(02), 281-292

Article DOI: 10.30574/wjarr.2020.6.2.0134

DOI url: https://doi.org/10.30574/wjarr.2020.6.2.0134

Received on 30 April 2020; revised on 24 May 2020; accepted on 28 May 2020

Abstract

Automation has become a fundamental requirement for managing modern large-scale data pipelines as organizations increasingly rely on continuous data processing for analytics and decision-making. Traditional pipeline management approaches often involve significant manual intervention, leading to operational inefficiencies, inconsistent execution, and delays in data availability. This study presents an automated data pipeline framework that integrates Apache Spark for distributed data processing, Python for transformation and control logic, and workflow orchestration mechanisms for task scheduling and dependency management. The proposed framework is designed to improve reliability, scalability, and operational efficiency when handling large datasets across distributed computing environments. The architecture incorporates automated job scheduling, fault-tolerant task execution, and structured pipeline monitoring to ensure consistent data processing workflows. Experimental evaluation demonstrates that the automated framework significantly reduces manual operational overhead while improving pipeline stability and data freshness. Results further indicate consistent pipeline performance across varying dataset volumes and improved resource utilization in distributed environments. The findings highlight the practical advantages of integrating Spark-based processing with Python-driven automation and orchestration tools to support scalable and reliable data engineering operations. This research contributes to the development of automated pipeline architectures that support efficient large-scale data processing in modern data-driven systems. 

Keywords: Data Pipeline Automation; Apache Spark; Python; Workflow Orchestration; Distributed Data Processing; Big Data Engineering; Scalable Data Pipelines
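
The abstract describes dependency management and fault-tolerant task execution without naming a specific orchestration tool. The following is a minimal sketch of those two ideas using only the Python standard library. It is not the author's implementation: the task names, the `run_with_retries` and `run_pipeline` helpers, and the simulated failure are all hypothetical, and the `transform` step merely stands in for a Spark job.

```python
import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_with_retries(task, retries=2, delay_s=0.0):
    """Call task(); on failure, retry up to `retries` more times."""
    for attempt in range(1, retries + 2):
        try:
            return task()
        except Exception:
            if attempt == retries + 1:
                raise  # retries exhausted: surface the failure
            time.sleep(delay_s)


def run_pipeline(tasks, dependencies, retries=2):
    """Run tasks in dependency order.

    tasks: name -> zero-argument callable
    dependencies: name -> set of task names that must finish first
    Returns the task names in the order they completed.
    """
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        run_with_retries(tasks[name], retries=retries)
        completed.append(name)
    return completed


# Hypothetical three-stage pipeline; the task bodies are placeholders.
attempts = {"transform": 0}


def extract():
    return "raw"


def transform():
    # Stand-in for a Spark job; fails once to exercise the retry path.
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("simulated transient failure")
    return "clean"


def load():
    return "loaded"


tasks = {"extract": extract, "transform": transform, "load": load}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order = run_pipeline(tasks, dependencies)
print(order)
```

In a production setting, a dedicated workflow orchestrator would normally take the place of `run_pipeline`, adding scheduling, monitoring, and persisted task state. The sketch only shows how a declared dependency graph and a retry policy keep execution order and recovery out of manual hands.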

Full text (PDF): https://wjarr.com/sites/default/files/fulltext_pdf/WJARR-2020-0134.pdf

JAGADEESWAR ALAMPALLY. Automating large scale data pipelines using spark, python and workflow orchestration. World Journal of Advanced Research and Reviews, 2020, 06(02), 281-292. Article DOI: https://doi.org/10.30574/wjarr.2020.6.2.0134.

Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.


All statements, opinions, and data contained in this publication are solely those of the individual author(s) and contributor(s). The journal, editors, reviewers, and publisher disclaim any responsibility or liability for the content, including accuracy, completeness, or any consequences arising from its use.
