Comparative analysis of MapReduce and Apache Tez performance in multinode clusters with data compression

Sifat Ibtisum 1, S M Atikur Rahman 2, * and S. M. Saokat Hossain 3

1 Department of Computer Science, Missouri University of Science & Technology, Rolla, MO 65409, USA.
2 Department of Industrial, Manufacturing and Systems Engineering, University of Texas at El Paso, TX 79968, USA.
3 Department of Computer Science, Jahangirnagar University, Savar, Dhaka-1342, Bangladesh.
 
Review Article
World Journal of Advanced Research and Reviews, 2023, 20(03), 519–526
Article DOI: 10.30574/wjarr.2023.20.3.2486
Publication history: 
Received on 26 October 2023; revised on 04 December 2023; accepted on 06 December 2023
 
Abstract: 
This article presents a comparative analysis of Apache Tez and MapReduce for big data processing, focusing on performance, scalability, and ease of use. It begins with an overview of the architectural differences between the two frameworks and the design principles behind them. A detailed performance evaluation follows, covering execution time, resource utilization, and throughput across diverse workloads. Scalability is assessed by examining how each framework responds to growing data volumes and computational demands, including the effects of cluster size, resource allocation strategies, and adaptability to dynamic workloads. The article also evaluates ease of use for developers and administrators in terms of programming model simplicity, debugging capabilities, and system configurability, drawing on user experiences gathered through surveys and practical use cases. By addressing both performance and user experience, the analysis gives a comprehensive view of the strengths and weaknesses of Apache Tez and MapReduce and helps organizations and practitioners make informed choices about distributed computing frameworks for their big data processing requirements.
 
Keywords: 
Apache Tez; Spark Core; Compression; Cluster size; MapReduce
 