Arabic natural language processing: Data science application in big data environment

Sherif M. Saif *

Department of Computers and Systems, Electronics Research Institute, Cairo, Egypt.
 
Research Article
World Journal of Advanced Research and Reviews, 2024, 24(02), 2283–2293
Article DOI: 10.30574/wjarr.2024.24.2.3602
 
Publication history: 
Received on 16 October 2024; revised on 22 November 2024; accepted on 25 November 2024
 
Abstract: 
In the era of Big Data and Data Science, text analysis within Natural Language Processing (NLP) suffers from the curse of high dimensionality. The use of NLP in applications such as speech processing, the semantic web, and word processing has become a central element of today’s Artificial Intelligence and Big Data applications. A natural language parsing system must incorporate three components of natural language, namely lexicon, morphology, and syntax. As Arabic is highly derivational, each component requires extensive exploitation of the associated linguistic characteristics. Parsing Arabic sentences still poses open challenges for several reasons, including the relatively free word order of Arabic, the length of sentences, the omission of diacritics (vowels) in written Arabic, and the frequency of pro-drop phenomena. This research uses Visual Prolog to provide a scalable platform for an Arabic parser, explains the details of the lexicon and parser used, and shows how the system can scale to support additional functions.
 
Keywords: 
Arabic NLP; Data Science; Big Data; Prolog; Parser
 