Comparative analysis of machine learning algorithms for sentiment classification in social media text

Israt Jahan 1, *, Md Nakibul Islam 1, Md Mahadi Hasan 1 and Md Rafiuddin Siddiky 2

1 Department of Information Technology, Washington University of Science and Technology, Virginia, USA.
2 Department of Information Systems Technology, Wilmington University, Delaware, USA.
 
Research Article
World Journal of Advanced Research and Reviews, 2024, 23(03), 2842–2852
Article DOI: 10.30574/wjarr.2024.23.3.2983
 
Publication history: 
Received on 18 August 2024; revised on 25 September 2024; accepted on 27 September 2024
 
Abstract: 
In the era of social media, sentiment analysis has become crucial for understanding public opinion. This study presents a comparative analysis of five machine learning algorithms for sentiment classification in social media text: Logistic Regression, Support Vector Machines (SVM), Random Forest, Naive Bayes, and Gradient Boosting. Using a dataset of 100,000 tweets collected over three months, we evaluated these algorithms' performance in classifying sentiment as positive, negative, or neutral. The data underwent extensive preprocessing, including cleaning, normalization, and correction of class imbalance using SMOTE. Our results show that Logistic Regression and SVM both achieved the highest overall accuracy, 86.22%, with balanced performance across all sentiment classes. Random Forest followed with 82.59% accuracy, while Naive Bayes and Gradient Boosting reached lower accuracy, at 70.45% and 69.96% respectively. All models had difficulty classifying negative sentiment, which points to an area for improvement. The study provides insights into each algorithm's strengths and weaknesses, offering guidance for practitioners selecting methods for sentiment analysis tasks. Our findings contribute to ongoing research on applying machine learning to sentiment analysis in the rapidly evolving landscape of social media communication.
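As an illustration of the class-imbalance step named in the abstract, the following is a minimal sketch of the core idea behind SMOTE: new minority-class samples are created by interpolating between an existing minority sample and one of its nearest minority neighbours. It is not the authors' code; the abstract does not state which implementation, feature representation, or parameters were used. The function name `smote_oversample`, the parameter `k`, and the toy vectors are assumptions made for this example, and a maintained library implementation would normally be preferred in practice.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Hypothetical helper for illustration only; not taken from the study.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding the point itself
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy feature vectors standing in for vectorised tweets (assumed values).
negative = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]
new_points = smote_oversample(negative, n_synthetic=4, k=2)
print(len(new_points))  # prints 4
```

Because each synthetic point lies on the segment between two real minority samples, the oversampled class stays inside the region the original samples already occupy, rather than duplicating existing rows exactly.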
 
Keywords: 
Sentiment Analysis; Machine Learning; Social Media; Text Classification; Natural Language Processing
 