Sentiment analysis of passenger feedback on U.S. airlines using machine learning classification methods

Md Nurul Raihen 1, * and Sultana Akter 2

1 Department of Mathematics and Computer Science, Fontbonne University, Saint Louis, MO, USA.
2 Institute for Data Science and Informatics, University of Missouri Columbia, Columbia, MO, USA.
 
Research Article
World Journal of Advanced Research and Reviews, 2024, 23(01), 2260–2273
Article DOI: 10.30574/wjarr.2024.23.1.2183
 
Publication history: 
Received on 12 June 2024; revised on 18 July 2024; accepted on 20 July 2024
 
Abstract: 
Twitter, a micro-blogging platform, has emerged as a novel information architecture. Every day, people worldwide publish about 200 million status messages, known as tweets, and Twitter users express their opinions by posting these concise text messages. Twitter data, including consumer feedback tweets, is therefore useful for sentiment analysis. Sentiment analysis (or opinion mining) is a natural language processing (NLP) technique that classifies the tone of text as positive, negative, or neutral. This study employed multi-class sentiment analysis to analyze tweets about six major U.S. airlines (American, United, US Airways, Southwest, Delta, and Virgin America). Because airlines are essential for travel, this analysis of airline evaluations is intended to help people identify good airlines and apply the model when choosing airlines for their own journeys, and to help each airline identify its weaknesses so that it can address them. The analysis was conducted with seven distinct classification methods: Linear Discriminant Analysis, Quadratic Discriminant Analysis, Decision Tree, Random Forest, K-Nearest Neighbors, Gradient Boosting, and AdaBoost, each evaluated with split validation (80% training set, 20% test set) and with 10-fold cross-validation. The best model, Random Forest with 10-fold cross-validation, outperformed all others in accuracy and efficiency, achieving an accuracy score of 90.13%. Because a lower error rate means fewer wrong sentiment predictions, the project also uses machine learning techniques to estimate the reasons for misclassification; the classification model with the lowest error rate, together with an understanding of why tweets are misclassified, could help airline companies improve their business.
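The abstract reports accuracy but frames model selection in terms of error rate (error rate = 1 - accuracy, so 90.13% accuracy corresponds to a 9.87% error rate). The two evaluation protocols it names, an 80/20 split and 10-fold cross-validation, can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' code: all function names are hypothetical, and `train_word_vote` is a toy stand-in classifier, not one of the seven methods studied. With real classifiers, scikit-learn's `train_test_split(..., test_size=0.2)` and `cross_val_score(estimator, X, y, cv=10)` implement the same two protocols.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch: a labelled tweet is (text, label), and a
# trainer maps training rows to a predict(text) -> label function.
Example = Tuple[str, str]
Predictor = Callable[[str], str]
Trainer = Callable[[Sequence[Example]], Predictor]


def error_rate(predict: Predictor, rows: Sequence[Example]) -> float:
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(1 for text, label in rows if predict(text) != label)
    return wrong / len(rows)


def split_validation(rows: Sequence[Example], train: Trainer,
                     train_frac: float = 0.8, seed: int = 0) -> float:
    """Shuffle once, train on the first 80% and report error on the last 20%."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return error_rate(train(shuffled[:cut]), shuffled[cut:])


def make_folds(rows: Sequence[Example], k: int) -> List[List[Example]]:
    """Partition rows into k disjoint folds of near-equal size."""
    return [list(rows[i::k]) for i in range(k)]


def k_fold_cv(rows: Sequence[Example], train: Trainer,
              k: int = 10, seed: int = 0) -> float:
    """Mean test error over k folds; every row is held out exactly once.
    Assumes len(rows) >= k so that no fold is empty."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    folds = make_folds(shuffled, k)
    errors = []
    for i, test_fold in enumerate(folds):
        # Train on all folds except fold i, then test on fold i.
        train_rows = [row for j, fold in enumerate(folds) if j != i for row in fold]
        errors.append(error_rate(train(train_rows), test_fold))
    return sum(errors) / k


def train_word_vote(rows: Sequence[Example]) -> Predictor:
    """Toy stand-in classifier (NOT one of the paper's seven methods): each
    word votes for the labels it appeared with in the training rows."""
    word_labels = {}
    for text, label in rows:
        for word in text.lower().split():
            word_labels.setdefault(word, Counter())[label] += 1
    majority = Counter(label for _, label in rows).most_common(1)[0][0]

    def predict(text: str) -> str:
        votes = Counter()
        for word in text.lower().split():
            votes.update(word_labels.get(word, Counter()))
        # Fall back to the most common training label when no word is known.
        return votes.most_common(1)[0][0] if votes else majority

    return predict
```

To compare models as the abstract describes, one would call `split_validation(labelled_tweets, trainer)` and `k_fold_cv(labelled_tweets, trainer, k=10)` for each candidate trainer and keep the one with the lowest error rate. Here `labelled_tweets` is a hypothetical list of (text, label) pairs, and each real classifier would replace the toy `train_word_vote`. Fixing the shuffle `seed` keeps the folds identical across models, so their error rates are compared on the same partitions.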
 
Keywords: 
Twitter; Airlines; Classification; Error Rate; Validation
 