Sentiment analysis, a core task in natural language processing, is central to understanding public opinion and sentiment trends, particularly now that much public discourse takes place on social media. This paper studies sentiment analysis on Twitter data, focusing on the challenges posed by the platform's fast-changing content and informal communication style. Using Big Data technologies and PySpark's machine learning framework, we develop models that classify tweets by sentiment polarity. Our study examines machine learning algorithms, feature engineering techniques, and model optimization strategies on a Twitter dataset of 1,600,000 annotated tweets. Through a literature review, we survey existing methodologies and advances in sentiment analysis, covering both classical machine learning and deep learning approaches, and we categorize the relevant research papers by the methodology they apply to Twitter data. Our findings contribute to the advancement of sentiment analysis techniques and underline the value of Big Data tooling in overcoming the challenges of analyzing social media data.
Sentiment Analysis on Twitter Dataset Using Apache Spark and H2O Machine Learning Framework: A Comparative Study
Category: Student Abstract Submission