Council on Undergraduate Research (CUR) - Public Discourse During COVID-19: Analyzing Keyword Variations in Twitter Data

Understanding public sentiment during crises like the COVID-19 pandemic is crucial for policymakers and researchers. This study examines the challenges and methods of analyzing COVID-19-related tweets, focusing on categorizing keywords and tracking their trends to capture how public sentiment and discussion changed over time.

The study uses a data pipeline for collecting, processing, and categorizing tweets. Tweets were collected over several months with Selenium, which navigated individual tweet URLs in a headless browser. The extracted data was stored in a structured MySQL database. Preprocessing steps such as text classification and stemming then organized the tweets by topic.
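The preprocessing stage can be illustrated with a minimal sketch. It assumes tweets arrive as plain strings, and it uses a deliberately crude suffix-stripping rule as a stand-in for whatever stemmer the study applied; the function names and the suffix list are hypothetical and not taken from the study.

```python
import re

# Hypothetical, deliberately crude suffix list; a real pipeline would likely
# use an established stemmer (for example a Porter-style stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and drop URLs, @mentions, and the '#' of hashtags."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop '#'
    return text


def crude_stem(token: str) -> str:
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Return the stemmed alphanumeric tokens of a cleaned tweet."""
    tokens = re.findall(r"[a-z0-9]+", clean_tweet(text))
    return [crude_stem(t) for t in tokens]
```

Stemming of this kind collapses inflected forms such as "masks" and "mask" into one token, but it cannot connect unrelated surface forms such as "jab" and "vaccine".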

A central challenge in this analysis is variation in keyword usage. For example, references to “vaccine” appear as terms like “vax,” “vaxed,” or brand names such as “Pfizer” and “Moderna.” Similarly, “mask” appears as “N95” or “face covering,” and public figures are referred to by a variety of names. To address this, a large language model (LLM) was employed to group such variations under unified categories, improving keyword recognition and enabling more precise time-series analysis.
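Once variants have been grouped, applying the grouping to tweets is a lookup problem. The sketch below assumes the LLM's output has already been distilled into a variant-to-category table; the hard-coded `VARIANT_TO_CATEGORY` dict and the `categories_in` function are hypothetical stand-ins, populated only with the examples named above.

```python
import re

# Hypothetical variant-to-category table. In the study an LLM produced the
# groupings; this hard-coded dict only stands in for that output.
VARIANT_TO_CATEGORY = {
    "vaccine": "vaccine", "vax": "vaccine", "vaxed": "vaccine",
    "pfizer": "vaccine", "moderna": "vaccine",
    "mask": "mask", "n95": "mask", "face covering": "mask",
}

# Longest variants first so multi-word phrases win over shorter overlaps.
_PATTERN = re.compile(
    r"\b("
    + "|".join(
        re.escape(v) for v in sorted(VARIANT_TO_CATEGORY, key=len, reverse=True)
    )
    + r")\b",
    re.IGNORECASE,
)


def categories_in(text: str) -> set[str]:
    """Return the set of unified categories mentioned in a tweet."""
    return {
        VARIANT_TO_CATEGORY[m.group(1).lower()] for m in _PATTERN.finditer(text)
    }
```

Matching here is on whole words, so an inflected form such as "masks" is only caught if it is listed as a variant or if the text has been stemmed first.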

Keyword trends were analyzed through time-series visualizations that reflect shifts in public discussion. Because the LLM-based grouping folds terms like “shot” and “jab,” along with alternate spellings, into “vaccine,” each series counts every variant of a concept rather than a single literal keyword.
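The counting step behind such a time series might look like the following sketch. It assumes each tweet has already been reduced to a posting date and a set of unified categories, and it buckets by ISO week; the weekly granularity and the `weekly_counts` name are assumptions, since the abstract does not state the interval used.

```python
from collections import Counter
from datetime import date


def weekly_counts(records):
    """Count mentions per (ISO year, ISO week, category).

    `records` is an iterable of (date, categories) pairs, where `categories`
    is the set of unified categories already assigned to one tweet.
    """
    counts = Counter()
    for day, categories in records:
        iso_year, iso_week, _ = day.isocalendar()
        for category in categories:
            counts[(iso_year, iso_week, category)] += 1
    return counts
```

Each tweet contributes at most one count per category per week, so a tweet that uses both "vax" and "Pfizer" is not double-counted under "vaccine".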

The analysis uncovered trends in public discussions, such as a surge in vaccine-related mentions during rollout phases and varying sentiment towards mask mandates and social distancing policies. The study’s framework provides a scalable and adaptable approach to analyzing public sentiment dynamics that could be applied to future crises.

By addressing keyword variation challenges and integrating advanced NLP techniques, this framework highlights public sentiment trends during COVID-19, supporting data-driven decision-making and public policy planning.

Presenter

Dan Li
