2021 FDA Science Forum
Towards More Meaningful Social Media Analysis: Case Study of Using an Age Prediction Algorithm to Identify and Code Reddit Posts about E-cigarettes by Youth vs. Adults
Contributing Office: Center for Tobacco Products
Abstract
Background
Many social media investigations have relied either on manual qualitative coding, which limits the scope of the investigation, or on evaluating machine learning classification algorithms, which provides little context or practical utility. The current study combines the two methods to 1) classify Reddit users into two predicted age groups (13-20, 21-54) and 2) qualitatively code electronic nicotine delivery system (ENDS)-related posts within the two age groups.
Methods
An algorithm using Reddit metadata was developed to classify Reddit posts as being created by users aged 13-20 or 21-54. Three separate ENDS-related search queries were conducted to pull Reddit posts on the following topics: general vaping, Tobacco 21 minimum age laws, and flavor restriction policies. The age algorithm was then applied to the retrieved posts to predict each poster's age group. For each query and each predicted age group, the 25 posts with the highest karma score (number of upvotes minus number of downvotes) were qualitatively coded (3 queries × 2 age groups × 25 posts, N = 150).
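The sampling step described above (top 25 posts by karma for each query and predicted age group) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the `Post` record, its field names, and the `top_posts_per_cell` helper are hypothetical, and the age-group label is assumed to have been assigned already by a separate classifier that is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record layout; field names are assumptions for illustration.
    post_id: str
    query: str                # e.g. "general_vaping", "tobacco_21", "flavor_restriction"
    predicted_age_group: str  # "13-20" or "21-54", assigned by a separate classifier
    upvotes: int
    downvotes: int

    @property
    def karma(self) -> int:
        # Karma score as defined in the abstract: upvotes minus downvotes.
        return self.upvotes - self.downvotes


def top_posts_per_cell(posts, k=25):
    """Return the k highest-karma posts for each (query, age group) pair."""
    cells = defaultdict(list)
    for post in posts:
        cells[(post.query, post.predicted_age_group)].append(post)
    return {
        cell: sorted(members, key=lambda p: p.karma, reverse=True)[:k]
        for cell, members in cells.items()
    }
```

With three queries and two predicted age groups, and at least 25 posts available in each combination, the selection yields 3 × 2 × 25 = 150 posts for coding, which matches the reported N.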
Results
Across the three queries, there were nine prominently coded themes: Tobacco 21 Policies, Flavor Restriction Policies, Harm Perceptions, Use, Products, Memes/Jokes, COVID-19, Motivations, and Access. Posts coded as Tobacco 21 Policies and Flavor Restriction Policies were evenly distributed across the 13-20 and 21-54 groups. Opposition to flavor restriction policies was a prominent sub-theme for both groups, but was more common in the 21-54 group. The 13-20 group was more likely to discuss access in light of flavor restriction policies. Harm Perceptions and COVID-19 discussions were more prominent in the 21-54 group than in the 13-20 group, but no dominant sub-theme emerged. The 13-20 group was more likely to post images without text (often memes), to post on non-tobacco subreddits, and to have higher karma scores. Posts in the 21-54 group were more likely to mention a variety of brand names, while posts in the 13-20 group in this sample mentioned only Juul by name.
Conclusions
Users predicted to be in the 13-20 and 21-54 age groups posted about and discussed different topics on Reddit, so separating posts by predicted age allowed for more nuanced insight. Future studies could use machine learning classification algorithms alongside qualitative coding to gain richer insights into target audiences from social media data.