

2021 FDA Science Forum

Towards More Meaningful Social Media Analysis: Case Study of Using an Age Prediction Algorithm to Identify and Code Reddit Posts about E-cigarettes by Youth vs. Adults

Authors:
Navarro, Mario, FDA/CTP; Chew, Robert, RTI; Kery, Caroline, RTI; Baum, Laura, RTI; Bukowski, Thomas, RTI; Kim, Annice, RTI
Contributing Office:
Center for Tobacco Products

Abstract


Background

Many social media investigations have relied either on manual qualitative coding, which limits the scope of the investigation, or on machine learning classification algorithms, which often fail to provide context and utility. The current study combines the two methods to 1) classify Reddit users into two predicted age categories (13-20 and 21-54) and 2) qualitatively code posts related to Electronic Nicotine Delivery Systems (ENDS) within the two age groups.

Methods

An algorithm using Reddit metadata was developed to classify Reddit posts as created by users aged 13-20 or 21-54. Three separate ENDS-related search queries were run to pull Reddit posts on the following topics: general vaping, Tobacco 21 minimum age laws, and flavor restriction policies. The age algorithm was then used to predict Reddit users' age groups. For each query and each predicted age group, the 25 posts with the highest karma score (number of upvotes minus number of downvotes) were qualitatively coded (N = 150).
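The sampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the `Post` record, its field names, the query labels, and the `top_posts_per_cell` helper are all hypothetical, and the age-prediction algorithm itself is not reproduced here (each post is assumed to already carry a predicted age group). The abstract does not say how ties in karma were handled; this sketch keeps tied posts in their input order.

```python
from collections import defaultdict
from typing import NamedTuple


class Post(NamedTuple):
    """Hypothetical record for one retrieved Reddit post."""
    post_id: str
    query: str       # assumed labels, e.g. "vaping", "tobacco21", "flavors"
    age_group: str   # predicted label: "13-20" or "21-54"
    upvotes: int
    downvotes: int


def karma(post: Post) -> int:
    """Karma as defined in the abstract: upvotes minus downvotes."""
    return post.upvotes - post.downvotes


def top_posts_per_cell(posts, k=25):
    """Return the k highest-karma posts for each (query, age_group) pair."""
    cells = defaultdict(list)
    for post in posts:
        cells[(post.query, post.age_group)].append(post)
    # sorted() is stable, so posts with equal karma keep their input order.
    return {
        cell: sorted(members, key=karma, reverse=True)[:k]
        for cell, members in cells.items()
    }
```

With three queries and two predicted age groups, and at least 25 posts in every query-by-age-group cell, this selection yields 3 x 2 x 25 = 150 posts, matching the N reported above.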

Results

Across the three queries, there were nine prominently coded themes: Tobacco 21 Policies, Flavor Restriction Policies, Harm Perceptions, Use, Products, Memes/Jokes, COVID-19, Motivations, and Access. Tobacco 21 Policy and Flavor Restriction Policy posts were evenly distributed across the 13-20 and 21-54 groups. Opposition to flavor restriction policies was a prominent sub-theme for both groups, but was more common in the 21-54 group. The 13-20 group was more likely to discuss access in light of flavor restriction policies. Harm Perception and COVID-19 discussions were more prominent in the 21-54 group than in the 13-20 group, but no dominant sub-theme emerged. The 13-20 group was more likely to post images without text (often memes), post on non-tobacco subreddits, and have higher karma scores. The 21-54 group was more likely to mention a variety of brand names, whereas posts from the 13-20 group in this sample mentioned only Juul by name.

Conclusions

Users predicted to be in the 13-20 age group and users predicted to be in the 21-54 age group posted about and discussed different topics on Reddit, and separating posts by predicted age group allowed for more nuanced insight. Future studies could use machine learning classification algorithms alongside qualitative coding to gain richer insights about target audiences from social media data.

