Using Machine Learning to Compare Provaccine and Antivaccine Discourse Among the Public on Social Media

Background: Despite numerous counteracting efforts, antivaccine content linked to delays and refusals to vaccinate has grown persistently on social media, while only a few provaccine campaigns have succeeded in engaging with or persuading the public to accept immunization.


Many prior studies have associated the diversity of topics discussed by antivaccine advocates with the public’s higher engagement with such content. Nonetheless, a comprehensive comparison of discursive topics in pro- and antivaccine content in the engagement-persuasion spectrum is still lacking.

Objective: We aimed to compare discursive topics chosen by pro- and antivaccine advocates in their attempts to influence the public to accept or reject immunization in the engagement-persuasion spectrum.


Our overall objective was pursued through three specific aims as follows:

(1) we classified vaccine-related tweets into provaccine, antivaccine, and neutral categories;

(2) we extracted and visualized discursive topics from these tweets to explain disparities in engagement between pro- and antivaccine content; and

(3) we identified how those topics frame vaccines using Entman’s four framing dimensions.


Methods: We adopted a multimethod approach to analyze discursive topics in the vaccine debate on public social media sites.


Our approach combined

(1) large-scale balanced data collection from a public social media site (ie, 39,962 tweets from Twitter);

(2) the development of a supervised classification algorithm for categorizing tweets into provaccine, antivaccine, and neutral groups;

(3) the application of an unsupervised clustering algorithm for identifying prominent topics discussed on both sides; and

(4) a multistep qualitative content analysis for identifying the prominent discursive topics and how vaccines are framed in these topics. In so doing, we alleviated methodological challenges that have hindered previous analyses of pro- and antivaccine discursive topics.
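
As a rough illustration of step (2) above, the sketch below trains a small three-class stance classifier. The abstract does not name the supervised algorithm that was used, so the multinomial naive Bayes model, the `NaiveBayesStance` class, the whitespace tokenizer, and the toy tweets are all assumptions made only for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real tweets would need richer cleaning."""
    return text.lower().split()


class NaiveBayesStance:
    """Multinomial naive Bayes with add-one smoothing (an illustrative choice,
    not necessarily the classifier used in the study)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples, not data from the study.
train_texts = [
    "vaccines save lives get your shot",
    "protect children vaccines are safe",
    "vaccines cause harm do not trust them",
    "refuse the shot vaccines are dangerous",
    "clinic opens monday for flu vaccine appointments",
    "new vaccine schedule announced today",
]
train_labels = [
    "provaccine", "provaccine",
    "antivaccine", "antivaccine",
    "neutral", "neutral",
]

model = NaiveBayesStance().fit(train_texts, train_labels)
prediction = model.predict("vaccines are safe and save lives")
print(prediction)
```

A classifier for a corpus of this size would be trained on manually labeled tweets and evaluated on held-out data; the toy training set here only shows the shape of the inputs and outputs.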

Results: Our results indicated that antivaccine topics have greater intertopic distinctiveness (ie, the degree to which discursive topics are distinct from one another) than their provaccine counterparts (t(122)=2.30, P=.02).

In addition, while antivaccine advocates use all four message frames known to make narratives persuasive and influential, provaccine advocates have neglected to provide a clear problem statement.
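
The comparison of intertopic distinctiveness reported above can be sketched as a two-sample t test. The abstract does not state how distinctiveness scores were computed or which t test variant was used; the pooled-variance (Student) form below, the `pooled_t_test` helper, and the sample scores are assumptions for illustration only. The pooled form gives n_a + n_b - 2 degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t_test(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance
    two-sample t test. Assumes roughly equal variances in both groups."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical distinctiveness scores, not values from the study.
anti_scores = [0.62, 0.71, 0.58, 0.66, 0.69]
pro_scores = [0.51, 0.55, 0.60, 0.49, 0.57]

t_stat, df = pooled_t_test(anti_scores, pro_scores)
print(f"t({df}) = {t_stat:.2f}")
```

A positive t statistic here means the first group (antivaccine, in this toy input) has the higher mean score; a P value would additionally require the t distribution, for example from a statistics library.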


Conclusions: Based on our results, we attribute higher engagement among antivaccine advocates to the distinctiveness of the topics they discuss, and we ascribe the influence of the vaccine debate on uptake rates to the comprehensiveness of the message frames.


These results show the urgency of developing clear problem statements for provaccine content to counteract the negative impact of antivaccine content on uptake rates.