We used publicly available, anonymized, nationally aggregated data from Google’s Symptom Search Data (SSD), which reports the relative frequency of Internet searches for 420 signs, symptoms, and health conditions with well-documented privacy protections31. For comparison, we used data from: (1) the National Syndromic Surveillance Program (NSSP) of the Centers for Disease Control and Prevention (CDC), which tracks emergency department (ED) visits for various conditions in facilities in 48 US states6, and (2) the US Census Bureau’s Household Pulse Survey (HPS), which assesses the social and economic impact of the pandemic7. The main features of these datasets are summarized in Table 1.
The SSD is publicly available30 and provides daily and weekly time series of the relative volume of English- or Spanish-language searches in the United States for common symptoms and conditions. Data are available at the national, state, and county levels in the United States and in five other English-speaking countries. Search queries related to each symptom are aggregated and anonymized using differential privacy32, then normalized by the total search volume in that region, as detailed elsewhere31.
SSD was created by leveraging Google’s web search tools that map queries to Knowledge Graph33,34 entities by continuously learning the associations between the words in user queries and the entities described in the web pages viewed as a result of those queries. The 420 symptoms and conditions included in SSD represent the most frequently searched entities (by query volume). Each entity (symptom or condition) is associated with tens or hundreds of thousands of individual queries made by Google users on desktop or mobile devices. Quotation marks and capitalization in queries are ignored, and spelling mistakes are corrected automatically. Sample queries included [lexapro], [depression test], and [signs of depression] for depression; [trazodone], [agoraphobia], and [panic attack] for anxiety; and [I want to die], [how to die], and [I want to kill myself] for suicidal thoughts.
For the present study, we focused on SSD search queries related to anxiety, depression, and suicidal ideation between January 1, 2018 and December 31, 2020. We chose these entities a priori because they represent common conditions that are frequently searched for, and because of their high relevance to the mental health of the population. We also considered searches related to motion sickness as a putative negative control in a subset of our analyses.
We compared national-level weekly data on Internet searches as measured by SSD to national-level data on ED visits as reported by the NSSP. The NSSP is a CDC-led collaboration to collect, analyze, and share electronic health data from approximately 3,500 emergency departments, urgent and ambulatory care centers, inpatient health care facilities, and laboratories (collectively referred to as emergency facilities hereafter) across 48 states (excluding Hawaii and Wyoming) and Washington, DC6. These facilities represent approximately 70% of all ED facilities in the United States. The data used in this analysis were previously used by Holland et al. (2021)20 and are reused in this study with permission from the authors.
We focused on two variables reported by Holland et al. (2021)20: (1) national counts of weekly ED visits for mental health problems associated with natural or man-made disasters, such as stress, anxiety, symptoms consistent with acute stress disorder or post-traumatic stress, and panic, and (2) national counts of weekly ED visits for suicide attempts. The dataset included the number of weekly ED visits from December 30, 2018 to October 10, 2020.
We further compared Internet search data to HPS data. The HPS is a national survey designed to measure the impacts of the COVID-19 pandemic on the economic, physical, and mental health of American households7. Phase 1 of the survey took place between April 23, 2020 and July 21, 2020, phase 2 between August 19, 2020 and October 26, 2020, and phase 3 between October 28, 2020 and March 29, 2021. Although the survey is still ongoing, in the current analysis we used HPS data from these three phases35.
Questions about symptoms of anxiety and depression were administered in all phases of the survey, while questions about mental health care were included in phases 2 and 3. The questions about symptoms of anxiety and depression comprised four items, which are modified versions of the two-item Patient Health Questionnaire (PHQ-2) and the two-item Generalized Anxiety Disorder questionnaire (GAD-2). For each question, responses covered the last 7 days and were coded as follows: not at all = 0, several days = 1, more than half of the days = 2, and nearly every day = 3. Scores for anxiety and depression were obtained by adding the responses to the two questions for each construct. The percentage of respondents scoring 3 or more on these summed scores is used in analyses of survey results. The mental health care items assessed the percentage of adults who, in the past 4 weeks, reported taking prescription medication, receiving counseling or therapy from a mental health professional, or needing counseling or therapy from a mental health professional but not receiving it (c. ).
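The scoring rule described above can be illustrated with a minimal Python sketch. The function names and the example responses are hypothetical and are not taken from the HPS files or from the study code. The sketch also ignores survey weights, which an analysis of the actual survey data would likely need to apply.

```python
# Illustrative sketch only: names and data below are hypothetical.

def construct_score(item_a, item_b):
    """Sum the two item responses for one construct (each coded 0 to 3)."""
    for value in (item_a, item_b):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item response must be 0, 1, 2 or 3, got {value!r}")
    return item_a + item_b


def percent_at_or_above(scores, cutoff=3):
    """Unweighted percentage of respondents whose summed score is >= cutoff."""
    if not scores:
        raise ValueError("scores must not be empty")
    return 100.0 * sum(1 for s in scores if s >= cutoff) / len(scores)


# Hypothetical responses to the two anxiety items for four respondents.
anxiety_items = [(0, 1), (2, 2), (3, 0), (1, 1)]
anxiety_scores = [construct_score(a, b) for a, b in anxiety_items]
anxiety_percent = percent_at_or_above(anxiety_scores)
```

With these made-up responses the summed scores are 1, 4, 3, and 2, so two of the four respondents meet the cutoff of 3.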
We first used graphical approaches and descriptive statistics to identify temporal patterns in Internet searches related to anxiety, depression, and suicidal ideation. We then fit a generalized linear model with a logarithmic link function to quantify the impacts on relative search volumes associated with the weeks of the Thanksgiving and Christmas holidays and with the onset of the COVID-19 pandemic (defined as the first 4 weeks of March 2020), conditional on calendar year and season.
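One way to write a model of this kind is sketched below. The notation is an assumption made for illustration and is not taken from the study: $Y_t$ denotes the relative search volume for a topic in week $t$, the holiday term is shown as a single indicator, and the error distribution is left unspecified because the text does not state it.

```latex
\log \mathrm{E}[Y_t] = \beta_0
  + \beta_1 \,\mathrm{Holiday}_t
  + \beta_2 \,\mathrm{Pandemic}_t
  + \gamma_{\mathrm{year}(t)}
  + \delta_{\mathrm{season}(t)}
```

Under a log link, $\exp(\beta_1)$ and $\exp(\beta_2)$ are read as multiplicative changes in expected search volume during holiday weeks and during the pandemic-onset weeks, respectively, holding calendar year and season fixed.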
Second, we quantified the change in search volumes associated with the pandemic by calculating, for each topic, the percentage change in search frequency relative to the same week 1 year earlier for the period from January 1, 2020 to December 31, 2020. We also estimated the corresponding change in rates of ED visits for mental health symptoms and suicide attempts from the NSSP.
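The year-over-year comparison can be sketched in Python as follows. The function name and the example values are hypothetical, and matching "the same week 1 year earlier" to the week that started exactly 52 weeks before is an assumption. The study code may align weeks differently.

```python
from datetime import date, timedelta

# Illustrative sketch only: names and data below are hypothetical.

def percent_change_vs_prior_year(weekly, weeks_back=52):
    """Percent change of each week's value relative to the value weeks_back earlier.

    weekly maps a week-start date to a value. Weeks with no matching
    earlier week, or whose earlier value is zero, are skipped.
    """
    changes = {}
    for week_start, value in weekly.items():
        prior = weekly.get(week_start - timedelta(weeks=weeks_back))
        if prior is None or prior == 0:
            continue
        changes[week_start] = 100.0 * (value - prior) / prior
    return changes


# Hypothetical relative search volumes for two Mondays 52 weeks apart.
weekly_volume = {date(2019, 3, 4): 10.0, date(2020, 3, 2): 12.5}
yoy_change = percent_change_vs_prior_year(weekly_volume)
```

Here the 2020 week is 25% above its 2019 counterpart, and the 2019 week is omitted because it has no earlier week to compare against.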
Third, we calculated pairwise Pearson correlation coefficients between contemporaneous measures derived from SSD, NSSP, and HPS. Results were not materially different when using Spearman’s rather than Pearson’s correlation coefficients. We also used scatter plots to further visualize the relationship between specific pairs of markers. In sensitivity analyses, we considered the potential presence of a 1- or 2-week lag between changes in search volumes and changes in rates of ED visits for mental health reasons or attempted suicide. Specifically, we used a generalized linear model with a logarithmic link function to quantify the relative change in ED visits associated with searches in the same week, the previous week, and 2 weeks earlier. We fit separate models for each search topic. All analyses were performed using R (version 4.0.2). The code to replicate these analyses is publicly available via GitHub at https://github.com/anthonysun95/Google_SSD_and_Mental_Health.
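To illustrate the idea of contemporaneous and lagged association, the Python sketch below computes a Pearson correlation between two weekly series and repeats it after shifting one series by a given number of weeks. This is a simplified stand-in, not the authors' method: the sensitivity analysis described above used a generalized linear model in R, whereas this sketch uses lagged correlations. All names and data are hypothetical.

```python
import math

# Illustrative sketch only: names and data below are hypothetical.

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def lagged_pearson(searches, visits, lag):
    """Correlate visits in week t with searches in week t - lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(searches, visits)
    return pearson(searches[:-lag], visits[lag:])


# Hypothetical weekly series in which visits echo searches 2 weeks later.
searches = [1, 2, 3, 4, 5, 6]
visits = [9, 9, 1, 2, 3, 4]
r_lag2 = lagged_pearson(searches, visits, lag=2)
```

In this constructed example the two series line up exactly once the 2-week shift is applied, so the lag-2 correlation is 1 up to floating-point rounding. Real series would show weaker and noisier associations.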