Global Mind Project data in the United States: A comparison with national statistics

Joseph Taylor
Oleksii Sukhoi
https://orcid.org/0009-0002-7377-3189
Jennifer Jane Newson
Tara C Thiagarajan

Abstract

Background
The rapid growth of internet and mobile technologies has opened up new, low-cost methods for large-scale population surveys. The Global Mind Project (GMP) is one such survey that uses quota-based online strategies that dynamically target respondents by age, sex, and location. However, how well this method aligns with national population statistics remains unclear.


Objective
To evaluate how well GMP data collected through online recruitment aligns demographically with United States (US) benchmarks from traditional probability-based surveys, including the American Community Survey (ACS), Household Pulse Survey (HPS), and American Trends Panel (ATP).


Methods
We analysed 114,721 GMP responses collected in the US between 2020 and 2024. Participants were recruited via Facebook and Google AdSense using broad interest-based keywords and stratified demographic targeting. GMP data were time- and question-matched with ACS, HPS, and ATP data to compare trends in educational attainment, marital status, mental health treatment, and number of close friends.


Results
Demographic patterns in GMP data typically aligned with national statistics within a 5–7% margin. Educational attainment by age was similar to ACS data, except among 65+, where GMP consistently showed a 5% and 10% higher rate of High School and Bachelor’s completion, respectively. GMP and ACS matched near-perfectly for Divorced and Widowed marital status by age while ‘Not married’ in the GMP was 6-10% higher compared to ‘Never married’ individuals in the ACS and, conversely, lower in the Married group. GMP aggregate mental health treatment estimates were within ±1% of HPS values for three of the four years studied, although age-specific differences ranged from 5–8%. Compared to ATP, those reporting two or fewer friends were 15% higher in the GMP. These differences reflect differences in sampling methodology but also imperfect matches of categories and differing non-response bias arising from mode of survey.


Conclusions
GMP data demonstrate that with dynamic targeting and quota-based sampling, online recruitment methods can produce data that align well with traditional national surveys. This data, therefore, offers real-time, inclusive and cost-efficient population-level monitoring of mental health and social trends, with potential for use in public health research and policy.

Highlights

  • The Global Mind Project (GMP) uses quota-based dynamic online ad targeting (Q-DOAT) via Meta and Google Ads to recruit large-scale populations.
  • Analysis of 114,721 US responses (2020–2024) showed GMP demographic trends aligned within 5–7% of national statistics from ACS, HPS, and ATP.
  • Slight differences were observed, including 5-10% higher representation of single individuals, those with fewer close friends, and those seeking mental health treatment.
  • GMP data demonstrate that online recruitment, combined with post-stratification, can produce data that aligns well with national demographic and social trends.
  • The study supports the utility of GMP as a scalable, near real-time platform for population health monitoring.

Introduction

Traditionally, population or household surveys have relied on mail, telephone or face-to-face recruitment of individuals randomly selected using address-based sampling within demographically stratified bands, typically based on age, sex, geography and socioeconomic status [1, 2]. This probability sampling approach is widely regarded as the gold standard for generating representative data. However, such methods are costly, time-consuming, and difficult to scale across countries or at speed. Non-mandatory surveys often face low response rates, further limiting reach. When collecting data on sensitive topics, such as mental health, ensuring anonymity is also critical to address concerns around data privacy or fear of self-disclosure.

As the field of population data science evolves, there is growing interest in the potential of non-traditional data collection methods to support scalable, inclusive and timely public health insights. The global rise in internet and mobile phone use over recent decades [3] has opened up new possibilities for sampling and recruitment. These methods offer rapid, cost-effective access to large and diverse segments of the population. However, online recruitment approaches, such as advertisements on Google or Meta (Facebook and Instagram), use non-probability sampling driven by opaque platform algorithms. This can lead to overrepresentation of certain groups and excludes individuals without stable internet access or a presence on the chosen platform, raising concerns about sample bias and data quality [4–15]. Furthermore, when surveys are conducted anonymously, there is a heightened risk of fraudulent, misleading or automated (bot) responses [16, 17].

The Global Mind Project (GMP) uses online population sampling to provide a real-time view of global mental wellbeing, or what we call ‘mind health,’ and the social, technological and lifestyle factors influencing it. It collects data across 85+ countries in 23 languages, surveying 47 aspects of mental function and feeling on a 9-point scale, alongside a wide range of demographic, cultural, lifestyle, and life experience variables, including age, sex, ethnicity, education, employment status, and income. Since its launch in 2020, it has gathered responses from over 2 million internet-enabled adults [18, 19]. Participants are recruited anonymously via paid advertisements on platforms including Meta, Google Display and Google AdSense, and are invited to complete a 15-minute online assessment. Instead of appealing to altruism or research goals, the assessment encourages participation by offering a free personalised report with wellbeing scores and self-help guidance. In addition, GMP employs a Quota-based Dynamic Online Ad Targeting (Q-DOAT) strategy, which systematically targets predefined age-sex groups across selected geographies using a broad set of interest criteria and keywords, with the goal of robust representation of the general population in each age-sex band for different countries or regions of interest. While full proportional representation cannot be ensured at the recruitment stage, the strategy aims to obtain sufficient sample sizes in each demographic group such that representative outcomes can be obtained through post-stratification weighting [11, 20].

This study evaluated how demographic and social trends in GMP data, collected using the Q-DOAT approach targeting five regions in the United States (US) [21], compared with time-aligned trends from 3 national US surveys using traditional sampling and recruitment methods: the American Community Survey (ACS; [22]) and Household Pulse Survey (HPS; [23]) conducted by the US Census Bureau, and the American Trends Panel (ATP) from the Pew Research Center. Comparisons focused on items with exact or near-exact matches, including educational attainment and marital status by age and biological sex (the target criteria), the percentage seeking treatment for mental health challenges, and number of close friends. We hypothesised that demographic trends would broadly align across sources, while greater variations would be observed in social and behavioural measures due to differing patterns of non-response bias. While this study focused on the US, we note that these evaluations could also be performed for other countries with sufficient national statistics, and where a significant majority of the population is internet-enabled, or where statistics distinguish this group.

Methods

GMP data

The GMP, which now actively acquires data in 23 languages from 85+ countries, is a data resource freely available to the academic research community. The data, now available from 2 million+ people worldwide, consist of: (i) ratings of 47 aspects of emotional, social, cognitive and physical capacities and problems, (ii) aggregated metrics of overall mental wellbeing and 6 different mental wellbeing dimensions, and (iii) detailed demographic information acquired across multiple data waves spanning family relationships, religion, diet, substance use, traumas and adversities, amongst others. More details on these data elements, as well as the form to request data access, can be found here: https://sapienlabs.org/researcher-hub. Data can be downloaded with filters for clean data and targeted data (see below).

While anonymously obtained, the GMP provides some identifiers for linkage and longitudinal analysis on request. These include: (i) encrypted email identifiers for a subset of participants who provide an email address for follow-up reminders, (ii) encrypted proxies of IP address for all participants, which are approximate identifiers and not as reliable as email address.

Participant recruitment using Q-DOAT

Participants are presently recruited to the GMP using campaigns on Google and Meta (Facebook and Instagram). Globally, 4.9 billion people use Google and 3.7 billion people use Meta, representing 61% and 38% of the global population, respectively. In the US, however, 93% of the population uses the internet, while ~87% use Google and 68-75% use Meta. Almost 100% of internet-enabled individuals use either Google or Meta. Non-users in the US are predominantly young children (not included in the GMP) and the elderly. This method can therefore effectively target the US population, with a possible bias among the elderly.

In the US, participants were recruited through English-language campaigns (from 2020 onwards) and Spanish-language campaigns (from 2021 onwards); together, these two languages are spoken by 91.5% of the population as their first language. Campaigns targeted five regions: Northeast, Southeast, Southwest, Midwest and West (see Table 1, Supplementary Table 1 and Table 2 for a breakdown of the sample by age, biological sex and region). Advertisements featured the message ‘Get your mental wellbeing score: Fast, Free, Anonymous’, linking to the start of the open survey [21]. At any time, 30 to 100 ads were active, regionally targeted towards each age-sex group between 18 and 85 years. Beginning in June 2021, Spanish-language advertisement spend was scaled in proportion to the Spanish-speaking population. Regardless of the language of the initial advertisement, respondents could select any of the available languages (now 23) for completing the survey. At the start of the survey, all potential respondents were provided with a description of how their data would be used, both on-screen and within a more detailed data privacy document.

Age group   Sex    2020    2021    2022    2023    2024
18-24       F      1638    2424    2481    1004    1063
18-24       M       857    1548    1285     579     617
25-34       F       620    1326    1202     424     483
25-34       M       413     894     745     281     354
35-44       F       579    1079     969     435     423
35-44       M       366     629     623     297     291
45-54       F       890    1375    1140     637     579
45-54       M       527     840     833     467     391
55-64       F      1594    2098    2106    1449    1432
55-64       M      1103    1777    1891    1439    1037
65-74       F      2090    2868    3132    2801    3416
65-74       M      1391    2673    2681    2598    2385
75+*        F      1141    2257    2550    3297    4250
75+*        M       577    1412    1842    2324    2730
Total             13786   23200   23480   18032   19451
Table 1: Number of clean US records for each age-sex group for each year. *Those aged 75-84 and 85+ were consolidated into a single 75+ group due to smaller sample sizes.

Advertisements used a broad set of interest keywords, including self-awareness, self-development, health, wellness and coaching, but deliberately excluded terms directly related to mental health or disorders to avoid biasing recruitment towards individuals with mental health concerns. These keywords are necessary to ensure relevance under Meta and Google algorithms, which identify ‘look alike’ audiences to optimise completion rates. This approach involves a trade-off: broader keywords yield wider reach but higher costs per completion, while more specific targeting reduces cost but narrows the audience. We note that this recruitment approach cannot distinguish between civilian non-institutionalised populations and institutionalised populations with internet access. However, it is unlikely to include a substantial institutionalised population, as internet access is often restricted for people in these facilities, who may also not be capable of a fully independent response.

Under this targeting paradigm, assessment starts and completions were tracked for each advertisement within each platform (Google and Meta) using Urchin Tracking Module (UTM) codes and platform analytics. Advertisement spend was dynamically adjusted based on the demographic composition of respondents to maintain balanced representation across age, biological sex, and regional groups. New advertisements and sources were first piloted to assess demographic parity of age and sex against national statistics before being scaled and included in the GMP. Regional targeting provides a sample roughly proportional to the populations at the State level (Supplementary Table 2). In 2020, targets were set at a minimum of 500 responses per age-sex group, with at least 100 from each US region (Northeast, Southeast, Southwest, Midwest and West) to ensure sufficient data in all groups and to support post-stratification weighting (Table 2). In 2021-2024, these targets were proportionately increased per group given increased available budgets.
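The dynamic adjustment of spend towards under-filled groups can be illustrated with a minimal sketch. This is not the GMP's actual allocation logic, which is not specified here; the function name, the proportional-to-shortfall rule, the budget figure and the completion counts are all illustrative assumptions. Only the 500-response minimum per age-sex group is taken from the 2020 targets described above.

```python
def reallocate_budget(completions, targets, total_budget):
    """Split a budget across groups in proportion to each group's remaining shortfall.

    Hypothetical rule for illustration only: groups furthest below their
    quota receive the largest share of spend.
    """
    shortfalls = {
        group: max(targets[group] - completions.get(group, 0), 0)
        for group in targets
    }
    total_shortfall = sum(shortfalls.values())
    if total_shortfall == 0:
        # Every quota is met: fall back to an even split.
        return {group: total_budget / len(targets) for group in targets}
    return {
        group: total_budget * shortfall / total_shortfall
        for group, shortfall in shortfalls.items()
    }


# Illustrative numbers only (500 is the 2020 per-group minimum from the text).
targets = {"18-24 F": 500, "18-24 M": 500, "25-34 M": 500}
completions = {"18-24 F": 450, "18-24 M": 300, "25-34 M": 250}
budget = reallocate_budget(completions, targets, total_budget=100.0)
```

In this example the group furthest from its quota (25-34 M) receives half of the budget, while the nearly filled group (18-24 F) receives a tenth.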

Region               18–24   25–34   35–44   45–54   55–64   65–74     75+
Northeast              614     374     277     372     760    1161     795
Southeast              809     433     351     519    1123    1700    1293
Southwest              450     218     177     188     406     507     429
Midwest                809     425     397     478     934    1367     982
West                  1081     494     386     413     774    1075     893
State unidentified       3       3       4       3       0       3       0
Total                 3766    1947    1592    1973    3997    5813    4392
Grand total          23480
Table 2: Number of clean records collected in 2022 for each age group in each region.

We note that cost per start can vary considerably across demographic groups. Cost per start refers to the advertising spend required for one person to start the assessment. This depends on both the advertisement click rate and the bids of other advertisers competing for the same audience. This is a more opaque aspect of the recruitment, as it is unknown how many times the survey prompt was served and where specifically it was served. In the US, cost per start tends to be highest for males aged 25-44. Cost per complete (rather than per start) can also be managed by improving completion rates, which relate to the assessment experience. Present completion rates among those who start the assessment range between 55% and 79% depending on demographic group and geography, and fluctuate based on survey length and data waves. We also note that State-wise targeting can be used in small States to supplement where statistical minimums are not met. However, granular demographic targeting is more expensive and was not used here due to budget constraints.

GMP data processing and quality checks

While bots are unlikely to pose a major issue since they are rarely served advertisements, anonymous online recruitment carries the risk of participants simply clicking through surveys without meaningful engagement. To address this, several data cleaning steps were applied. First, responses completed in under 7 minutes (the minimum time needed to read all questions) or over 60 minutes were excluded. Second, records with a standard deviation below 0.2 across all 47 rating items were removed. Such low variation indicates that the same rating was selected for almost all questions, which is only rarely a genuine response and more likely reflects someone clicking through to view the questions rather than answering them. Third, responses were excluded if the participant answered ‘No’ to the question ‘Did you find this assessment easy to understand?’. Completions from organic traffic (e.g. peer referrals) were also excluded as they fell outside the managed targeting criteria. Overall, 6-26% of responses were excluded from the analysis depending on the year. After cleaning, the final sample sizes were 13,786 (2020), 23,200 (2021), 23,480 (2022), 18,032 (2023) and 19,451 (2024). Sample sizes (total and after cleaning) for each age-sex group are shown in Table 1 and Supplementary Table 1, and for each age-regional group (example data from 2022 only) in Table 2. Sample sizes for individual analyses varied further depending on specific questions or “prefer not to say” responses.
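The cleaning rules above can be expressed as a simple record filter. The following is a minimal sketch rather than the GMP's own pipeline: the field names (`minutes`, `ratings`, `easy_to_understand`, `source`) are hypothetical, and the use of the population standard deviation is an assumption, since the type of standard deviation is not specified.

```python
from statistics import pstdev

MIN_MINUTES = 7      # under 7 minutes: excluded
MAX_MINUTES = 60     # over 60 minutes: excluded
MIN_RATING_SD = 0.2  # SD below 0.2 across the 47 ratings: excluded


def is_clean(record):
    """Return True if a response passes all four exclusion rules."""
    if not (MIN_MINUTES <= record["minutes"] <= MAX_MINUTES):
        return False
    # Assumption: population SD; the text does not say which SD is used.
    if pstdev(record["ratings"]) < MIN_RATING_SD:
        return False
    if record["easy_to_understand"] == "No":
        return False
    if record["source"] == "organic":  # e.g. peer referrals
        return False
    return True


# Illustrative records with hypothetical field names.
varied = [(i % 9) + 1 for i in range(47)]  # ratings spread over the 9-point scale
engaged = {"minutes": 15, "ratings": varied,
           "easy_to_understand": "Yes", "source": "meta"}
straight_liner = {"minutes": 15, "ratings": [5] * 47,
                  "easy_to_understand": "Yes", "source": "google"}
too_fast = {"minutes": 4, "ratings": varied,
            "easy_to_understand": "Yes", "source": "meta"}

clean_records = [r for r in (engaged, straight_liner, too_fast) if is_clean(r)]
```

Only the first record survives: the second is removed for zero variation across ratings and the third for completing in under 7 minutes.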

Comparison Against the ACS, HPS and ATP Data

GMP US survey data were compared against data from the ACS, HPS and ATP. As the GMP collects data across a wide variety of demographic, cultural, lifestyle and life experience variables, only questions with exact or near-exact matches in the ACS, HPS and ATP were selected for inclusion. This included educational attainment and marital status by age and biological sex (GMP vs ACS, 2022-2024); percentage seeking treatment for mental health problems (GMP vs HPS, 2020-2023); and number of close friends (GMP vs ATP, 2023).

For each of these GMP measures, the proportion of respondents selecting each answer option was calculated for each age and biological sex group. A weighted national average was then computed using United Nations (UN) Population Statistics [24] to reflect the age distribution of the US population. For the comparator surveys (ACS, HPS, ATP), age-sex breakdowns or national aggregates were downloaded directly from official sources.
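The weighting step described above amounts to a population-weighted mean of the group-level proportions. A minimal sketch follows; the group labels, rates and population shares are invented for illustration and are not GMP or UN values.

```python
def weighted_national_average(group_rates, population_shares):
    """Weight each group's proportion by that group's share of the population.

    Shares are renormalised over the groups present, so they need not sum to 1.
    """
    total_share = sum(population_shares[group] for group in group_rates)
    weighted_sum = sum(
        group_rates[group] * population_shares[group] for group in group_rates
    )
    return weighted_sum / total_share


# Illustrative values only.
group_rates = {"18-24 F": 0.30, "18-24 M": 0.20, "65-74 F": 0.10}
population_shares = {"18-24 F": 0.25, "18-24 M": 0.25, "65-74 F": 0.50}

national_estimate = weighted_national_average(group_rates, population_shares)
```

Here the weighted estimate is 0.175, whereas a simple unweighted mean of the three group rates would be 0.20, showing how weighting corrects for groups that are over- or under-sampled relative to the population.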

Comparison of marital status and education attainment data in GMP and ACS

ACS 2022-2024 data were downloaded from the ACS public data site [25], specifically tables S1501 (Educational Attainment; N=~3 million) and B12002 (Marital Status; N=~3 million), which include data from household persons as well as group quarters, where institutionalised individuals comprise ~1.4%. Table S1501 reports the percentage of the population with ‘High School or Higher’ and ‘Bachelor’s or Higher’. To align with these categories, GMP data (2022: n = 22,480; 2023: n = 18,032; 2024: n = 19,451) were aggregated as follows: the percentages with a Bachelor’s degree, Master’s degree and PhD were summed to yield ‘Bachelor’s or Higher’, and the percentages with high school and associate degrees were added to this sum to yield ‘High School or Higher’.
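The aggregation into the two cumulative ACS categories can be sketched as follows. The exact GMP answer labels are assumptions (including the hypothetical ‘Less than high school’ option), and the percentages are invented for illustration.

```python
# Assumed GMP answer labels; the actual wording of the options may differ.
BACHELORS_OR_HIGHER = {"Bachelor's degree", "Master's degree", "PhD"}
HIGH_SCHOOL_OR_HIGHER = BACHELORS_OR_HIGHER | {"High school", "Associate degree"}


def cumulative_attainment(percent_by_level):
    """Collapse per-level percentages into the two cumulative ACS categories."""
    return {
        "High School or Higher": sum(
            pct for level, pct in percent_by_level.items()
            if level in HIGH_SCHOOL_OR_HIGHER
        ),
        "Bachelor's or Higher": sum(
            pct for level, pct in percent_by_level.items()
            if level in BACHELORS_OR_HIGHER
        ),
    }


# Illustrative percentages for a single age group (not GMP data).
example = {
    "Less than high school": 8,
    "High school": 30,
    "Associate degree": 12,
    "Bachelor's degree": 30,
    "Master's degree": 15,
    "PhD": 5,
}
attainment = cumulative_attainment(example)
```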

ACS marital status options included: Never Married, Married, Married/Separated, Divorced, Widowed. GMP options were: Single (Never Married), In a relationship, Married/Civil Partnership, Divorced/Separated, Widowed, Prefer not to say (2022: n = 22,480; 2023: n = 18,032; 2024: n=19,451). To enable the closest comparison, data were aggregated as follows: (i) ACS Never Married matched to GMP ‘Not Married’ which included Single (Never Married) + In a relationship; (ii) ACS Divorced + Married/Separated matched to GMP Divorced/Separated; (iii) ACS Married (except separated) matched to GMP Married/Civil Partnership; (iv) ACS Widowed matched to GMP Widowed. We note that (i) and (iii) are not exact matches but rather the closest matches between the GMP and ACS options. ‘In a relationship’ is typically interpreted as dating someone and may include those who were previously married and therefore would be a larger group than those who were never married.
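The category matching in (i) to (iv) can be written as two lookup tables that map each survey's options onto a shared set of comparison labels. This is a minimal sketch with invented counts. ‘Prefer not to say’ is left unmapped, and whether it remains in the denominator is a separate analytic choice not addressed by this sketch.

```python
# Mappings follow items (i)-(iv); the shared labels on the right are illustrative.
GMP_TO_COMPARISON = {
    "Single (Never Married)": "Not Married",
    "In a relationship": "Not Married",
    "Married/Civil Partnership": "Married",
    "Divorced/Separated": "Divorced/Separated",
    "Widowed": "Widowed",
}
ACS_TO_COMPARISON = {
    "Never Married": "Not Married",
    "Married": "Married",
    "Married/Separated": "Divorced/Separated",
    "Divorced": "Divorced/Separated",
    "Widowed": "Widowed",
}


def harmonise(counts, mapping):
    """Sum raw category counts into the shared comparison categories."""
    combined = {}
    for label, count in counts.items():
        if label not in mapping:
            continue  # e.g. 'Prefer not to say' has no ACS counterpart
        target = mapping[label]
        combined[target] = combined.get(target, 0) + count
    return combined


# Illustrative counts only.
gmp_counts = {
    "Single (Never Married)": 10,
    "In a relationship": 5,
    "Married/Civil Partnership": 20,
    "Prefer not to say": 1,
}
gmp_harmonised = harmonise(gmp_counts, GMP_TO_COMPARISON)
```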

Comparison of mental health treatment status in GMP and HPS

The percentage of individuals seeking treatment for mental health concerns, as captured by HPS from January 2020 to October 2023 (N = 2,036,992), was compared with equivalent data from the GMP across the same time period (N=70,800 after excluding ‘Prefer not to say’ responses).

The HPS asked the following questions:

HPS1: At any time in the last 4 weeks, did you take prescription medication to help you with any emotions or with your concentration, behaviour, or mental health? Yes/No

HPS2: At any time in the last 4 weeks, did you receive counselling or therapy from a mental health professional such as a psychiatrist, psychologist, psychiatric nurse, or clinical social worker? Include counselling or therapy online or by phone. Yes/No

While GMP asked:

GMP1: Are you presently undergoing treatment for any mental health challenges? Yes/No/Prefer not to say

We compared the percentage of GMP respondents answering ‘Yes’ to GMP1 with the percentage of HPS respondents answering ‘Yes’ to either HPS1 or HPS2. Comparisons were made across age-sex weighted national estimates (2020–2023) and by age group for 2022. HPS results by age and year were downloaded directly from the CDC [26] and annual estimates were averaged across reporting periods within each year. HPS reported age bands as: 18-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+. GMP age bands were: 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85+ (with 75-84 and 85+ aggregated into 75+). Because HPS reports only pre-aggregated data, exact alignment of age bands was not possible.

Comparison of Number of Close Friends in GMP and ATP

The average percentage of the population reporting each number of close friends (from 0 to 5+) in the ATP from July 2023 (N=5,057) was compared with GMP data collected between January 1st and November 30th, 2023 (N=19,857 after excluding blanks and values over 100). The ATP question: ‘Not counting your family, how many close friends do you have?’ used answer options ranging from 0 to ‘10 or more’. ATP results were weighted to reflect the US adult population by sex, race, ethnicity, education, and other demographic categories. Equivalent percentages were computed from GMP data for the similar question: ‘How many close friends do you have?’ with a free-text numeric response field. GMP results were weighted by age and sex using UN population statistics to reflect national proportions.
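Because the GMP item uses a free-text numeric field while the comparison uses categories from 0 to 5+, responses must be cleaned and binned before percentages are computed. The sketch below illustrates this step under stated assumptions: the function name and parsing rules are hypothetical, the input values are invented, and the age-sex weighting described above is omitted for brevity.

```python
def bin_close_friends(raw_values, max_valid=100, top_bin=5):
    """Bin free-text friend counts into 0..4 and '5+', returning percentages.

    Blanks, non-numeric entries and values over max_valid are dropped.
    """
    counts = {str(k): 0 for k in range(top_bin)}
    counts[f"{top_bin}+"] = 0
    for raw in raw_values:
        text = str(raw).strip()
        if not text.isdecimal():
            continue  # blank or non-numeric entry
        value = int(text)
        if value > max_valid:
            continue  # implausible value, excluded as in the text
        key = f"{top_bin}+" if value >= top_bin else str(value)
        counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: 100 * count / total for key, count in counts.items()}


# Illustrative responses: one blank and one value over 100 are dropped.
friend_percentages = bin_close_friends(["0", "2", "", "7", "150", "3", "5"])
```

With these inputs, five responses remain valid, and two of them (7 and 5) fall into the 5+ category, giving 40%.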

Results

Demographic Trends in the ACS Compared to GMP

Figure 1 compares educational attainment by age group for data obtained in 2022 (ACS: N=~3.5M; GMP: N=22,480), 2023 (ACS: N=~3.5M; GMP: N=18,032) and 2024 (ACS: N=~3.5M; GMP: N=19,451). Overall, GMP trends aligned well with those from the ACS, with a few significant differences. First, among those aged 65 and older, the fractions reporting ‘High school or higher’ and ‘Bachelor’s or higher’ were higher in GMP by ~5% and 10%, respectively, across all years. Second, the proportion of respondents reporting ‘Bachelor’s or higher’ varied by ±3-6% between the GMP sample and ACS across other age groups in 2023 and 2024 but was within 0.2% for 2022, indicating some fluctuation. Standard errors of the mean (Supplementary Table 3) for GMP ‘High school or higher’ ranged from 0.5% to 1.3% across age-year categories, while averages across age categories ranged from 0.8% to 1.0% across years. Standard errors of the mean for GMP ‘Bachelor’s or higher’ ranged from 0.8% to 2.5% across age-year categories, while averages across age categories ranged from 1.2% to 1.7% across years.

Figure 1: Comparison of population estimates (%) for educational attainment by age group using data from the ACS (dotted lines) and GMP (solid lines) in 2022 (A), 2023 (B) and 2024 (C).
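For orientation, the magnitude of the standard errors reported above can be related to group sample size using the standard formula for the standard error of a proportion, sqrt(p(1 - p)/n). The exact computation behind Supplementary Table 3 is not described here, so this is an assumption, and the inputs below are illustrative rather than GMP values.

```python
from math import sqrt


def proportion_standard_error(p, n):
    """Standard error of a sample proportion p estimated from n respondents."""
    return sqrt(p * (1 - p) / n)


# Illustrative inputs: a 50% proportion estimated from 2,500 respondents.
example_se = proportion_standard_error(0.5, 2500)
```

In this example the standard error is 0.01, or one percentage point. Because the error shrinks with the square root of n, smaller age-sex groups have wider uncertainty than larger ones.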

Figure 2 shows a comparison of marital status by age for GMP and ACS for the years 2022, 2023 and 2024. Here the Divorced and Widowed groups, which were identically matched categories, showed a near-perfect match with no significant difference. The GMP ‘Not Married’ group followed a similar trend overall but was 5-7% higher than ACS ‘Never Married’ in the 35-54 age range. Conversely, GMP ‘Married/Civil Partnership’ was 5–7% lower than ACS ‘Married/Spouse Present’ but also followed the same trend. Standard errors of the mean (Supplementary Table 3) were lowest for GMP Widowed (~0.1% to 0.9% across age-year categories, average of ~0.5% across age categories for each year) and highest for GMP Married/Civil Partnership (0.9% to 2.6% across age groups, average of 1.7%, with cross-age averages ranging from 1.2% to 1.8% across years).

Figure 2: Comparison of population estimates (%) for marital status by age group using data from the ACS (dotted lines) and GMP (solid lines) in 2022 (A), 2023 (B) and 2024 (C).

Reported mental health treatment-seeking behaviour in HPS compared to GMP

Figure 3 compares trends in the percentage of adults seeking professional mental health treatment over time (2020–2023), and by age for 2022, between the HPS and GMP. While the specific questions were similar, the GMP asks about ‘current’ treatment without specifying the type of treatment, whereas HPS asked whether participants had taken prescription medication and/or received therapy/counselling in the past 4 weeks. Nonetheless, the data provide a broadly similar comparison that can indicate whether GMP oversamples individuals with mental health challenges. As shown in Figure 3A, GMP’s age-sex weighted national estimates aligned well with HPS data across all years, with differences within ±1% of the national estimates except in 2021, when GMP was 5% higher. Figures 3B and 3C display 2022 treatment estimates by age group from HPS and GMP, respectively. HPS data tables use different age categories (e.g. 30-39, 40-49) than GMP (e.g. 35-44, 45-54), precluding a direct comparison. However, overall trends show that GMP reported generally higher treatment rates among adults aged 25-54 (average 8% higher; range 6-10%) and lower rates among adults aged 70+ (average 5% lower; range 4-7%). Standard errors of GMP values ranged from 0.7% for age 75+ to 1.5% and 1.7% for age 25-34 and 35-44, respectively.

Figure 3: (A) National estimates from 2020 to 2023: GMP (black) reflects the percentage currently receiving treatment; HPS (grey) reflects the percentage taking prescription medication and/or receiving counselling or therapy in the past four weeks. (B) HPS 2022 data: percentage of adults by age group who took prescription medication and/or received therapy in the past four weeks. (C) GMP 2022 data: percentage of adults by age group currently undergoing treatment for a mental health problem.

National Trends of Close Friendships in the ATP Compared to GMP

Figure 4 compares the number of close friends reported in the ATP in July 2023 with annual data from GMP for the same year. While the overall distribution patterns were broadly similar, some key differences emerged. GMP respondents were more likely to report having two or fewer close friends (average of 15%) compared to ATP respondents (average 10%), a 5% difference. Conversely, 28% of GMP respondents reported having 5 or more friends, compared to 38% in the ATP, a 10% difference.

Figure 4: Comparison of population estimates (%) for the number of close friends (0 to 5+) reported in the ATP in July 2023 (black) and GMP data (grey) for 2023.

Discussion

Principal results

This study demonstrates that US data, collected via the GMP using quota-based dynamic online ad targeting (Q-DOAT), closely align with national trends captured by rigorously stratified, probability-based surveys such as the ACS, HPS and ATP. This alignment holds across a diverse range of variables, including educational attainment, marital status, mental healthcare utilisation and number of close friendships, suggesting that anonymous data collected through Q-DOAT can be reliably used to explore population rates in addition to relationships between factors in the general population, further establishing the quality of the GMP data. These findings are particularly relevant in light of the challenges associated with traditional probability-based surveys, which are often logistically complex, time-consuming, costly, increasingly affected by non-response, and difficult to scale globally [27–31]. In contrast, the GMP offers several key advantages: it can recruit participants rapidly (currently 1000-2000 globally per day); is 10 to 20 times more cost-effective (average cost per respondent ranges from $0.15 to $10 depending on region and demographic); is globally scalable (currently runs in 85 countries); can adapt to changing societal trends and events; and can readily target specific populations of interest. Furthermore, when asking about potentially sensitive or stigmatising issues, such as those relating to mental health, its response anonymity helps address concerns over data privacy or fear of self-disclosure. Altogether, this positions GMP as an easily scalable and flexible platform for tracking national trends, and in particular emerging trends. These findings also contribute to the growing body of evidence supporting the use of online recruitment channels such as Meta and Google Ads for health-related research, particularly when targeting strategies are dynamic and responsive to ongoing demographic profiles [13, 14, 32, 33].

Non-response bias and limitations

Understanding non-response bias is critical, as all surveys are subject to biases shaped by both their topic and mode of delivery. Although the GMP is designed to reach the general population rather than specifically targeting individuals with mental health concerns, its recruitment relies on internet use and interest-based keywords such as self-awareness, health and wellness which raises the possibility of bias related to personal interest in these areas. Despite strong alignment with national data, several differences emerged between GMP and comparator surveys.

First, the proportion with Bachelor’s or higher in the age 65+ category was consistently higher by 10% in GMP compared to ACS, which likely arises because elderly people online tend to be more educated. In addition, GMP reported fluctuations compared to ACS between ages 25-45 years in the range of 3-6%. Small inconsistent differences in these age categories may simply represent statistical fluctuation in the GMP data which is substantially smaller in size compared to ACS.

Second, GMP “Not married” which included “Single (Never Married)” and “In a relationship” was 5-7% higher than ACS “Never Married” for ages above 25. This may be due to the category differences where a fraction of those who are not married but ‘In a relationship’ (~15% of adults age 25 and above) could have been previously married. There may also be a contribution by the 3% of GMP respondents who chose ‘Prefer not to say’. However, it is also possible that GMP data has a small 5-7% bias towards unmarried individuals.

Third, GMP data showed 5-7% higher treatment-seeking rates among adults aged 25-54 compared to HPS. This finding is consistent with prior studies reporting a greater representation of individuals with mental health challenges in similar surveys [33, 34]. However, alternative explanations are also possible. For example, GMP’s broader treatment definition includes all treatments, whereas the HPS focuses only on prescription medication and counselling. Some fraction of the difference may therefore reflect those undergoing other types of treatment (e.g. brain stimulation, neurofeedback). Nonetheless, while these factors may explain some of the observed differences, two non-mutually exclusive possibilities remain that cannot be ruled out: that there is a small non-response bias towards individuals with elevated mental health risk in GMP data, or that there is an opposite non-response bias in the HPS, which is not strictly anonymous and may therefore deter those with mental health problems. We also note that the 2024 National Survey on Drug Use and Health (NSDUH), conducted by the Substance Abuse and Mental Health Services Administration (SAMHSA), reported that 22.9% of adults sought mental health treatment in the past year. This is a lower number than either GMP or HPS despite the much longer time frame of treatment. This could be due to the in-person interview methodology of the NSDUH, where those with substance use problems may prefer not to participate or may respond with a social desirability bias.

Fourth, GMP respondents reported fewer close friends compared to the ATP. Here again, this may be due to differences in non-response bias between the two surveys. Online participants in the GMP may be less likely to socialise and may have fewer close friends. Conversely, ATP participants who are recruited for broader civic engagement research may be more socially connected due to a higher degree of civic mindedness.

Overall, differences in the range of 5-7% between certain GMP data elements and other surveys indicate the bounds of accuracy, particularly within age groups. When aggregating across age groups for population-level estimates, however, these differences are reduced, with accuracy within 1-3%. In addition, post-stratification by education, ethnicity and other demographic factors in alignment with the ACS could be used to further increase accuracy.
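As an illustration of the post-stratification step mentioned above, the following is a minimal sketch rather than a procedure used in this study. It assumes a single stratifying variable (education), and the stratum labels, benchmark shares and responses are hypothetical toy values, not GMP or ACS figures. Each respondent is weighted by the ratio of the benchmark population share to the sample share of their stratum.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Return one weight per stratum: benchmark share / sample share."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    weights = {}
    for stratum, pop_share in population_shares.items():
        sample_share = counts.get(stratum, 0) / n
        if sample_share == 0:
            # An empty stratum cannot be re-weighted; strata would need merging.
            raise ValueError(f"no respondents in stratum {stratum!r}")
        weights[stratum] = pop_share / sample_share
    return weights


def weighted_mean(values, strata, weights):
    """Weighted mean of values, using the weight of each respondent's stratum."""
    total_weight = sum(weights[s] for s in strata)
    return sum(v * weights[s] for v, s in zip(values, strata)) / total_weight


# Hypothetical toy sample: 10 respondents, over-representing degree holders.
strata = ["degree"] * 6 + ["no_degree"] * 4
in_treatment = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]  # 1 = currently in treatment

# Hypothetical benchmark shares (stand-in for an ACS-style distribution).
benchmark = {"degree": 0.4, "no_degree": 0.6}

weights = poststratification_weights(strata, benchmark)
raw = sum(in_treatment) / len(in_treatment)
adjusted = weighted_mean(in_treatment, strata, weights)

print(f"raw estimate:      {raw:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

In practice, weighting on several variables at once (e.g. education and ethnicity within age groups) would typically use cross-classified cells or raking, and sparse cells would need to be merged before weighting.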

Altogether, the overall alignment and consistency across multiple years position the GMP as a relevant and valuable source of population data across a wide range of mental health, wellbeing and social factors. As the recruitment methods are relatively consistent over time, year-on-year changes in GMP data can still provide reliable estimates of the magnitude of change. However, because GMP data are collected anonymously, longitudinal analysis is limited: linkages are available only for the subset of individuals who provide an email address, or through encrypted IP addresses, which are approximate in nature.

Dynamic vs static strategies

It is important to emphasise that these findings should not be taken to imply that all internet-based surveys using online recruitment strategies produce nationally representative samples. The success of the GMP in aligning with national trends is largely due to the use of Q-DOAT, which differs significantly from typical static river sampling strategies [21]. Q-DOAT involves continuous optimisation of demographic targeting, keyword selection, and other ad parameters, based on real-time analytics of response demographics and survey completion rates. This method requires ongoing monitoring and frequent adjustment, supported by a sophisticated, actively managed analytical system that integrates multiple streams of information to manage a substantial advertising infrastructure, currently involving over 800 campaigns globally with diverse targeting criteria. The recruitment strategy used by GMP has been iteratively refined through numerous experimental optimisations to achieve broad population coverage.
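To make the contrast between dynamic and static targeting concrete, the sketch below shows one simple way a quota-based feedback loop could shift advertising budget towards under-filled demographic cells. It is a hypothetical illustration only and is not the Q-DOAT implementation, which is described in [21]. The function name, the age cells, the `floor` parameter and all numbers are assumptions made for this example.

```python
def reallocate_budget(target_shares, responses, total_budget, floor=0.05):
    """Split total_budget across cells in proportion to each cell's quota shortfall.

    target_shares: desired share of responses per cell (sums to 1).
    responses:     observed response counts per cell so far.
    floor:         baseline allocation so that no cell is switched off entirely.
    """
    n = sum(responses.values())
    need = {}
    for cell, target in target_shares.items():
        observed = responses.get(cell, 0) / n if n else 0.0
        # Only under-filled cells receive extra budget beyond the floor.
        need[cell] = floor + max(target - observed, 0.0)
    total_need = sum(need.values())
    return {cell: total_budget * value / total_need for cell, value in need.items()}


# Hypothetical quota: equal shares across four age cells.
targets = {"18-24": 0.25, "25-44": 0.25, "45-64": 0.25, "65+": 0.25}

# Hypothetical responses so far: younger adults are under-represented.
responses = {"18-24": 100, "25-44": 300, "45-64": 400, "65+": 200}

budget = reallocate_budget(targets, responses, total_budget=1000.0)
for cell, amount in budget.items():
    print(f"{cell}: {amount:.0f}")
```

A static river sample corresponds to never re-running this step. A dynamic strategy would repeat it as new responses arrive and, as described above, would also adjust keywords and other ad parameters, which this sketch does not model.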

By contrast, many online studies continue to report significant sampling biases (e.g. [34]) and emphasise the need for careful targeting and advertisement creation. For example, recruitment based on mental health-related search behaviour may increase response rates [33], but would likely overrepresent individuals experiencing mental health difficulties, leading to inflated estimates of treatment-seeking or distress compared to national benchmarks such as those reported by the HPS.

GMP data beyond the United States

GMP presently operates in 23 languages across 85+ countries, although sample sizes vary by country. While the findings presented here are specific to the US, we note that the same Q-DOAT methodology is used across the world, suggesting the potential for similar population-level alignment elsewhere. However, it must be noted that GMP recruits only from the internet-enabled population. In the US, where internet penetration is 94%, the vast majority of this population can be reached through Google and Meta, making national representation more feasible. In countries with lower internet access, GMP data will increasingly diverge from true national statistics, particularly where internet use is concentrated among specific subpopulations. Future work will assess GMP data from other countries in relation to nationally available benchmarks for internet-connected populations. However, it is important to note that reliable comparative statistics on online populations in many non-Western countries are currently limited [35].

Contribution and conclusion

This study contributes to the evolving field of population data science by demonstrating that GMP population data collected through the Q-DOAT method aligns well with benchmark US surveys that rely on probability-based sampling. These findings suggest that the GMP can generate demographically representative samples in the US, establishing Global Mind data as a valuable resource for research into mental health trends and their relationship to lifestyle and life context, and supporting its use as a scalable, real-time platform for monitoring mental health and broader population trends. As digital recruitment methods become more prevalent, evaluating their validity is critical to ensure they can responsibly inform population-level research and policymaking. The GMP approach offers a low-cost, inclusive alternative to traditional surveys, which is particularly valuable where conventional methods are resource-intensive or slow. This has important implications for public health, enabling more agile responses to emerging mental health risks and improving the reach and timeliness of population data infrastructures.

More broadly, this work contributes to the evolving field of population data science by illustrating how dynamic, non-probability sampling can support global mental health research. Given the growing burden of mental health conditions, particularly among younger populations [36-38], there is a pressing need for innovative, scalable data collection strategies, something also noted by Sanchez and colleagues [35]: ‘Developing new strategies to increase recruitment for mental health research is essential to addressing the field’s most pressing problems’.

Funding

This work was supported by funding from Sapien Labs.

Acknowledgements

Joseph Taylor and Tara Thiagarajan developed the data acquisition methodology. Oleksii Sukhoi carried out the analysis. Jennifer Newson and Tara Thiagarajan drafted the manuscript. All authors approved the final version. With thanks to the Sapien Labs team for assistance with data infrastructure.

Data availability statement

The full dataset from the Global Mind Project is freely available for not-for-profit purposes from the Sapien Labs Researcher Hub. Access can be requested here: https://sapienlabs.org/global-mind-project/researcher-hub/

Ethics statement

The language, recruitment methods and assessment were approved by the Health Media Lab Institutional Review Board (HML IRB; OHRP Institutional Review Board #00001211, Federal Wide Assurance #00001102, IORG #0000850). Participants took part in the online survey voluntarily, anonymously, and without any financial compensation. Participants consented to take part by clicking on a start button after reading a detailed privacy policy.

Conflicts of interest

None.

References

  1. Banerjee A, Chaudhury S. Statistics without tears: Populations and samples. Ind Psychiatry J 2010;19:60–5. 10.4103/0972-6748.77642

  2. Levy PS, Lemeshow S. Sampling of Populations: Methods and Applications. John Wiley & Sons; 2013.

  3. Data Reportal. Digital around the world [Internet]. 2023; Available from: https://datareportal.com/global-digital-overview

  4. Baker R, Brick JM, Bates NA, Battaglia M, Couper MP, Dever JA, et al. Summary Report of the AAPOR Task Force on Non-probability Sampling. Journal of Survey Statistics and Methodology 2013 Nov.;1:90–143. 10.1093/jssam/smt008

  5. Birnbaum MH. Human Research and Data Collection via the Internet. Annu. Rev. Psychol. 2004 Feb.;55:803–32. 10.1146/annurev.psych.55.090902.141601

  6. Cornesse C, Blom AG. Response Quality in Nonprobability and Probability-based Online Panels. Sociological Methods & Research 2020 May;004912412091494. 10.1177/0049124120914940

  7. Couper MP. Review: Web Surveys: A Review of Issues and Approaches*. Public Opinion Quarterly 2000 Feb.;64:464–94. 10.1086/318641

  8. Couper MP. Issues of Representation in eHealth Research (with a Focus on Web Surveys). American Journal of Preventive Medicine 2007 May;32:S83–9. 10.1016/j.amepre.2007.01.017

  9. Dutwin D, Buskirk TD. Apples to Oranges or Gala versus Golden Delicious?: Comparing Data Quality of Nonprobability Internet Samples to Low Response Rate Probability Samples. Public Opinion Quarterly 2017 Apr.;81:213–39. 10.1093/poq/nfw061

  10. Fricker RD. Sampling Methods for Online Surveys [Internet]. In: The SAGE Handbook of Online Research Methods. 1 Oliver’s Yard, 55 City Road London EC1Y 1SP: SAGE Publications Ltd; 2017 [cited 2023 12]. p. 162–83. Available from: https://sk.sagepub.com/Reference/the-sage-handbook-of-online-research-methods-2e/i1450.xml. 10.4135/9781473957992.n10

  11. Goel S, Obeng A, Rothschild D. Non-Representative Surveys: Fast, Cheap, and Mostly Accurate. Working Paper 2016;

  12. Kennedy C, Mercer A, Keeter S, Hatley N, McGeeney K, Gimenez A. Evaluating Online Nonprobability Surveys. 2016;

  13. Schneider D, Harknett K. What’s to Like? Facebook as a Tool for Survey Data Collection. Sociological Methods & Research 2022 Feb.;51:108–40. 10.1177/0049124119882477

  14. Thornton L, Batterham PJ, Fassnacht DB, Kay-Lambkin F, Calear AL, Hunt S. Recruiting for health, medical or psychosocial research using Facebook: Systematic review. Internet Interventions 2016 May;4:72–81. 10.1016/j.invent.2016.02.001

  15. Whitaker C, Stevelink S, Fear N. The Use of Facebook in Recruiting Participants for Health Research Purposes: A Systematic Review. Journal of Medical Internet Research 2017 Aug.;19:e7071. 10.2196/jmir.7071

  16. Glazer JV, MacDonnell K, Frederick C, Ingersoll K, Ritterband LM. Liar! Liar! Identifying eligibility fraud by applicants in digital health research. Internet Interv 2021 Sept.;25:100401. 10.1016/j.invent.2021.100401

  17. Wang J, Calderon G, Hager ER, Edwards LV, Berry AA, Liu Y, et al. Identifying and preventing fraudulent responses in online public health surveys: Lessons learned during the COVID-19 pandemic. PLOS Glob Public Health 2023;3:e0001452. 10.1371/journal.pgph.0001452

  18. Newson JJ, Pastukh V, Thiagarajan TC. Assessment of Population Well-being With the Mental Health Quotient: Validation Study. JMIR Ment Health 2022 Apr.;9:e34105. 10.2196/34105

  19. Newson JJ, Thiagarajan TC. Assessment of Population Well-Being With the Mental Health Quotient (MHQ): Development and Usability Study. JMIR Ment Health 2020 July;7:e17935. 10.2196/17935

  20. Pedersen ER, Kurz J. Using Facebook for Health-related Research Study Recruitment and Program Delivery. Curr Opin Psychol 2016 May;9:38–43. 10.1016/j.copsyc.2015.09.011

  21. Taylor J, Topalo O, Newson JJ, Thiagarajan TC. Data Collection and Management Infrastructure of the Global Mind Project. Preprint 2025;

  22. US Census Bureau. American Community Survey (ACS) [Internet]. 2023; Available from: https://www.census.gov/programs-surveys/acs

  23. US Census Bureau. Household Pulse Survey: Measuring Emergent Social and Economic Matters Facing U.S. Households [Internet]. 2024; Available from: https://www.census.gov/data/experimental-data-products/household-pulse-survey.html.

  24. United Nations. World Population Prospects 2022 [Internet]. 2022; Available from: https://population.un.org/wpp/.

  25. US Census Bureau. American Community Survey Data [Internet]. 2023; Available from: https://www.census.gov/programs-surveys/acs/data.html.

  26. CDC. Mental Health Care: Household Pulse Survey [Internet]. 2022; Available from: https://www.cdc.gov/nchs/covid19/pulse/mental-health-care.htm.

  27. Kohut A, Keeter S, Doherty C, Dimock M, Christian L. Assessing the Representativeness of Public Opinion Surveys. Pew Research Centre. For the People and the Press 2012;

  28. US Government Accountability Office. The American Community Survey: Accuracy and Timeliness Issues [Internet]. 2002; Available from: https://www.gao.gov/products/gao-02-956r.

  29. Brick JM, Williams D. Explaining Rising Nonresponse Rates in Cross-Sectional Surveys. The ANNALS of the American Academy of Political and Social Science 2013 Jan.;645:36–59. 10.1177/0002716212456834

  30. Keeter S, Kennedy C, Dimock M, Best J, Craighill P. Gauging the Impact of Growing Nonresponse on Estimates from a National RDD Telephone Survey. Public Opinion Quarterly 2006 Jan.;70:759–79. 10.1093/poq/nfl035

  31. Leeper TJ. Where Have the Respondents Gone? Perhaps We Ate Them All. Public Opinion Quarterly 2019 July;83:280–8. 10.1093/poq/nfz010

  32. Astley CM, Tuli G, Mc Cord KA, Cohn EL, Rader B, Varrelman TJ, et al. Global monitoring of the impact of the COVID-19 pandemic through online surveys sampled from the Facebook user base. Proc Natl Acad Sci U S A 2021 Dec.;118:e2111455118. 10.1073/pnas.2111455118

  33. Batterham PJ. Recruitment of mental health survey participants using Internet advertising: content, characteristics and cost effectiveness. Int J Methods Psychiatr Res 2014 Feb.;23:184–91. 10.1002/mpr.1421

  34. Lee S, Torok M, Shand F, Chen N, McGillivray L, Burnett A, et al. Performance, Cost-Effectiveness, and Representativeness of Facebook Recruitment to Suicide Prevention Research: Online Survey Study. JMIR Ment Health 2020 Oct.;7:e18762. 10.2196/18762

  35. Sanchez C, Grzenda A, Varias A, Widge AS, Carpenter LL, McDonald WM, et al. Social media recruitment for mental health research: A systematic review. Compr Psychiatry 2020 Nov.;103:152197. 10.1016/j.comppsych.2020.152197

  36. Sapien Labs. Mental State of the World 2020 [Internet]. 2021. Available from: https://mentalstateoftheworld.report/msw-2020/.

  37. Twenge JM, Cooper AB, Joiner TE, Duffy ME, Binau SG. Age, period, and cohort trends in mood disorder indicators and suicide-related outcomes in a nationally representative dataset, 2005–2017. Journal of Abnormal Psychology 2019 Apr.;128:185–99. 10.1037/abn0000410

  38. CDC. Youth Risk Behavior Survey: Data Summary & Trends Report [Internet]. 2023. Available from: https://www.cdc.gov/media/releases/2023/p0213-yrbs.html

Article Details

How to Cite
Taylor, J., Sukhoi, O., Newson, J. J. and Thiagarajan, T. C. (2026) “Global Mind Project data in the United States: A comparison with national statistics”, International Journal of Population Data Science, 11(1). doi: 10.23889/ijpds.v11i1.3148.