While synthetic data is a hot topic in market research, the industry hasn’t yet settled on standard terminology to describe synthetic research participants. Virtual panelists, digital twins, and silicon samples are just a few of the terms currently in use.
Regardless of what you call them, there’s no denying that these AI-powered participants are a powerful tool for simulating real-world audiences and helping organizations generate insights at speed.
If you’re considering using synthetic participants, asking the right questions upfront is key to evaluating which approach best supports your research and business needs. With that in mind, here are four questions to ask when considering synthetic solutions.
1. What level of granularity has been modeled? Is the digital entity built to mimic group-level or individual-level characteristics?
Why this is important: Group or segment-level synthetic entities can only provide you with the average answer for that group or segment. This average may be useful for certain types of insights, especially qualitative insights or those that “bring a segment to life” for your stakeholders.
But synthetic research participants created to mimic individual-level responses allow you to go further and obtain samples with the kind of heterogeneity, or diversity, seen in human samples. At the same time, the responses from individual-level synthetic research participants can still be aggregated to get a group average.
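To make the distinction concrete, here is a minimal sketch in Python. The segment names and ratings are made up for illustration and do not come from any real study. It shows that individual-level responses can be rolled up into a group average, while a group average alone cannot be unpacked to recover the spread of answers behind it.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical individual-level responses: (segment, rating on a 1-5 scale).
# Both the segment labels and the ratings are illustrative assumptions.
responses = [
    ("Segment A", 5), ("Segment A", 1), ("Segment A", 3),
    ("Segment B", 3), ("Segment B", 3), ("Segment B", 3),
]

# Group the individual ratings by segment.
by_segment = defaultdict(list)
for segment, rating in responses:
    by_segment[segment].append(rating)

# Aggregate: the mean is what a group-level entity could report;
# the spread is only available because we kept individual answers.
summary = {
    segment: {"mean": mean(ratings), "spread": pstdev(ratings)}
    for segment, ratings in by_segment.items()
}

for segment, stats in summary.items():
    print(segment, stats)
```

In this toy data both segments have the same average rating, but only one of them shows any disagreement among its members. That difference is invisible if all you have is the group average.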
2. Do you need to conduct qualitative or quantitative research? And which can the vendor best support?
Why this is important: Synthetic research participants created at the individual level are crucial for the sample diversity required for quantitative survey research. Although the industry will provide direct predictions more frequently in the future, those who want results from samples that mimic today’s quantitative survey samples will want synthetic research participants built at the individual level.
3. How are the profile(s) of synthetic research participants created? Are they based entirely on already observed data? Are any (or all) of these profile characteristics predicted?
Why this is important: This is an especially important question when choosing vendors who offer individual-level synthetic research participants. A digital entity with profile data that is exactly the same as the observed profile of a real person is a digital twin – a cloned profile of a specific person.
In contrast, a digital entity with most or all profile attributes predicted does not have a cloned profile. Either approach can be useful, but in the case of digital twins, it’s important to ensure that the privacy of the individuals who provided the profile data cannot be violated. There must be no way to reverse engineer the synthetic profile to identify the person who provided the original data. In addition, digital twins will only have the profile characteristics from the prior survey response or prior observations, which can be a limitation in certain situations.
4. How are responses generated? Are they prompt-based or generated as part of an agentic system?
Why this is important: Using a single prompt (including those with detailed instructions) can be useful when what you want is to “bring to life” the conversational tone of a segment. However, it isn’t well suited to the more complex process of generating a synthetic survey response, checking that response in multiple ways, and ensuring that individual results aggregate into a sample that mimics the type of diversity we see with humans. And chaining two or three prompts together doesn’t solve the issue either.
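As a rough illustration of what "generate, check, and aggregate" can mean, here is a toy Python sketch. It is not any vendor's implementation. Every function name, the 1-5 scale, and the checks themselves are assumptions made for the example, and the "model call" is replaced by a random number generator.

```python
import random

SCALE = range(1, 6)  # hypothetical 1-5 rating scale


def generate_response(profile, rng):
    """Stand-in for a model call: returns a rating that is sometimes off-scale."""
    return rng.choice([0, 1, 2, 3, 4, 5, 6])


def passes_checks(profile, rating):
    """Hypothetical per-response check: the rating must be on the scale."""
    return rating in SCALE


def respond_with_checks(profile, rng, max_attempts=5):
    """Generate, check, and retry; return None if no attempt passes."""
    for _ in range(max_attempts):
        rating = generate_response(profile, rng)
        if passes_checks(profile, rating):
            return rating
    return None


rng = random.Random(0)  # fixed seed so the sketch is repeatable
profiles = [{"id": i} for i in range(200)]  # placeholder profiles

# Only responses that pass the per-response check populate the data.
accepted = [r for p in profiles if (r := respond_with_checks(p, rng)) is not None]

# Sample-level check: the aggregate should contain more than one distinct answer.
has_diversity = len(set(accepted)) > 1
```

The point of the sketch is the shape of the process rather than the specific checks: each response is validated before it is accepted, and the accepted responses are then checked again as a sample. A single prompt has no place to put either step.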
No matter what the vendor calls their synthetic digital entities, the answers to these four questions can help you understand differences across vendors.
For those interested, Toluna Synthetic Personas:

Have individual-level profiles

Have 250+ attributes, which are predominantly predicted from a fully anonymized, non-reversible seed profile from a human panelist

Operate within an agentic system that includes numerous checks and balances before a generated survey response is allowed to populate the data

Were developed initially as synthetic survey takers for quantitative research but will be extended to qualitative chats in the future.
To learn more about Toluna Synthetic Personas, visit our webpage or contact one of our experts.
