Crafting Virtual Personas for Language Models: An Anthology of Backstories
Discover how crafting virtual personas can transform research in the social sciences through rich, narrative-driven contexts.
We introduce Anthology, a method for conditioning LLMs into representative, consistent, and diverse virtual personas by generating and using naturalistic backstories rich in details of individual values and experiences.
What does it mean for large language models (LLMs) to be trained on massive text corpora, collectively produced by millions of distinctive human authors?
The paper “Language Models as Agent Models” presents compelling evidence that recent language models can be considered models of agents: given a textual context, an LLM generates conditional text that reflects the characteristics of an agent likely to have produced that context. This suggests that, with appropriate conditioning, LLMs could be steered to approximate the responses of a particular human voice, rather than the mixture of voices that otherwise emerges. If realized, this capability would have significant implications for user research and the social sciences: conditioned language models acting as virtual personas could serve as cost-effective pilot studies and support ethical best practices in research, such as the Belmont principles of justice and beneficence.
In this work, we introduce Anthology, an approach for steering LLMs toward representative, consistent, and diverse virtual personas by using richly detailed life narratives of individuals as conditioning context. We also present methods for generating such backstories with LLMs themselves, yielding massive datasets that cover a broad spectrum of human demographics. By grounding language models in naturalistic backstories, Anthology enables LLMs to simulate individual human samples with greater fidelity, as measured by their ability to match the distributions and consistency of human responses.
Our Approach: Anthology
Conditioning Language Model Generation with Individual Life Narratives
A key limitation of earlier methods for steering LLMs toward virtual personas has been their inability to reliably approximate individual human samples. Prior approaches prompt LLMs with broad demographic information, e.g., “I am a 25-year-old from California with less than a high school education,” essentially generating responses from a demographic tuple. Such prompts achieve only a population-level approximation, which leads to:
- Responses often defaulting to stereotypical or prototypical portrayals, since conditioning is based purely on demographic factors (e.g., race and gender)
- The inability to provide meaningful metrics of interest, such as covariance and statistical significance, since individual responses are necessary for such calculations
Anthology addresses approximation at the individual level by conditioning responses on comprehensive backstories. These narratives capture both implicit and explicit markers of personal identity, including demographic traits, cultural references, socioeconomic background, and life philosophies. Our approach generates a large and diverse set of backstories by querying language models with open-ended prompts such as “Tell me about yourself.” We then match the virtual personas conditioned on these backstories to real-world survey samples.
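As a concrete illustration, here is a minimal sketch of the backstory-generation step, assuming an OpenAI-compatible chat API; the model name, sampling parameters, and pool size are illustrative stand-ins, not the exact configuration used in our experiments.

```python
# Minimal sketch of backstory generation via an open-ended prompt.
# Model name and parameters are illustrative, not the paper's setup.
from openai import OpenAI

client = OpenAI()

def generate_backstory(model: str = "gpt-4o", temperature: float = 1.0) -> str:
    """Elicit one naturalistic, open-ended life narrative."""
    response = client.chat.completions.create(
        model=model,
        temperature=temperature,  # higher temperature encourages persona diversity
        messages=[{"role": "user", "content": "Tell me about yourself."}],
    )
    return response.choices[0].message.content

# Build a pool of candidate virtual personas (pool size is illustrative).
backstories = [generate_backstory() for _ in range(1000)]
```

Each generated backstory then serves as the conditioning context under which a virtual persona answers downstream survey questions.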
Results: Closer Approximation of Public Opinion Polls
For evaluation, we assess the effectiveness of various methods for conditioning virtual personas in relation to three Pew Research Center ATP surveys: Waves 34, 92, and 99.
Table: Results on approximating human responses for Pew Research Center ATP surveys. Boldface and underlined entries indicate the closest and second-closest values to those of humans, respectively.
To measure success in approximating human samples with virtual personas, we utilize the following metrics:
- Average Wasserstein distance (WD) between response distributions indicates representativeness
- Frobenius norm between correlation matrices serves as a consistency measure
- Cronbach’s alpha, providing an additional metric for internal consistency
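To make these metrics concrete, below is a minimal sketch of how each can be computed, assuming survey responses are encoded as ordinal values in matrices of shape (num_respondents, num_questions); the function and variable names are our own, not from the paper.

```python
# Minimal sketch of the three evaluation metrics over ordinal response matrices.
import numpy as np
from scipy.stats import wasserstein_distance

def avg_wasserstein(human: np.ndarray, virtual: np.ndarray) -> float:
    """Mean 1-D Wasserstein distance across survey questions (columns)."""
    return float(np.mean([
        wasserstein_distance(human[:, q], virtual[:, q])
        for q in range(human.shape[1])
    ]))

def correlation_frobenius(human: np.ndarray, virtual: np.ndarray) -> float:
    """Frobenius norm of the gap between inter-question correlation matrices."""
    return float(np.linalg.norm(
        np.corrcoef(human, rowvar=False) - np.corrcoef(virtual, rowvar=False),
        ord="fro",
    ))

def cronbach_alpha(responses: np.ndarray) -> float:
    """Internal consistency: alpha = k/(k-1) * (1 - sum(item var) / var(total))."""
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)
    total_var = responses.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```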
Before analyzing virtual subjects, we estimate a lower bound for each evaluation metric by randomly dividing the human population into two equal groups and computing the metrics between the two halves. We average the values over 100 iterations to obtain the lower-bound estimates.
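Continuing the sketch above, the lower-bound estimate can be expressed as a simple random-split procedure; this reuses the illustrative avg_wasserstein from the previous block and is an assumption about the exact protocol.

```python
# Minimal sketch of the lower-bound estimate: split the human respondents
# in half at random, score one half against the other, average over trials.
def lower_bound(human: np.ndarray, metric=avg_wasserstein, trials: int = 100,
                rng=np.random.default_rng(0)) -> float:
    scores = []
    for _ in range(trials):
        perm = rng.permutation(len(human))
        half = len(human) // 2
        scores.append(metric(human[perm[:half]], human[perm[half:]]))
    return float(np.mean(scores))
```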
Our findings consistently show that Anthology outperforms the other conditioning methods on all metrics, for both the Llama-3-70B and the Mixtral-8x22B models. Comparing the two matching methods, greedy matching generally achieves a better average Wasserstein distance across all Waves. We attribute this difference to the one-to-one correspondence constraint of maximum weight matching combined with the limited pool of virtual users: under this constraint, the similarity weights assigned to matched virtual subjects tend to be lower than under greedy matching, leaving matched human and virtual users less demographically similar. These results suggest that the richness of the generated backstories elicits more nuanced responses than the baseline methods.
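For intuition, here is a minimal sketch contrasting the two matching schemes, assuming a precomputed demographic-similarity matrix S of shape (num_humans, num_virtual_users); this illustrates the general idea rather than the paper's exact implementation.

```python
# Minimal sketch of the two schemes for matching humans to virtual users,
# given an assumed similarity matrix S (rows: humans, columns: virtual users).
import numpy as np
from scipy.optimize import linear_sum_assignment

def greedy_matching(S: np.ndarray) -> np.ndarray:
    """Match each human to the most similar virtual user; reuse is allowed."""
    return S.argmax(axis=1)

def max_weight_matching(S: np.ndarray) -> np.ndarray:
    """One-to-one assignment maximizing total similarity (Hungarian algorithm)."""
    humans, virtuals = linear_sum_assignment(S, maximize=True)
    assignment = np.full(S.shape[0], -1)  # -1 marks humans left unmatched
    assignment[humans] = virtuals
    return assignment
```

With a limited virtual-user pool, the one-to-one constraint forces some humans onto less similar partners, which is consistent with the performance gap described above.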
Final Thoughts
Anthology marks a promising direction for conditioning virtual personas in LLMs, with the potential to transform user research, public opinion surveys, and other social science applications by offering a scalable and, at times, more ethical alternative to traditional human surveys. However, despite its advantages, applying Anthology in the social sciences raises concerns about biases and privacy; results derived from our method should be interpreted with caution.
Looking ahead, we envision our approach benefiting from an expanded and more diverse set of backstories, each representing a consistent life narrative of an individual. Future work could also enable free-form response generation, supporting more natural persona simulations beyond structured formats such as multiple-choice surveys. Finally, an exciting avenue for applying LLMs in behavioral studies would be to simulate long-term effects, allowing virtual personas to model and retrospectively examine changes over time.
All of these directions present a multitude of technical challenges; we welcome anyone interested in collaborating or discussing our research further!
Learn more about our work: link to full paper