AI Persona Media Testing: Are You Meee?

Afraid Of Robots? You Should Be. They’re Real And Using You To Change Elections And Media Freedom. See it go down in real time.

Feeling a little meh about protecting your online life from hackers? Think you don’t have much to hide? Turns out a weak password can accidentally make you a mercenary for malicious organizations and actions in real life.

That means that, by not taking care of your online identity, you could be accidentally shutting down websites with strong independent voices, influencing political events, or silencing opposing voices in media around the world.

We’re validating content with known outcomes. Granted, we’re using deterministic methods, but we feel we’re still going above and beyond here. Back to munging data. We filtered out posts with multiple images, multiple variants, etc., to keep the tests as focused on text as possible. We only ran tests where the change between variants was to the headline or excerpt. The persona was exposed to the lede, headline, and excerpt. There are still all sorts of issues with this: humans are visual, and we’re removing images as well as text positioning. It also means we’re making a lot of assumptions that the content itself was the main motivation for engagement.
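The filtering step above can be sketched roughly like this. It is a minimal illustration, not our pipeline: the `Variant` record, its field names, and the reading of "multiple variants" as "variants that change more than the headline or excerpt" are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    # Hypothetical record shape; field names are assumptions for illustration.
    headline: str
    excerpt: str
    lede: str
    image_count: int


def is_text_only_test(variants: List[Variant]) -> bool:
    """Keep a test only if no variant carries multiple images and the
    variants differ solely in headline or excerpt (the lede is shared)."""
    if len(variants) < 2:
        return False  # nothing to compare
    if any(v.image_count > 1 for v in variants):
        return False  # drop multi-image posts
    # Every variant must share one lede, so only headline/excerpt can vary.
    return len({v.lede for v in variants}) == 1


def build_prompt(v: Variant) -> str:
    # Images are stripped; the persona only ever sees these three text fields.
    return f"Headline: {v.headline}\nExcerpt: {v.excerpt}\nLede: {v.lede}"
```

Note that this sketch throws away layout and imagery entirely, which is exactly the limitation described above.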

Let’s define our outcomes. We’re interested in seeing whether the personas can pick the correct content, so we need to test against a binary outcome: did we pick the correct winner? I’ll jump to the conclusion here and let you all know: NO. Personas cannot pick winners, at least not in the way we’re using them in this personified approach. Our other option, then, is to look at the number of clicks and check whether the personas vote in the correct direction. That gives us a better outcome to measure.
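To make the two outcomes concrete, here is a hedged sketch. `picked_winner` is the binary "did we pick the winner?" check. `direction_agreement` is only one possible reading of "vote in the correct direction" (pairwise ordering agreement between persona scores and clicks); both function names are hypothetical, and the correlation we actually report is described further down.

```python
from typing import Sequence


def picked_winner(persona_scores: Sequence[float], clicks: Sequence[int]) -> bool:
    """Binary outcome: is the persona's top variant also the click winner?
    Ties resolve to the first index, because that is how max() behaves."""
    best_persona = max(range(len(persona_scores)), key=persona_scores.__getitem__)
    best_clicks = max(range(len(clicks)), key=clicks.__getitem__)
    return best_persona == best_clicks


def direction_agreement(persona_scores: Sequence[float], clicks: Sequence[int]) -> float:
    """Fraction of variant pairs that the persona orders the same way clicks do.
    Pairs tied on either side are skipped, since they have no direction."""
    agree = total = 0
    n = len(clicks)
    for i in range(n):
        for j in range(i + 1, n):
            c = clicks[i] - clicks[j]
            p = persona_scores[i] - persona_scores[j]
            if c == 0 or p == 0:
                continue
            total += 1
            agree += (c > 0) == (p > 0)
    return agree / total if total else float("nan")
```

A value near 0.5 from `direction_agreement` would mean the persona does no better than a coin flip at ordering variants.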

We need to define how we get the nonsense machines, I mean personas, to give us an observation: something to measure. We’ve measured both a binary outcome (1 or 0) and a score. In our experience, LLM-generated scores are very questionable; in our opinion, weird autocomplete has no concept of numbers, or of sequences for that matter. The range we chose was 1 to 5.

The score is the persona’s desire to read more, on a scale of 1 to 5.

In reality, we mostly saw values between 1 and 4. We interpreted 1 as not interested and 5 as very interested. We treated 4 as mildly interested, although in practice it was the top end of our scores.
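Since the model answers in free text, the score has to be pulled out of its reply somehow. The sketch below assumes the reply contains a bare digit (an assumption, not a description of our prompt). It returns `None` instead of guessing when no valid 1 to 5 score is present. The label map only covers the three levels interpreted above, and `parse_score` and `INTEREST` are hypothetical names.

```python
import re
from typing import Optional

# Interpretation from the text: 1 = not interested, 4 = mildly interested,
# 5 = very interested. No labels are given for 2 and 3, so none are invented.
INTEREST = {1: "not interested", 4: "mildly interested", 5: "very interested"}


def parse_score(reply: str) -> Optional[int]:
    """Return the first standalone digit 1-5 in an LLM reply, or None.
    Digits inside larger numbers (e.g. the '1' in '10') are not matched."""
    m = re.search(r"\b([1-5])\b", reply)
    return int(m.group(1)) if m else None
```

Rejecting unparseable replies, rather than coercing them, keeps the questionable scores from getting even noisier.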

We do not use, and have never intentionally used, commercial LLMs. If your org can afford it, we recommend hosting your own or sourcing from a vendor that respects your privacy. We know several options to make this happen, naturally.

Initially, we started testing with a single persona.

Sample Data	Values
Total Tests	20
Total Variants	92
Total Observations	920

Score	Frequency
4	552
1	296
5	36
2	33
3	3

Engagement	Frequency
1	583
0	337
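Tables like these are plain tallies over the raw observations. The sketch below assumes each observation is a hypothetical `(variant_id, score, engaged)` tuple. It also runs a sanity check on the published counts: both frequency tables should sum to 92 variants × 10 runs = 920 observations.

```python
from collections import Counter
from typing import List, Tuple

Observation = Tuple[str, int, int]  # (variant_id, score, engaged); assumed shape


def summarize(observations: List[Observation]) -> Tuple[Counter, Counter]:
    """Build the score and engagement frequency tables from raw runs."""
    scores = Counter(score for _, score, _ in observations)
    engagement = Counter(engaged for _, _, engaged in observations)
    return scores, engagement


# Consistency check on the counts reported in the tables above.
score_freq = {4: 552, 1: 296, 5: 36, 2: 33, 3: 3}
engagement_freq = {1: 583, 0: 337}
assert sum(score_freq.values()) == sum(engagement_freq.values()) == 92 * 10
```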

	Variant Rank : Click Rank
Spearman Correlation	-0.0056748761274461055

The testing methodology was to expose the persona to a test variant, collect the observation, and rerun with a different seed value. In total, the persona was exposed to each variant 10 times.
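The run loop above amounts to a nested loop over variants and seeds. In this sketch, `ask_persona(prompt, seed)` is a hypothetical hook standing in for the actual model call. The `fake_persona` stub exists only so the example runs offline, and its output is random rather than a model judgment.

```python
import random
from typing import Callable, Dict, List, Tuple


def collect_observations(
    variants: Dict[str, str],
    ask_persona: Callable[[str, int], int],
    runs: int = 10,
    base_seed: int = 0,
) -> List[Tuple[str, int, int]]:
    """Expose the persona to every variant `runs` times, changing only the seed.
    Returns (variant_id, seed, engaged) tuples, with engaged in {0, 1}."""
    observations = []
    for variant_id, prompt in variants.items():
        for run in range(runs):
            seed = base_seed + run
            observations.append((variant_id, seed, ask_persona(prompt, seed)))
    return observations


def fake_persona(prompt: str, seed: int) -> int:
    # Stand-in for a real model call: deterministic for a given (prompt, seed),
    # but otherwise just a coin flip.
    return random.Random(f"{prompt}:{seed}").randint(0, 1)
```

With 92 variants and `runs=10`, this loop yields the 920 observations in the sample data table.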

To determine our ranks, we use the rank of total positive responses and the rank of test-variant clicks. Total positive responses is the number of positive (1) engagements the persona gave for a variant. The correlation is then the correlation between these two sets of ranks. In the ideal case, we’d want a positive value close to one. That would signal our persona can discern, 1:1, what the average person would select as engaging content. Hopefully, you can start to see the power in knowing this kind of info. However, what we actually see is a correlation of almost zero, even slightly negative. That means our persona can’t actually tell us anything meaningful.
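Spearman’s correlation is the Pearson correlation of the ranks. The pure-Python sketch below gives tied values the average of their ranks, which matters here because many variants share the same count of positive responses. Whether ranks were computed within each test or across all variants is not stated above, so treat the grouping as an assumption.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Raises ZeroDivisionError if either input is constant (zero variance).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Usage is `spearman(positive_responses, clicks)`, with one entry per variant in each list. A value near 1 is the ideal case described above, and a value near 0 means no association.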

One artifact we do want to call out is the correlation between impressions and click rank; it seems large enough to matter.

	Impressions : Click Rank
Spearman Correlation	0.19981133757124095

It’s also statistically significant, with a t-value of 18.49, 8222 df, and p < .00001. For the non-statistician, this is worth taking into account when reading these results. It simply means that getting content in front of more users may lead to more clicks; impressions are a covariate. Note that we’re not claiming anything beyond the fact that it exists and could affect our interpretation of results related to persona testing.
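The t-value follows from the correlation and the degrees of freedom via the standard formula t = r·√(df / (1 − r²)), where df = n − 2. The sketch below reproduces the reported figure. Its p-value uses a normal approximation (our substitution, reasonable only because df is large); an exact p-value would need the t distribution.

```python
import math
from typing import Tuple


def correlation_t_test(r: float, df: int) -> Tuple[float, float]:
    """t statistic for a correlation coefficient, plus an approximate
    two-sided p-value from the normal distribution (fine for large df)."""
    t = r * math.sqrt(df / (1 - r * r))
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|) for standard normal Z
    return t, p


t, p = correlation_t_test(0.19981133757124095, 8222)
print(f"t = {t:.2f}")  # t = 18.49
```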

=====================

ELI5:

Personas/Agents/Character AIs are unable to discern content that resonates with consumers.
Testing showed there was little to no association between the agent’s judgment and the number of clicks.
There are some issues with the data we’re testing with.
- It appears that getting content in front of people results in a higher number of clicks, rather than the media itself being engaging.
We’re partly opening this up because we feel we’re ok with burning our embarrassed millions. We found a digital garden and met our classic Iris dataset.
While we use the term persona, these would now likely be called agents. We’re keeping the term since we did this work before "agent" became the term de rigueur.
