Data Collection

AI Persona Media Testing: Are You Meee?

Afraid Of Robots? You Should Be. They’re Real And Using You To Change Elections And Media Freedom. See it go down in real time.

Feeling a little meh about protecting your online life from hackers? Think you don’t have much to hide? Turns out a weak password can accidentally make you a mercenary for malicious organizations and actions in real life.

That means, by not taking care with your online identity, you could be accidentally shutting down websites with strong independent voices, influencing political events or silencing opposing voices in media, around the world.

We’re validating content with known outcomes. Granted, we’re using deterministic methods, but we feel like we’re still going above and beyond here. Back to munging data. We filtered out posts with multiple images, multiple variants, etc. We made this as focused on text as possible. Our tests were run only on a change in headline or excerpt. The persona was exposed to the lede, headline, and excerpt. There are all sorts of issues with this still: humans are visual, and we’re removing images as well as text positioning. This also means we’re making a lot of assumptions that the content itself was the main motivation for engagement.

Let’s define our outcomes. We’re interested in seeing if the personas can pick the correct content. So we need to test against the binary outcome, i.e. did we pick the correct winner? I’ll jump to conclusions here and let you all know: NO. Personas cannot pick winners, at least in the way we’re using them in this personified approach. Our other option then is to look at the number of clicks and check whether the personas can vote in the correct direction. That gives us a better outcome to measure.

We need to define how we get the nonsense machines, I mean personas, to give us an observation; something to measure. We’ve variously measured both a binary outcome (1 or 0) and a score. LLM-generated scores are, in our experience, very questionable. In our opinion, weird autocomplete has no concept of numbers, or of sequences for that matter. The range we chose was from 1 to 5.

The score is the desire to read more, on a scale of 1 to 5.

In reality we mostly saw values between 1 and 4. We interpreted 1 as not interested and 5 as very interested. 4 was considered mildly interested, although it was effectively the top end of our scores.

We do not and have never intentionally used commercial LLMs. If your org can afford it, we recommend hosting your own or sourcing from a vendor that respects your privacy. We know several options to make this happen, naturally.

Initially, we started testing with a single persona.

Sample Data Values
Total Tests: 20
Total Variants: 92
Total Observations: 920

Score Frequency (score: count)
4: 552
1: 296
5: 36
2: 33
3: 3

Engagement Frequency (engagement: count)
1: 583
0: 337

Spearman Correlation (Variant Rank : Click Rank): -0.0056748761274461055

The methodology for testing was to expose the persona to the test variant, collect the observation, and rerun using a different seed value. In total the persona was exposed to the variant 10 times.
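To make the loop concrete, here is a minimal sketch of the repeated-exposure step. Everything named here is hypothetical: `ask_persona` stands in for whatever model call you use, and `fake_persona` returns random values purely so the sketch runs. It is not our actual harness, just the shape of it.

```python
import random
from typing import Callable, List, NamedTuple, Tuple


class Observation(NamedTuple):
    seed: int
    score: int    # 1-5 desire to read more
    engaged: int  # 1 = would engage, 0 = would not


def collect_observations(
    variant_text: str,
    ask_persona: Callable[[str, int], Tuple[int, int]],
    n_runs: int = 10,
    base_seed: int = 0,
) -> List[Observation]:
    """Expose the persona to one variant n_runs times, one seed per run."""
    observations = []
    for run in range(n_runs):
        seed = base_seed + run  # a different seed value for every rerun
        score, engaged = ask_persona(variant_text, seed)
        observations.append(Observation(seed, score, engaged))
    return observations


def fake_persona(variant_text: str, seed: int) -> Tuple[int, int]:
    # Hypothetical stand-in for the real persona call: random output only.
    rng = random.Random(seed)
    return rng.randint(1, 5), rng.randint(0, 1)


obs = collect_observations("Example headline", fake_persona)
positive_total = sum(o.engaged for o in obs)  # total positive (1) engagements
```

With 92 variants and 10 runs each, a loop like this yields the 920 observations in the sample data.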

For determining our ranks, we use the rank of total positive responses and the rank of test variant clicks. Total positive responses are the total number of positive (1) engagements the persona gave for the variant. Correlation is then the correlation of the ranks. To give an idea of the ideal case, we’d want something positive and close to one. That signals our persona is able to discern 1:1 what the average person would select as engaging content. Hopefully, you can start to see the power in knowing this kind of info. However, what we actually see is a correlation of almost zero, even slightly negative, meaning our persona can’t actually tell us anything meaningful.
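For anyone who wants to see the rank correlation spelled out, here is a small pure-Python sketch of Spearman correlation: rank both lists (ties get the average rank), then take the Pearson correlation of the ranks. The numbers at the bottom are made up for illustration and are not from our tests. How we group variants before ranking is not shown here, so treat this as the general technique rather than our exact pipeline.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: positive persona responses and clicks for four variants.
positive_responses = [7, 3, 9, 5]
clicks = [120, 80, 300, 150]
rho = spearman(positive_responses, clicks)
```

A value near 1 would mean the persona orders variants the way real readers do; a value near 0, like ours, means it does not.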

One artifact we do want to call out is the correlation between impressions and click rank. It seems large enough to be worth noting.

Spearman Correlation (Impressions : Click Rank): 0.19981133757124095

It’s also statistically significant, with a t-value of 18.49 (8222 df; p < .00001). For the non-statistician, it’s a meaningful thing to take into account when reading this. It simply means there’s a chance that getting the content in front of more users means more clicks; a covariate. Note, we’re not really saying anything, other than that it exists and could affect our interpretation of results related to persona testing.
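If you want to sanity-check that t-value, the usual conversion from a correlation coefficient to a t-statistic is t = r * sqrt(df / (1 - r^2)). The sketch below assumes that standard formula, which the write-up does not state explicitly; plugging in the correlation and degrees of freedom quoted above gives roughly 18.49.

```python
from math import sqrt


def correlation_t_statistic(r: float, df: int) -> float:
    """t-statistic for a correlation r with df degrees of freedom (n - 2)."""
    return r * sqrt(df / (1.0 - r * r))


# Values quoted in the text: impressions vs. click rank correlation and its df.
t_value = correlation_t_statistic(0.19981133757124095, 8222)
print(round(t_value, 2))
```

A modest correlation turns into a large t-value here mainly because of the number of observations behind it.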

=====================

ELI5: