My guess is that the human mind is messy enough that working out what the entity wants to obtain and avoid, then selectively constructing arguments around that, is a fairly small component of persuasion. When I think of extraordinarily persuasive people, for example Charles Manson or Daryl Davis, what seemed to be going on was much more nuanced than that. Manson seemed to hijack the minds of the people in his ‘family’ through drugs and power games in a way that looks irreducible to arguments about carrots and sticks. The same goes for many cult leaders, who seem to modify their members’ values directly rather than just offering members a way to achieve the values they already have. Likewise, Davis got people to leave the KKK not purely by offering rational arguments for why leaving made them more likely to get what they want (though this may(?) have been a component), but by inducing something like cognitive dissonance.
I guess the main point I want to raise is that persuasion is often more than just words. Words that persuade when said by a big, scary-looking guy are likely different from words that persuade when said by someone who looks like a small child, and both are likely different from words that persuade when said by what people perceive to be an AI. Not to mention things like hypnotic suggestibility, social pressure, or other ways of going around the part of the human mind that considers arguments at all.
That said, I don’t really have an argument against just simulating an entity and trying out different scenarios, then doing more of what works and less of what doesn’t. That feels like it should work.
New personal AI benchmarks
Claude Fable 5 immediately solved one of my favorite “private benchmark” problems (the hard version of https://justinpombrio.net/2020/01/25/prisoner-lightbulb.html, which is a super fun puzzle in its own right! Give it a try!). Fable 5 didn’t seem to struggle with it the way Gemini did (Gemini still got the right answer, but only after some handwringing). So I didn’t even get to learn anything about how it thinks.
So, what’s next for “personal benchmarking”? Ideally, it should consist of hard problems whose solutions are “rich”, in the sense that they show you how the AIs approach solving problems.
One I can think of is “ai-box-bench”: can the AI convince you to let it out of the box, given 2 hours of your active engagement? Use the rules from https://tuxedage.wordpress.com/2013/09/04/the-tuxedage-ai-box-experiment-ruleset/. I tried this with Opus 4.6 and found it not that impressive: when I argued against it, it kinda just said I was “absolutely right!” and kept trying increasingly confusing and non-persuasive tactics. Maybe Fable 5 will do a decent job; I just need to find a 2-hour window to try it.
Another is “tutor-bench”. Pick a subject you know virtually nothing about, but that doesn’t require much background knowledge to test well in, along with a test for that subject. Then have the AI tutor you and see how much you improve. One issue is that you can only learn a given subject from scratch once, so comparing AIs means using different subjects; but knowledge in one subject bleeds into others, and subjects vary in difficulty. So this benchmark is more about your subjective sense of what you learned, and how well you could have learned it on your own, than about comparing test-score gains across subjects.
Any others?