I agree that the bar I set was not as high as it could have been, and in fact, Joshua on Manifold ran an identical experiment, but with the stated caveat that he would be much harder to persuade.
But there will never be some precise, well-defined threshold for when persuasion becomes “superhuman”. I’m a strong believer in the wisdom of crowds, and similarly, I think a crowd of people is far more persuasive than any individual. I can’t prove this, but at the beginning of the market, I’d probably have given myself 80% odds of resolving NO. That is to say, I wanted to put up a strong front and not be persuaded, but I also didn’t want to be completely unpersuadable, because then it would have been a pointless experiment. Like, I theoretically could have just turned off Manifold notifications, not replied to anyone, and resolved the market NO at the end of the month.
For 1) I think the issue is that the people who wanted me to resolve NO were also attempting to persuade me, and they did a pretty good job of it for a while. If the YES persuaders had never really put in an effort, neither would have the NO persuaders. And if one side bribes me, the other side, which also has an interest in the outcome, might well bribe me too.
For 2) The issue here is that if you give $10k to Opus to bribe me, is it Opus doing the persuasion or is it your hard cash doing the persuasion? To whom do we attribute that persuasion?
But I think that’s what makes this a challenging concept. Bribery is surely very persuasive, but it incurs a much larger cost than pure text generation, for example. Perhaps the relevant question is “how much persuasion does an AI system deliver per dollar spent?” The challenging parts, of course, are assigning a proper dollar value to things like time, and then operationalizing the “persuasive unit”. That latter one, ummm… seems quite daunting and imprecise by nature.
Perhaps a meaningful takeaway is that I resolved a market I’d initially given the crowd only a 20% chance of persuading me to resolve… and the persuasion cost was $4k to charity (which, by the way, I’m not sure I’m saintly enough to value the same as $4k handed directly to me as a bribe), a month’s worth of hundreds of interesting comments, some nagging feelings of guilt arising from those comments and interactions, and some narrative bits that made my brain feel nice.
If an AI can persuade me to do the same for $30 in API tokens and a cute piece of string mailed to my door, perhaps that’s some medium evidence that it’s superhuman in persuasion.