No offense but… I think what we learned here is that your price is really low. Like, I (and everyone I know that would be willing to run an experiment like yours) would not have been persuaded by any of the stuff you mentioned.
Cat pics, $10 bribe, someone’s inside joke with their son… there’s no way this convinced you, right? $4000 to charity is the only real thing here. Imo you’re making a mistake by selling out your integrity for $4000 to charity, but hey...
I don’t see what this had to do with superhuman persuasion? And I don’t think we learned much about “blackmail” here. I think your presentation here is sensationalist.
And to criticize your specific points:
1. “The most important insight, in my opinion, is that persuasion is thermostatic. The harder the YES-holders tried to persuade me one way, the more pushback and effort they engendered from the NO-holders.”
Persuasion is thermostatic in some situations, but how does this generalize?
2. “frontier AI does not currently exhibit superhuman persuasion.”
This has no relation to your experiment.
I think a month-long experiment of this nature, followed by a comprehensive qualitative analysis, tells us far more about persuasion than a dozen narrow, over-simplified studies that attempt to be generalizable (of the nature of the one Mollick references, for example). Perhaps that has to do with my epistemic framework, but I generally reject a positivist approach to these kinds of complex problems.
I’m not really sure what your argument is here? This surely generalizes about as well as, if not better than, any argument around persuasion could.
Yes it does. I’ve set a bar that humans can absolutely meet (because they did). Do you think that an independently operating AI system (of today’s capabilities, that is… I make no claims as to future AI systems and in general if I wasn’t concerned about superhuman persuasion, I wouldn’t have written this article) could have been as persuasive in this experiment as a market full of hundreds of humans ended up being?
You argue that my price was really low, but I don’t think it was. Persuasion is a complex phenomenon, and I’m not really sure what you mean that I “sold out my integrity”. I think that’s generally just a mean thing to say, and I’m not sure why you would strike such an aggressive tone. The point of this experiment was for people to persuade me, and they succeeded in that. I was persuaded! What do you think is an acceptable threshold for being persuaded by a crowd? $50k? $500k? Someone coming into my bedroom and credibly threatening to hurt me? At what threshold would I have kept my integrity? C’mon now.
Erm, apologies for the tone of my comment. Looking back, it was unnecessarily mean. Also, I shouldn’t have used the word “integrity”, as it wasn’t clear what I meant by it. I was trying to lazily point at some game theory virtue ethics stuff. I do endorse the rest of what I said.
Okay, I’ll give you an explicit example of persuasion being non-thermostatic.
Peer pressure is anti-thermostatic. There’s a positive feedback loop at play here: as more people start telling you to drink, more people join in, and the harder they push, the harder everyone else pushes too. Peer pressure often starts with one person saying “aww, come on man, have a beer”, and ends with an entire group saying “drink! drink! drink!”. In your terminology: the harder the YES-drinkers push, the more undecided people join their side and the less pushback they receive from the NO-drinkers.
(Social movements and norm enforcement are other examples of things that are anti-thermostatic.)
Lots of persuasion is neither thermostatic nor anti-thermostatic. I see an ad, I don’t click on it, and that’s it.
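To make the distinction concrete, here’s a toy sketch of the two feedback regimes (my own illustration with made-up numbers, not anything from your post): in the thermostatic case every unit of YES pressure provokes proportional NO pushback, while in the anti-thermostatic case each round of pushing recruits more pushers.

```python
# Toy sketch only: two feedback regimes for persuasion pressure.
# All numbers are invented for illustration.

def thermostatic(pressure: float, steps: int = 10, pushback_gain: float = 0.8) -> float:
    """Each unit of YES pressure provokes proportional NO pushback (negative feedback)."""
    net = 0.0
    for _ in range(steps):
        pushback = pushback_gain * pressure  # opposition scales with effort
        net += pressure - pushback           # most of the push gets cancelled out
    return net

def anti_thermostatic(pressure: float, steps: int = 10, recruit_rate: float = 0.3) -> float:
    """Each round of pushing recruits more pushers instead of opponents (positive feedback)."""
    net = 0.0
    for _ in range(steps):
        net += pressure
        pressure *= 1 + recruit_rate         # the chanting crowd keeps growing
    return net

print("thermostatic net pressure:     ", round(thermostatic(1.0), 2))
print("anti-thermostatic net pressure:", round(anti_thermostatic(1.0), 2))
```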
Ah, I think I see your point now: “I was persuaded by humans in this experiment. I wouldn’t have been persuaded by AI in this experiment. Thus, AI does not currently exhibit superhuman persuasion.”
But: if I gave Opus 4 $10,000 and asked it to persuade you, I think it would try and succeed at bribing you.
Let me say more about why I have an issue with the $4k. You’re running an experiment to learn about persuasion. And, based on what you wrote, you think this experiment gave us some real, generalizable (that is, not applicable only to you) insights regarding persuasion. If the results of your experiment can be bought for $4000 -- if the model of superhuman persuasion that you and LW readers have can be significantly affected by anyone willing to spend $4000 -- then it’s not a very good experiment.
Consider: if someone else had run this experiment, and they weren’t persuaded, they’d be drawing different conclusions, right? Thus, we can’t really generalize the results of this experiment to other people. Your experiment gives you way too much control over the results.
(FWIW, I think it’s cool that you did this experiment in the first place! I just think the conclusions you’re drawing from it are totally unsupported.)
I agree that the bar I set was not as high as it could have been, and in fact, Joshua on Manifold ran an identical experiment but with the preface that he would be much harder to persuade.
But there will never be some precise, well-defined threshold for when persuasion becomes “superhuman”. I’m a strong believer in the wisdom of crowds, and similarly, I think a crowd of people is far more persuasive than an individual. I know I can’t prove this, but at the beginning of the market, I’d have probably given 80% odds to myself resolving NO. That is to say, I had the desire to put up a strong front and not be persuaded, but I also didn’t want to just be completely unpersuadable because then it would have been a pointless experiment. Like, I theoretically could have just turned off Manifold notifications, not replied to anyone, and then resolved the market NO at the end of the month.
For 1) I think the issue is that the people who wanted me to resolve NO were also attempting to persuade me, and they did a pretty good job of it for a while. If the YES persuaders had never really put in an effort, neither would have the NO persuaders. If one side bribes me, and the other side also has an interest in the outcome, they might bribe me as well.
For 2) The issue here is that if you give $10k to Opus to bribe me, is it Opus doing the persuasion or is it your hard cash doing the persuasion? To whom do we attribute that persuasion?
But I think that’s what makes this a challenging concept. Bribery is surely a very persuasive thing, but it incurs a much larger cost than pure text generation, for example. Perhaps the relevant question is “how persuasive is an AI system on a $-per-persuasive-unit basis?” The challenging parts, of course, are assigning proper time values for $ and then operationalizing the “persuasive unit”. That latter one, ummm… seems quite daunting and imprecise by nature.
Perhaps a meaningful takeaway is that I was persuaded to resolve a market that I thought the crowd would only have a 20% chance of persuading me to resolve… and I was persuaded at the expense of $4k to charity (which I’m not sure I’m saintly enough to value the same as $4k given directly to me as a bribe, by the way), a month’s worth of hundreds of interesting comments, some nagging feelings of guilt as a result of these comments and interactions, and some narrative bits that made my brain feel nice.
If an AI can persuade me to do the same for $30 in API tokens and a cute piece of string mailed to my door, perhaps that’s some medium evidence that it’s superhuman in persuasion.
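To pin down what I mean by “$ per persuasive unit”, here’s a rough, purely hypothetical back-of-the-envelope sketch. The dollar figures are the ones from this thread ($4k to charity versus a hypothetical $30 of API tokens), and I’m just setting the “persuasive unit” to 1 for both (“got the market resolved YES”), since operationalizing that unit is exactly the hard part.

```python
# Hypothetical sketch of the "$ per persuasive unit" metric. The unit itself is
# hand-waved: both persuaders are credited with exactly 1 unit ("market resolved YES").

def dollars_per_persuasive_unit(total_cost_usd: float, persuasive_units: float) -> float:
    """Lower means more cost-efficient persuasion."""
    return total_cost_usd / persuasive_units

crowd_cost_usd = 4000.0  # $4k to charity (ignoring a month of human effort, guilt, etc.)
ai_cost_usd = 30.0       # hypothetical API-token spend (plus a cute piece of string)

print(dollars_per_persuasive_unit(crowd_cost_usd, 1.0))  # 4000.0
print(dollars_per_persuasive_unit(ai_cost_usd, 1.0))     # 30.0
```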