In response to your criticism of the strict validity of my experiment: in one sense I completely agree. It was mostly performed for fun, not for practical purposes, and I don’t think it should be interpreted as some rigorous metric:
Obviously this suggestion was given in jest, is highly imperfect, and I’m sure if you think about it for a second, you can find dozens of holes to poke… ah who cares.
That being said, I do think it yields some qualitative insights that more formalized, social-science-type experiments would be woefully inadequate at generating.
Something like “superhuman persuasion” is loosely defined, resists explicit classification by its very nature, and means different things to different people. On top of that, any strict benchmark for measuring it would be rapidly Goodharted out of existence. So a contrived study like “how well does this AI persuade a judge in a debate when facing a human,” or “which AI can persuade the other first,” or something of that nature, is likely to be completely meaningless as a measure of a model’s superhuman persuasion capabilities.
As to whether AIs inducing trances/psychosis in people is representative of superhuman persuasion, I’m not sure I agree. As Scott Alexander has noted, these kinds of things are happening relatively rarely, and forums like LessWrong likely exhibit extremely strong selection effects for the kind of people who become psychotic due to AI. Moreover, I don’t think that other psychosis-producing technologies, such as the written word, radio, or colonoscopies, are necessarily “persuading” in a meaningful sense. Even if AI is a much stronger psychosis-generator than previous technologies that trigger psychosis in people prone to it, I still think that’s a different class of problem from superhuman persuasion.
As an aside, some things, like social media, clearly can induce psychosis through the transmission of information that is persuasive, but I think that’s also meaningfully different from being persuasive in and of itself, although I didn’t get into that whole can of worms in the article.