[Question] Bayesian Persuasion?

I’m an economist and quite new to AI alignment. Reading about the perils of persuasive AI reminded me of an influential model in economic theory: Bayesian persuasion (Kamenica and Gentzkow, 2011). It models situations in which a decisionmaker wants to learn from a biased expert (e.g. a judge learning from a prosecutor). The punchline of the basic model I linked is that equilibrium communication depends on the curvature of the expert’s payoff as a function of the decisionmaker’s beliefs: if that payoff is concave, no information is transmitted, whereas if it is convex, the expert discloses all of their information.
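To make the curvature punchline concrete, here is a minimal numeric sketch (my own illustration, not from the paper): a binary state, a prior of 0.3, and an expert whose payoff depends only on the decisionmaker's posterior belief. By Jensen's inequality, full disclosure beats silence exactly when the payoff is convex in beliefs.

```python
# Illustrative sketch of the Jensen's-inequality logic behind the
# concave/convex punchline. State is binary in {0, 1}; the prior
# probability of state 1 is 0.3. The expert's payoff v depends only on
# the decisionmaker's posterior belief mu. Under full disclosure the
# posterior is 1 with prob 0.3 and 0 with prob 0.7; under no
# disclosure it stays at the prior. (All numbers here are arbitrary
# choices for illustration.)

prior = 0.3

def expected_payoff_full_disclosure(v, p=prior):
    # Posterior jumps to 1 with prob p, to 0 with prob 1 - p.
    return p * v(1.0) + (1 - p) * v(0.0)

def payoff_no_disclosure(v, p=prior):
    # Belief stays at the prior.
    return v(p)

concave = lambda mu: mu ** 0.5   # payoff concave in beliefs
convex  = lambda mu: mu ** 2     # payoff convex in beliefs

# Concave payoff: silence is better (Jensen: E[v(mu)] <= v(E[mu])).
print(payoff_no_disclosure(concave) > expected_payoff_full_disclosure(concave))  # True

# Convex payoff: full disclosure is better.
print(expected_payoff_full_disclosure(convex) > payoff_no_disclosure(convex))  # True
```

The general model allows any signal structure, not just "tell all" vs. "tell nothing"; the expert's optimal signal comes from concavifying the payoff function over beliefs, and the two cases above are the extremes of that construction.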

To my non-expert eyes, this approach seems like it could be very useful in modelling the challenge of learning from AI while trading off the risk of persuasive AI. Does it seem promising, and if so, has it been done?
