[Linkpost] Will AI avoid exploitation?

As some of you may know, I’m editing a special issue of the journal Philosophical Studies on AI safety (along with @Dan H). I thought I’d share the first paper from the issue, which deals with some issues in AI safety theory that have been frequently discussed on LessWrong.

Here’s the abstract:

A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expect advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.

You can find the paper here: https://link.springer.com/article/10.1007/s11098-023-02023-4