And the AI would have got away with it too, if...

Paul Christiano presented some low-key AI catastrophe scenarios; in response, Robin Hanson argued that Paul’s scenarios were not consistent with the “large (mostly economic) literature on agency failures”.

He concluded with:

For concreteness, imagine a twelve year old rich kid, perhaps a king or queen, seeking agents to help manage their wealth or kingdom. It is far from obvious that this child is on average worse off when they choose a smarter more capable agent, or when the overall pool of agents from which they can choose becomes smarter and more capable. And its even less obvious that the kid becomes maximally worse off as their agents get maximally smart and capable. In fact, I suspect the opposite.

Thinking on that example, my mind went to Edward V of England (one of the “Princes in the Tower”), deposed and then likely killed by his “protector” Richard III. Or to the Guangxu Emperor of China, put under house arrest by the regent, Empress Dowager Cixi. Or the ten-year-old Athitayawong, king of Ayutthaya, deposed by his chief administrator after a reign of only 36 days. More examples can be dug out of Wikipedia’s list of rulers deposed as children.

There is no reason to restrict ourselves to child monarchs: many Emperors, Kings, and Tsars have been deposed by their advisers or “agents”. So yes, there are many cases where agency fails catastrophically for the principal, and where having a smarter or more rational agent was a disastrous move.

By restricting attention to agency problems in economics, rather than in politics, Robin limits himself to situations where institutions are strong and behaviour is punished if it gets too egregious. Yet even today there is plenty of betrayal by “agents” in politics, even if the results are less lethal than in times gone by. In economics, too, we have fraudulent investors, some of whom escape punishment. Agents betray their principals to the utmost, when they can get away with it.

So Robin’s argument depends entirely on the assumption that institutions or rivals will prevent AIs from abusing their agency power. Absent that assumption, most of the “large (mostly economic) literature on agency failures” becomes irrelevant.

So, would institutions be able to detect and punish abuses by future powerful AI agents? I’d argue we can’t count on it; but that question needs its own exploration, and it is very different from the economic point Robin seemed to be making.