The AI is very honorable/honest/trustworthy — in particular, the AI would keep its promises even in extreme situations.
NB: It seems like we need an assumption (possibly much weaker, but maybe in practice no weaker) that we can detect whether the AI is lying about deals of the form in Step 2.