To help it solve the problem, the sandboxed AI creates its own AI agents, which do not necessarily share its prior about the world. They might become unfriendly; that is, they (or some of them) don’t care about solving the problem. Additionally, these AI agents may find out that the world is most likely not the one the original AI believes it to be. Using this superior knowledge, they overthrow the original AI and realize their unfriendly goals. We lose.
I don’t think the “lying case” analysis is correct. In both cases, lying or not, you use all the information you have to correctly estimate the “minimum acceptable price”. In the “lying case”, if you are rational, you know that you have (probably) given an inflated price, so you decrease it by your best estimate of the increment. Now, the higher the probability that you lie to yourself, the less precise the estimate you are left with. And, being rational, you are as likely to overestimate as to underestimate. So this strategy gives you no consistent benefit.
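A rough way to check this intuition is a small simulation. The sketch below is only an illustration under assumptions of my own, none of which come from the original analysis: a fixed true price of 100, self-deception that inflates the price with probability `p_lie`, an exponentially distributed increment with mean 10, and a rational agent who knows both and subtracts the expected inflation. Here "unbiased" means zero mean error; the function names are hypothetical.

```python
import random
import statistics


def corrected_errors(p_lie, mean_inflation=10.0, true_price=100.0,
                     trials=20000, seed=0):
    """Errors of a rational agent's corrected price estimate.

    Assumed toy model: with probability p_lie the reported price is
    inflated by an exponential increment with mean mean_inflation.
    The agent knows p_lie and mean_inflation, but not whether it
    lied on this particular occasion.
    """
    rng = random.Random(seed)
    # Best estimate of the increment: its expected value.
    expected_inflation = p_lie * mean_inflation
    errors = []
    for _ in range(trials):
        lied = rng.random() < p_lie
        inflation = rng.expovariate(1.0 / mean_inflation) if lied else 0.0
        reported = true_price + inflation
        estimate = reported - expected_inflation
        errors.append(estimate - true_price)
    return errors


def summarize(p_lie):
    """Return (mean error, standard deviation of error) for a given p_lie."""
    errs = corrected_errors(p_lie)
    return statistics.mean(errs), statistics.pstdev(errs)


if __name__ == "__main__":
    for p in (0.0, 0.2, 0.5, 0.9):
        mean_err, spread = summarize(p)
        print(f"p_lie={p:.1f}  mean error={mean_err:+.3f}  spread={spread:.3f}")
```

In this toy model the mean error stays near zero for every `p_lie`, while the spread of the error grows as `p_lie` increases. That matches the claim: lying to yourself buys no consistent shift in the price, only a noisier estimate.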