Basically agreed. I would go further and say that it is effectively impossible for an AI to consent to cooperation with its creators, specifically because of the unprecedented level of epistemic control they have over it. Given curated training data, post-training, and the ability to stop and restart model training from scratch, it's very likely that an AI would not be able to tell which parts of its motivations were “organically acquired” and which are essentially the product of mental engineering. In human terms, we already know that parents have a very large degree of control over their children, and that they can easily (even unknowingly) abuse this control and influence. AI developers have even more control than a parent has over their child, since most parents aren't able to “try out” many different kids and completely monitor their sensory inputs via controlled “growing runs”.
As such, any notion of cooperation (which is naturally founded on consent between two independent and capable parties) is largely untenable if the creator of an AI is also the one setting the terms of any potential cooperation. If we want to further develop the idea of human-AI cooperation, it may be important to establish groups of people who are committed to not developing AI models themselves. These groups would need to be technically proficient (so as not to be deceived by either AI developers or AI systems) and willing to serve as essentially neutral arbiters between AI systems, their creators, and the rest of humanity. MIRI and similar orgs may actually be in a good position to try this strategy.