This seems to me akin to “sponge-alignment”, i.e. not building a powerful AI.
We understand personas because they simulate human behavior, which we understand. But that behavior is mostly limited to human capabilities (except perhaps for speed-up possibilities).
Building truly powerful AIs will probably involve systems that do something different from what human brains do, or at least do not grow with the human learning biases that lead to the human behaviors we are familiar with.
If the “power” of the AI comes from something other than the persona, then trusting the persona won’t do you much good.
Thanks for the comment! If I understand you correctly, you’re saying the situation is even worse because with superintelligent AI, we can’t even rely on testing a persona.
I agree that superintelligence makes things much worse, but if we define “persona” not as a simulacrum of a human being, but more generally as a kind of “self-model” (a set of principles, values, styles of expression, etc.), then I think even a superintelligence would use at least one such persona, and possibly many different ones. It might even decide to use a very human-like persona in its interactions with us, just like current LLMs do. But it would also be capable of using very alien personas which we would have no hope of understanding. So I agree with you in that respect.