I often wonder whether there is a No-Alignment Theorem that says you can’t always control the actions of an intelligent entity. Maybe something with the flavor of the undecidability of the halting problem or Gödel’s Incompleteness Theorems, where the issue stems from the fact that an intelligent entity can model itself and reflect from a distance on the goals you’ve given it.
I doubt such a thing exists, but it’s fun to think about. It would also require a mathematical formulation of an intelligent entity, which seems to be quite a ways off. And even if such a theorem does exist, it would almost certainly be irrelevant for doing alignment in practice, the same way Gödel’s Incompleteness Theorems do not affect the day-to-day work of mathematicians.