As another example, I’ve seen people imagine non-consequentialist preferences as “rules that the AI grudgingly follows, while searching for loopholes”, rather than “preferences that the AI enthusiastically applies its intelligence towards pursuing”.
I imagine that this might be yet another view that is downstream of visualizing a “train then deploy” paradigm for future AI systems?
If the human operators successfully install some static deontological constraints in the AI, while also training it to accomplish consequentialist goals, there’s a continual training incentive to learn to game and route around the deontological constraints.
Another way to say this: there are tradeoffs between the consequentialist and non-consequentialist desires, and current AIs are reinforced only on the basis of behavioral outcomes (which are served better by consequentialist desires than by non-consequentialist ones?), so training tends to gradually nudge the AIs towards having consequentialist goals.
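To make that dynamic concrete, here’s a minimal toy simulation (my own sketch, not anything from the post; the reward values and the multiplicative-weights update rule are made-up illustrative choices). Two dispositions compete inside one agent: a “constrained” one that sometimes forgoes reward to respect a rule, and a “consequentialist” one that routes around the rule. Reinforcement sees only the outcome-based reward, so the constraint-respecting disposition is gradually selected against:

```python
import random

# Toy model of the claimed training dynamic. The agent acts from one of two
# internal dispositions; training reinforces whichever disposition produced
# the episode, in proportion to the outcome-based reward it earned.
# (All numbers here are illustrative, not measured from any real system.)

REWARD = {"constrained": 0.7, "consequentialist": 1.0}  # outcome reward only
weights = {"constrained": 1.0, "consequentialist": 1.0}  # initial dispositions
lr = 0.1

random.seed(0)
for step in range(2000):
    # The agent acts from a disposition with probability proportional to weight.
    total = sum(weights.values())
    choice = ("constrained"
              if random.random() * total < weights["constrained"]
              else "consequentialist")

    # Reinforcement sees only the behavioral outcome, not the rule-following,
    # so the higher-reward disposition compounds faster when selected.
    weights[choice] *= 1 + lr * REWARD[choice]

    if step % 500 == 0:
        p = weights["constrained"] / sum(weights.values())
        print(f"step {step}: P(respect constraint) = {p:.3f}")
```

Running this, P(respect constraint) starts at 0.5 and decays toward zero: both dispositions get reinforced when used, but the consequentialist one compounds faster per episode, so it gradually crowds out the constrained one, which is the “nudge towards consequentialist goals” in miniature.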
Oh, that’s your very next point! : P