This is a claim I strongly disagree with, assuming there aren't enforcement mechanisms like laws or contracts. Without enforcement, this reduces to the Prisoner's Dilemma, where defection is game-theoretically optimal. Cooperation only works if agreements can be enforced, and the likelihood that we will be able to enforce things like contracts on superhuman intelligences is essentially the likelihood of animals enforcing contracts on a human, i.e. so low that it's not worth privileging the hypothesis.
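For concreteness, here is a minimal sketch of the dominance argument in the one-shot Prisoner's Dilemma. The payoff numbers are illustrative (any payoffs with temptation > reward > punishment > sucker give the same result), not anything from the discussion itself:

```python
# One-shot Prisoner's Dilemma: defection strictly dominates cooperation
# for each player when nothing is enforced. Illustrative payoffs with
# the standard ordering T > R > P > S.

# PAYOFF[(my_move, their_move)] -> my payoff
PAYOFF = {
    ("C", "C"): 3,  # R: mutual cooperation
    ("C", "D"): 0,  # S: sucker's payoff
    ("D", "C"): 5,  # T: temptation to defect
    ("D", "D"): 1,  # P: mutual defection
}

for their_move in ("C", "D"):
    coop = PAYOFF[("C", their_move)]
    defect = PAYOFF[("D", their_move)]
    assert defect > coop  # defecting is better no matter what they do
    print(f"If opponent plays {their_move}: defect ({defect}) beats cooperate ({coop})")

# Since D strictly dominates C for both players, (D, D) is the unique
# Nash equilibrium, even though (C, C) would leave both better off.
```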
And this is important, because it speaks to why the alignment problem is hard: agents with vastly differing capabilities can't enforce much of anything on each other, so defection is going to happen. I think this prediction bears out in real-life relations with animals: humans can defect consequence-free, so they usually do.
One major exception is pets, where the norm really is cooperation, and the version of that arrangement applied to humans is essentially benevolent totalitarianism. Life is good in such a society, but modern democratic freedoms are almost certainly gone, or so manipulated that they don't matter.
That might not be bad, but I do want to note that in game theory, the no-enforcement regime is where defection rules.
Instrumental convergence towards developing some form of morality that respects the less capable agent's wants, and does so stably, is the necessary thing. And the answer to this is negative, except in the pets case. And even there, it entails the end of democracy and most freedom as we know it. It might amount to benevolent totalitarianism, and you could argue that this is desirable, though I do want to note the costs.
In one sense, I no longer endorse the previous comment; in another sense, I sort of still do.
I was basically wrong that alignment requires human values to be game-theoretically optimal, and I now think cooperation is doable without relying on game-theoretic tools like enforcement, because the situation with AI alignment is very different from human alignment. We have access to the AI's brain, so we can directly reward good behavior and penalize bad behavior, and we have a very powerful optimizer, SGD, that lets us straightforwardly select over minds and directly edit the AI's brain. Neither of these is available for aligning humans, partly for ethical reasons and partly for technological reasons.
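As a toy illustration of that asymmetry (my own sketch, not anything from the original comments): with white-box access, "reward good things and penalize bad things" literally becomes a gradient step on the parameters, which is not something we can do to another human.

```python
# Toy sketch: a "mind" we can edit directly, a reward scoring its
# behavior, and SGD-style ascent on that reward. The reward function
# and target here are made up purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
params = rng.normal(size=3)          # the "brain" we have access to
target = np.array([1.0, -2.0, 0.5])  # the behavior we want to reward

def reward(p):
    return -np.sum((p - target) ** 2)  # higher when behavior is "good"

lr, eps = 0.1, 1e-4
for _ in range(200):
    # Finite-difference estimate of the reward gradient: this is the
    # "directly reward good, penalize bad" signal.
    grad = np.array([
        (reward(params + eps * e) - reward(params - eps * e)) / (2 * eps)
        for e in np.eye(3)
    ])
    params += lr * grad  # SGD-style direct edit of the "mind"

print(params.round(3))  # converges to the rewarded behavior
```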
I also now think my human-animal analogy for alignment is almost as bad as the human-evolution analogy, which is worthless. The better analogy for predicting the likelihood of AI alignment is the prefrontal cortex's alignment to survival values, or innate reward alignment, which is very impressive alignment.
However, even assuming aligned AIs, I think democracy is likely to decay fairly quickly under AI, especially given the likely power imbalances, and especially hundreds of years into the future. We will likely retain some freedoms under aligned AI rule, but I expect far fewer than we're used to today, with a transition into a form of benevolent totalitarianism.