Are you conditioning on the prior claims when stating your probabilities? Many of these properties are highly correlated. E.g., “seriously misaligned” and “broadly strategic about achieving long-run goals in ways that lead to scheming” seem very correlated to me. (Your probabilities seem higher than I would have expected without any correlation, but I’m unsure.)
I think we probably disagree about the risk from misalignment by a factor of 2-4 or something. But more of the crux is probably in the value of working on other problems.
I’m not conditioning on prior claims.
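To make the distinction concrete, here is a toy calculation with made-up numbers (not my actual credences for any of the claims above), showing how the two readings of the stated probabilities come apart:

```python
# Toy calculation (hypothetical numbers, not my actual credences) showing
# how unconditional vs. conditional readings of the probabilities differ.

p_misaligned = 0.30  # hypothetical unconditional credence in "seriously misaligned"
p_scheming = 0.25    # hypothetical unconditional credence in "scheming"

# If the numbers are unconditional and scheming entails serious misalignment,
# then P(scheming) = P(scheming | misaligned) * P(misaligned), so the
# implied conditional credence is much higher than the marginal:
p_scheming_given_misaligned = p_scheming / p_misaligned
print(f"P(scheming | misaligned) = {p_scheming_given_misaligned:.2f}")  # ~0.83

# Reading the same 0.25 as *already conditioned* on misalignment instead
# would imply a much smaller joint probability:
p_joint_if_conditional = p_scheming * p_misaligned
print(f"P(misaligned and scheming) = {p_joint_if_conditional:.3f}")  # 0.075
```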
One reason you might have inferred that I was conditioning is that my credence for scheming is high relative to what you might have expected given my other claim about “serious misalignment”. My explanation is that I interpret “AI scheming” as a relatively benign behavior in context. If we define scheming as:
(1) behavior intended to achieve some long-term objective that is not quite what the designers had in mind, and
(2) not being fully honest with the designers about its true long-term objectives (especially in the sense of accurately describing what it would do with unlimited power),
then I think scheming is ubiquitous and usually relatively benign when performed by rational agents without godlike powers. For example, humans likely “scheme” all the time by (1) pursuing long-term plans, and (2) not being fully honest with others about what they would do if they became god. This is usually not a big issue because agents rarely get the chance to take over the world and execute a treacherous turn; instead, they have to play the game of compromise and trade like the rest of us, along with all the other scheming AIs, who have different long-term goals.