Let’s define:
X = thinking about the dynamics of conflict + how they affect our collective ability to achieve things we all want; prioritizing actions based on those considerations
Y = thinking about how actions shift the balance of power + how we should be trying to shift the balance of power; prioritizing actions based on those considerations
I’m saying:
I think the alignment community traditionally avoids Y but does a lot of X.
I think that the factors you listed (including in the parent) are mostly reasons we’d do less Y.
So I read you as mostly making a case for “why the alignment community might be inappropriately averse to Y.”
I think that separating X and Y would make this discussion clearer.
I’m personally sympathetic to both activities. I think the altruistic case for marginal X is stronger.
Here are some reasons I perceive you as mostly talking about Y rather than X:
You write: “Rather, the concern is that we are underperforming the forces that will actually shape the future, which are driven primarily by the most skilled people who are going around shifting the balance of power.” This seems like a good description of Y but not X.
You listed “Competitive dynamics as a distraction from alignment.” But in my experience, people from the alignment community very often bring up X themselves, both as a topic for research and as a justification for their research (suggesting that in fact they don’t regard it as a distraction), and in my experience Y derails conversations about alignment perhaps 10x more often than X.
You talk about the effects of the PMK post. Explicitly that post is mostly about Y rather than X and it is often brought up when someone starts Y-ing on LW. It may also have the effect of discouraging X, but I don’t think you made the case for that.
You mention the causal link from “fear of being manipulated” to “skill at thinking about power dynamics” which looks very plausible (to me) in the context of Y but looks like kind of a stretch (to me) in the context of X. You say “they find it difficult to think about topics that their friends or co-workers disagree with them about,” which again is most relevant to Y (where people frequently disagree about who should have power or how important it is) and not particularly relevant to X (similar to other technical discussions).
In your first section you quote Eliezer. But he’s not complaining about people thinking about how fights go in a way that might disrupt a sense of shared purpose; he’s complaining that Elon Musk is in fact making his decisions in order to change which group gets power, in a way that more obviously disrupts any sense of shared purpose. This seems like complaining about Y, rather than X.
More generally, my sense is that X involves thinking about politics and Y mostly is politics, and most of your arguments describe why people might be averse to doing politics rather than discussing it. Of course that can flow backwards (people who don’t like doing something may also not like talking about it) but there’s certainly a missing link.
Relative to the broader community thinking about beneficial AI, the alignment community does an unusually large amount of X and unusually little Y. So prima facie it’s more likely that “too little X+Y” is mostly about “too little Y” rather than “too little X.” Similarly, when you list corrective influences they are about X rather than Y.
I care about this distinction because in my experience discussions about alignment of any kind (outside of this community) are under a lot of social pressure to turn into discussions about Y. In the broader academic/industry community it is becoming harder to resist those pressures.
I’m fine with lots of Y happening; I just really want to defend “get better at alignment” as a separate project that may require substantial investment. I’m concerned that equivocating between X and Y will make that defense harder, because many of the important divisions are between (alignment, X) vs (Y) rather than (alignment) vs (X, Y).
This also summarizes a lot of my take on this position. Thank you!
This felt like a pretty helpful clarification of your thoughts Paul, thx.