When Conor says “won’t work”, I infer that to mean “will, if implemented as the main alignment plan, lead to existential catastrophe with high probability”. And then my claim is that most ML researchers don’t think there’s a high probability of existential catastrophe from misaligned AGI at all, so it’s very implausible that they think there’s a high probability conditional on this being the alignment plan used.
(This does depend on what you count as “high”, but I’m assuming that if this plan dropped the risk down to 5% or 1% or whatever the median ML researcher thinks it is, then Conor would be deeply impressed.)
Thanks, that’s exactly what I needed to know, and makes perfect sense.
I don’t think it’s quite as implausible to hold both (1) this probably won’t work as stated and (2) we will almost certainly be fine, if you expect those involved to notice the failure and pivot. Yann LeCun, for example, seems to hold a version of this view: that we will be fine, despite current model and technique paths not working, precisely because we will therefore move away from such paths.