Given the current paradigm and technology, it seems far safer to have an AI work on alignment research than on highly difficult engineering tasks like nanotech. In particular, note that we only need an AI to totally obsolete prior efforts for this to be as good a position as we could reasonably hope for.
In the current paradigm, it seems like the AI capability profile for R&D looks reasonably similar to humans’.
My overall view, then, is that (for the human R&D capability profile) totally obsoleting alignment progress to date will be much, much easier than developing the engineering-based hard power necessary for a pivotal act.
This is putting aside the extreme toxicity of directly trying to develop decisive-strategic-advantage-level hard power.
For instance, it’s no coincidence that current humans work on advancing alignment research rather than trying to develop hard power themselves...
So, you’ll be able to use considerably dumber systems to do alignment research (merely human level as opposed to vastly superhuman).
Then, my guess is that the reduction in intelligence will dominate world-model censorship.
The pivotal acts that are likely to work aren’t antisocial. My guess is that the reason nobody’s working on them is lack of buy-in (and lack of capacity).
Also, davidad’s Open Agency Architecture is a very concrete example of what such a non-antisocial pivotal act that respects the preferences of various human representatives would look like (i.e. a pivotal process).
Perhaps it’s not realistically feasible in its current form, yes, but davidad’s proposal suggests that such a process might exist, and that we just have to keep searching for it.
Yeah, if this wasn’t clear, I was referring to ‘pivotal acts’ which use hard engineering power sufficient for a decisive strategic advantage. Things like ‘brain emulations’ or ‘build a fully human-interpretable AI design’ don’t seem particularly antisocial (but may be poor ideas for feasibility reasons).
Agreed that the current AI paradigm can be used to make significant progress in alignment research if used correctly. I’m thinking of something like Cyborgism: leaving most of the “agency” to humans and leveraging prosaic models to boost researcher productivity. Being highly specialized in scope, such models wouldn’t involve dangerous consequentialist cognition.
However, the problem is that this isn’t what OpenAI is doing. Iiuc, they’re planning to build a full-on automated researcher that does alignment research end-to-end, which orthonormal was pointing out is dangerous precisely because that system’s cognition would involve dangerous stuff.
So, leaving aside the problems with alternatives like pivotal acts for now, your points don’t seem necessarily inconsistent with orthonormal’s view that OpenAI’s plans (at least in their current form) seem dangerous.
I think OpenAI is probably agnostic about how to use AIs to get more alignment research done.
That said, speeding up human researchers by large multipliers will eventually be required for the plan to be feasible: more like 10-100x than 1.5-4x. My guess is that you’ll probably need AIs running largely autonomously for long stretches to achieve this.