Some AI safety methods/mechanisms can be tacked onto many kinds of AI systems. But separately, some paths to powerful AI are safer or more alignable than others:

Maybe whole-brain emulation (WBE) is safer than de novo AI
Maybe feedforward-y systems are safer than recurrent-y systems
Maybe LM agents (LM-based systems with sophisticated scaffolding) are safer than similarly powerful base models^[1]
Maybe process-based systems are safer than outcome-based systems

WBE seems very unlikely to appear before strong de novo AI. But other relatively safe paths may be competitive (i.e., they may not require much extra cost or sacrifice of capabilities relative to unsafe paths). This has important implications: AI developers should prioritize those paths, and in particular should differentially publish research on them, so that others are boosted along those paths too.^[2]

Which paths to powerful AI are relatively safe and potentially competitive, and thus should be boosted?

This question is a more focused successor to “Which possible AI systems are relatively safe?”

^[1] Paul says “My guess is that if you hold capability fixed and make a marginal move in the direction of (better LM agents) + (smaller LMs) then you will make the world safer. It straightforwardly decreases the risk of deceptive alignment, makes oversight easier, and decreases the potential advantages of optimizing on outcomes.”
^[2] There’s a quote I’m forgetting on differential technological development, along these lines: if there’s an unsafe path and a safer path, and the unsafe path is ahead (in terms of capabilities), we should rush to make progress on the safer path so that it gets ahead and even non-safety-motivated researchers switch to it.

[Question] Which paths to powerful AI should be boosted?

Zach Stein-Perlman, 23 Aug 2023 16:00 UTC

5 points

1 comment, 1 min read

Differential Intellectual Progress, AI

Brendon_Wong 19 Sep 2024 23:15 UTC
3 points
Unfortunately, this question didn’t get much engagement when it was originally posted, but I’ll put in a vote for highly federated systems along the axes of agency, cognitive processes, and thinking, especially those that maximize transparency and determinism. I think LM agents are just a first step into this area of safety. I write more about this here: https://www.lesswrong.com/posts/caeXurgTwKDpSG4Nh/safety-first-agents-architectures-are-a-promising-path-to

For specific proposals, I’d recommend Drexler’s work on federating agency ( https://www.lesswrong.com/posts/5hApNw5f7uG8RXxGS/the-open-agency-model ) and on federating cognitive processes such as memory ( https://www.lesswrong.com/posts/FKE6cAzQxEK4QH9fC/qnr-prospects-are-important-for-ai-alignment-research ).
