I don’t think your claim makes the argument circular / question-begging; it just means there’s an extra step in explaining why and how a random action sequence destroys the world.
Maybe you mean that I’m putting the emphasis in the wrong place, and it would be more illuminating to highlight some specific feature of random smart short programs as the source of the ‘instrumental convergence’ danger? If so, what do you think that feature is?
From my current perspective, I think the core problem really is that most random short plans that succeed at sufficiently hard tasks kill us. If the causal process by which this happens includes building a powerful AI optimizer, or building an AI that builds an AI, or building an AI that builds an AI that builds an AI, etc., then that’s interesting and potentially useful to know, but that doesn’t seem like the key crux to me, and I’m not sure it helps further illuminate where the danger is ultimately coming from.
(That said, I don’t expect the plan to necessarily literally kill all humans, just to take over the world, but this is due to galaxy-brained trade and common-sense morality arguments which are mostly out of scope and shouldn’t be a thing people depend on.)
Very happy to hear someone with an idea like this who explicitly flags that we shouldn’t gamble on this being true!
One reason I like “the danger is in the space of action sequences that achieve real-world goals” rather than “the danger is in the space of short programs that achieve real-world goals” is that it makes it clearer why adding humans to the process can still result in the world being destroyed.
If powerful action sequences are dangerous, and humans help execute an action sequence (that wasn’t generated by human minds), then it’s clear why that is dangerous too.
If the danger instead lies in powerful “short programs”, then it’s more tempting to say “just don’t give the program actuators and we’ll be fine”. The temptation is to imagine that the program is like a lion, and that if you just keep the lion physically caged then it won’t harm you. If you’re instead thinking about action sequences, then it’s less likely to even occur to you that the whole problem might be solved by changing the AI from a plan-executor to a plan-recommender, which is a step in the right direction in terms of actually grokking the nature of the problem.