One reason I like “the danger is in the space of action sequences that achieve real-world goals” rather than “the danger is in the space of short programs that achieve real-world goals” is that it makes it clearer why adding humans to the process can still result in the world being destroyed.
If powerful action sequences are dangerous, and humans help execute an action sequence (that wasn’t generated by human minds), then it’s clear why that is dangerous too.
If the danger instead lies in powerful “short programs”, then it’s more tempting to say “just don’t give the program actuators and we’ll be fine”. The temptation is to imagine that the program is like a lion, and if you just keep the lion physically caged then it won’t harm you. If you’re instead thinking about action sequences, then it’s less likely to even occur to you that the whole problem might be solved by changing the AI from a plan-executor to a plan-recommender. Which is a step in the right direction in terms of actually grokking the nature of the problem.
One reason I like “the danger is in the space of action sequences that achieve real-world goals” rather than “the danger is in the space of short programs that achieve real-world goals” is that it makes it clearer why adding humans to the process can still result in the world being destroyed.
If powerful action sequences are dangerous, and humans help execute an action sequence (that wasn’t generated by human minds), then it’s clear why that is dangerous too.
If the danger instead lies in powerful “short programs”, then it’s more tempting to say “just don’t give the program actuators and we’ll be fine”. The temptation is to imagine that the program is like a lion, and if you just keep the lion physically caged then it won’t harm you. If you’re instead thinking about action sequences, then it’s less likely to even occur to you that the whole problem might be solved by changing the AI from a plan-executor to a plan-recommender. Which is a step in the right direction in terms of actually grokking the nature of the problem.