Is agency actually the issue by itself, or just one necessary component?
Consider Robert Miles' stamp-collecting robot:
“Order me some stamps in the next 32k tokens / 60 seconds” has narrower scope than “guard my stamps today”, which in turn has narrower scope than “ensure I always have enough stamps”. Only the last one triggers power seeking; the first two don't benefit from seeking power unless the payoff on the power-seeking investment arrives within the time interval.
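A toy sketch of that condition in Python (every rate, time, and number here is made up purely to make the payoff logic concrete): an agent that first spends time acquiring power collects nothing during the investment, then collects faster afterward, so the investment only beats the baseline when the horizon is long enough to recoup it.

    # Toy model: power seeking only pays off when its payoff lands
    # inside the goal's time horizon. All figures are hypothetical.

    def stamps_after_power_seeking(horizon_s: float,
                                   invest_time_s: float,
                                   boosted_rate: float) -> float:
        """Stamps collected if the agent first spends invest_time_s
        seeking power (collecting nothing), then collects at
        boosted_rate for whatever horizon remains."""
        remaining = max(0.0, horizon_s - invest_time_s)
        return remaining * boosted_rate

    BASE_RATE = 1.0        # stamps/second without power seeking
    BOOSTED_RATE = 5.0     # stamps/second after a successful power grab
    INVEST_TIME = 3_000.0  # seconds spent acquiring power

    for horizon in (60.0, 3_600.0, 1_000_000.0):
        baseline = horizon * BASE_RATE
        with_power = stamps_after_power_seeking(horizon, INVEST_TIME,
                                                BOOSTED_RATE)
        verdict = "pays off" if with_power > baseline else "not worth it"
        print(f"horizon={horizon:>11,.0f}s  baseline={baseline:>11,.0f}  "
              f"power-seek={with_power:>11,.0f}  -> {verdict}")

With these made-up numbers the 60-second and one-hour goals never recoup the investment, while the open-ended million-second goal does; that is the sense in which only the long-horizon goal rewards power seeking.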
Note also that AutoGPT, even if given a goal and allowed to run forever, is hobbled by immutable weights and a finite context window.
So at a bare minimum you need human-level prediction + relevant modalities + agency + a long-duration goal + memory. Remove any one element and the danger may be negligible.