What baby gate protects you from Claude subtly misspecifying all your unit tests? If you have to carefully check them all, that negates the benefit of automating the work. The same applies to most complex intellectual work, e.g. literature review. It’s kind of like saying “what if you just had a general baby gate, so that people never have to grow up?” They don’t really make baby gates like that, and the people you’d have to baby gate that way are not economically productive.
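To make the misspecified-test point concrete, here is a minimal hypothetical sketch (the `apply_discount` function and its test are invented for illustration, not from any real codebase or any actual Claude output) of a test that looks like coverage but can never fail, because it re-derives the expected value with the same formula as the implementation:

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: discount a price by a percentage."""
    return price * (1 - percent / 100)


class TestApplyDiscount(unittest.TestCase):
    def test_discount(self):
        # Looks like a real check, but the expected value is computed with the
        # exact same formula as the implementation, so the test passes even if
        # that formula is wrong (e.g. percent vs. fraction confusion).
        price, percent = 100.0, 20.0
        expected = price * (1 - percent / 100)
        self.assertEqual(apply_discount(price, percent), expected)


if __name__ == "__main__":
    unittest.main()
```

Spotting this kind of thing requires actually reading the test body against the spec, which is exactly the careful checking the automation was supposed to save you.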
More generally, if you want an autonomous agent it must be self-monitoring and self-evaluating. Humans, or at least the kind of humans you want as employees, do not need to be carefully vetted from the outside on each thing they do to ensure they do it properly. Having reward come from the environment, as it does in most formal RL models, is an academic convenience. An actually autonomous agent has to be able to ontologize reward over the computable environment in a general way that doesn’t require some other mind to come in and correct it all the time. If you don’t have that, you’re not getting meaningful autonomy.
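A schematic sketch of the contrast, assuming a toy environment and invented class names (this is not a claim about how any existing model or agent framework works): in the standard formal-RL framing the reward arrives from outside, while a self-evaluating agent has to compute it from its own representation of success.

```python
import random


class ToyEnv:
    """Toy environment: state is a counter; the 'external' reward pays for incrementing."""

    def __init__(self):
        self.state = 0

    def step(self, action: int):
        self.state += action
        external_reward = 1.0 if action > 0 else 0.0
        done = self.state >= 5
        return self.state, external_reward, done


class ExternallyRewardedAgent:
    """Standard formal-RL framing: the reward signal is handed to the agent."""

    def step(self, env: ToyEnv, state: int):
        action = random.choice([0, 1])
        next_state, reward, done = env.step(action)  # reward comes from outside
        return next_state, reward, done


class SelfEvaluatingAgent:
    """Sketch of the alternative: the agent scores outcomes against its own
    notion of what counts as success, ignoring the external signal."""

    def __init__(self, goal_predicate):
        self.goal_predicate = goal_predicate  # the agent's own ontology of success

    def step(self, env: ToyEnv, state: int):
        action = random.choice([0, 1])
        next_state, _, done = env.step(action)  # external reward ignored
        reward = 1.0 if self.goal_predicate(next_state) else 0.0  # computed internally
        return next_state, reward, done
```

All the difficulty the paragraph is pointing at hides inside `goal_predicate`: writing it for a toy counter is trivial, while writing it in a general way over the open-ended computable environment, without another mind stepping in to correct it, is the unsolved part.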
I stand by my basic point in Varieties Of Doom that these models don’t plan very much yet, and that as soon as we start having them plan and act over longer time horizons we’ll see instrumental behavior emerge naturally. We could also see this emerge from e.g. continuous learning that lets them hold a plan in implicit context over much longer action trajectories.