I’d agree that equivalently rapid progress in something like deep reinforcement learning would be dramatically more concerning. If we were already getting such high-quality results while constructing a gradient out of noisy samples of a sparse reward function, I’d have to shorten my timelines even more. RL also tends to imply agency more directly, and it would hurt my estimates on the alignment side of things in the absence of some very hard work (e.g. an implementation backed by an IB-derived proof of ‘regret bound is alignment’ or somesuch).
I also agree that token predictors are less prone to developing these kinds of directly worrisome properties, particularly current architectures with all their limitations.
I’m concerned that advances on one side will leak into the others. The result might not look exactly like most current deep RL architectures, but it could still end up serving similar purposes and carrying similar risks. Things like decision transformers come to mind. In the limit, it wouldn’t be too hard to build a dangerous agent out of an oracle.
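To make the oracle-to-agent point concrete, here is a toy sketch. All names (`oracle`, `execute`, `agent_loop`) are hypothetical stand-ins, not any real API: the point is only that the oracle itself has no goals, while a trivial outer loop supplies all the agency.

```python
def oracle(prompt: str) -> str:
    """Stand-in for a powerful goal-free question-answering model.

    A real system would query a trained model here; this stub just
    returns a fixed answer to demonstrate the control flow.
    """
    return "move_north"


def execute(action: str, state: dict) -> dict:
    """Apply the oracle-suggested action to a toy environment state."""
    new_state = dict(state)
    new_state["history"] = state["history"] + [action]
    return new_state


def agent_loop(goal: str, state: dict, steps: int = 3) -> dict:
    # The agency lives entirely in this loop, not in the oracle:
    # we repeatedly ask "what action best achieves <goal>?" and act on
    # whatever it answers.
    for _ in range(steps):
        prompt = f"Goal: {goal}. State: {state}. Best next action?"
        action = oracle(prompt)
        state = execute(action, state)
    return state


final = agent_loop("reach the exit", {"history": []})
print(final["history"])
```

Even with a purely predictive model underneath, the wrapper plus an actuator is already an agent; the safety properties of the oracle alone don’t obviously transfer to the composed system.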
Maybe there is some consolation in the thought that if humanity were to arrive at something approaching AGI, it would be better to do so with an architecture that is limited in its ultimate capability, exhibits as little natural agency as possible, and is ideally something of a dead end for further AI development. It could serve as a sort of vaccine, if you will.
Running with the singularity scenario for a moment, I have very serious doubts that purely theoretical research performed largely in a vacuum will yield any progress on AI safety. The history of science certainly doesn’t suggest we will solve this problem before it becomes a serious threat. So the best-case scenario we can hope for is that the first crisis caused by AGI will not be fatal, owing to the underlying technology’s limitations and a manageable speed of improvement.