I liked this post and thought it gave a good impression of just how crazy AIs could get if we allow progress to continue. It also made me even more confident that we really cannot allow AI progress to continue unabated, at least not to the point where AIs are automating AI R&D and getting to this level of capability.
I also think it is very unlikely that AIs 4 SDs above the human range would be controllable; I'd expect them to be able to fairly easily sabotage research they were given without humans noticing. When I think of intelligence gaps like that in humans, it feels pretty insurmountable.
Seems reasonable. It's worth noting there might be a distinction between:
1. We can extract useful work from these systems.
2. We can prevent these systems from successfully causing egregious security failures.
I think both are unlikely but plausible (given massive effort), and that preventing egregious security failures seems plausible even without massive effort (though still unlikely).
You might be interested in Jankily controlling superintelligence.