Ah! Ok, yeah, I think we were talking past each other here.
I’m not trying to claim here that the institutional case might be harder than the AI case. When I said “less than perfect at making institutions corrigible” I didn’t mean “less compared to AI” I meant “overall not perfect”. So the square brackets you put in (2) was not something I intended to express.
The thing I was trying to gesture at was just that there are kind of institutional analogs for lots of alignment concepts, like corrigibility. I wasn’t aiming to actually compare their difficulty—I think like you I’m not really sure, and it does feel pretty hard to pick a fair standard for comparison.
oh lmao I think I just misread “we are currently less than perfect at making institutions corrigible” as “we are currently less perfect at making institutions corrigible”
Ah! Ok, yeah, I think we were talking past each other here.
I’m not trying to claim here that the institutional case might be harder than the AI case. When I said “less than perfect at making institutions corrigible” I didn’t mean “less compared to AI” I meant “overall not perfect”. So the square brackets you put in (2) was not something I intended to express.
The thing I was trying to gesture at was just that there are kind of institutional analogs for lots of alignment concepts, like corrigibility. I wasn’t aiming to actually compare their difficulty—I think like you I’m not really sure, and it does feel pretty hard to pick a fair standard for comparison.
oh lmao I think I just misread “we are currently less than perfect at making institutions corrigible” as “we are currently less perfect at making institutions corrigible”