On reflection, I can see how what you're describing is a hard part. I haven't focused much on it, because I've seen it as a capabilities question (much of it seems to boil down to the outer optimizer not being capable enough to achieve inner alignment, which depends on the capabilities of the outer optimizer), but it may nonetheless be quite worthwhile for alignment researchers to spend time on this.
However, I don't think it's the (only) hard part. Even if we can turn the world into diamondoid or create two identical strawberries, there's still the equally important issue of figuring out human values well enough to direct an AI toward them.
I also share your skepticism about many of the existing research programs.