I basically agree with this. In some ways, focusing too much on “values” acts as a barrier to deeper investigation of the mechanisms at work here, and I think looking deeper is necessary: I expect that optimizing against the value abstraction layer alone will result in Goodharting.