I think “alignment plasticity” is called “corrigibility”.
I agree with your view that approval reward as an AGI target would be complicated. I’d add the detail that even robustly desiring the approval of humans is probably not a good thing for an ASI to be doing, in the same way as a “smile optimizer” would not be a good thing for people who want to smile because they are happy.
unless we discover some magic universal value system that is relevant for all of humanity for all eternity
I’m not a huge fan of your dismissive tone here. My goal is to help humanity build a system for encoding such a thing. I think it is very difficult. Probably the most difficult thing humanity has ever attempted by far. But I do not think it is impossible, and it is only “magic” in the sense that any engineering discipline is magic.
Thank you! My intent definitely wasn’t to be dismissive, maybe skeptical, but I’m definitely aligned with you that solving this particular problem is both extremely hard and extremely important. Thanks for pointing out how that landed.
I think “alignment plasticity” is called “corrigibility”.
I agree with your view that approval reward as an AGI target would be complicated. I’d add the detail that even robustly desiring the approval of humans is probably not a good thing for an ASI to be doing, in the same way as a “smile optimizer” would not be a good thing for people who want to smile because they are happy.
I’m not a huge fan of your dismissive tone here. My goal is to help humanity build a system for encoding such a thing. I think it is very difficult. Probably the most difficult thing humanity has ever attempted by far. But I do not think it is impossible, and it is only “magic” in the sense that any engineering discipline is magic.
Thank you! My intent definitely wasn’t to be dismissive, maybe skeptical, but I’m definitely aligned with you that solving this particular problem is both extremely hard and extremely important. Thanks for pointing out how that landed.
No problem. Hope my criticism didn’t come across as overly harsh. I’m grateful for your engagement : )