Great post and great format. I particularly liked “generalization upstream of reward signals”, which I sort of had intuitions about (from reading your work) but hadn’t seen presented so crisply.
I’d be excited to see more treatment of generalization upstream of reward signals (i.e. hypothesized mechanisms for the reward function learning algorithms, mapping to potential ML setups), though all of this has genuine potential capability externalities.