This is the “what kind of minds are we even building” problem. …
we are building systems that could turn out to have that same cognitive property as humans and other animals: namely, having interests they actually care about. “What would it even look like to respect or ignore these interests?”
That intermediate problem of “having interests they actually care about” seems to be quite close to what Steven Byrnes calls “We need a field of Reward Function Design”.
Thanks Gunnar, agree this is closely related (and reward circuitry is naively where one would expect valence-relevant representational structure to live). He’s approaching this from the alignment angle rather than the “what kind of minds are we even building” angle AFAICT, but this is a good example of why I see both pursuits as essential and complementary.