The neuroscience/psychology side of the alignment problem, as opposed to the ML side, seems quite neglected (it's harder, on the one hand, but on the other it's easier to avoid capabilities-relevant work if you just don't focus on the cortex). One direction here is reverse-engineering human social instincts. In principle this would benefit from more high-quality experiments in mice, but those are expensive.