Why is value alignment different from these? Because we have a working example of a value-aligned system right in front of us: the human brain. This permits an entirely scientific approach, requiring minimal philosophical deconfusion. And in contrast to corrigibility solutions, biological and artificial neural networks are based on the same fundamental principles, so there's a much greater chance that insights from one will carry over to the other.
The similarities go even deeper, I'd say; see e.g. The neuroconnectionist research programme for a review, as well as quite a few of my past linkposts (e.g. on representational alignment and how it could be helpful for value alignment, and on evidence of some representational alignment between LLMs and humans arising by default), and https://www.lesswrong.com/posts/eruHcdS9DmQsgLqd4/inducing-human-like-biases-in-moral-reasoning-lms.
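To make "representational alignment" a bit more concrete: one common way it is quantified is representational similarity analysis (RSA), which asks whether two systems impose a similar similarity structure on the same set of stimuli. Below is a minimal sketch of that idea; the function name, data shapes, and distance metrics are illustrative assumptions on my part, not the exact setup used in the linked posts.

```python
# Minimal RSA sketch: correlate the pairwise-dissimilarity structure of
# model embeddings with that of human judgments over the same stimuli.
# All names and shapes here are illustrative, not from the linked posts.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(model_embeddings: np.ndarray, human_judgments: np.ndarray) -> float:
    """Spearman correlation between the two representational
    dissimilarity matrices (RDMs), computed over the same stimuli."""
    model_rdm = pdist(model_embeddings, metric="cosine")     # pairwise model dissimilarities
    human_rdm = pdist(human_judgments, metric="euclidean")   # pairwise human dissimilarities
    rho, _ = spearmanr(model_rdm, human_rdm)
    return rho

# e.g. 50 stimuli: 768-dim LLM embeddings vs. 10-dim human rating vectors
score = rsa_score(np.random.randn(50, 768), np.random.randn(50, 10))
print(f"representational alignment (RSA): {score:.3f}")
```

A higher correlation means the model and the humans "see" the stimuli as similar and dissimilar in roughly the same ways, which is the sense in which some representational alignment between LLMs and human representations seems to arise by default.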