The brain algorithms that do moral reasoning are value-aligned in the same way a puddle is aligned with the shape of the hole it’s in.
They’re shaped by all sorts of forces, ranging from the social environment to biological constraints like the fact that we can’t make our brains twice as large. Not just during development but on an ongoing basis, our moral reasoning exists in balance with all these other forces. But of course, a puddle always coincidentally finds itself in a hole that’s perfectly shaped for it.
If you took the decision-making algorithms from my brain and put them into a brain 357x larger, that tautological magic spell might break: the puddle, moved into a different hole, might no longer take the shape it had in the original one.
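A toy sketch of what I mean, with everything in it invented for illustration (the “innate” update rule, the constraint terms, the numbers): the same internal value-updating process settles to different equilibria when the surrounding constraints change, just as the same water settles into different shapes in different holes.

```python
# Toy illustration only: a fixed internal update rule reaches different
# equilibria under different external constraints. All functions and
# constants here are made up for the sake of the analogy.
import numpy as np

def settle(update_rule, constraint, v0, steps=10_000, lr=0.01):
    """Iterate internal drives plus environmental pressure to a fixed point."""
    v = np.array(v0, dtype=float)
    for _ in range(steps):
        v += lr * (update_rule(v) + constraint(v))
    return v

# A fixed "moral reasoning" update rule: pull values toward an innate target.
innate = lambda v: np.array([1.0, 0.5]) - v

# Two different "holes": the original brain vs. the 357x-larger one,
# modeled as different damping pressures on the second value dimension.
small_brain  = lambda v: np.array([0.0, -2.0 * v[1]])   # strong damping
galaxy_brain = lambda v: np.array([0.0, -0.1 * v[1]])   # damping mostly gone

print(settle(innate, small_brain,  [0.0, 0.0]))  # ~[1.0, 0.17]
print(settle(innate, galaxy_brain, [0.0, 0.0]))  # ~[1.0, 0.45]
```

Same update rule, same starting point, but the equilibrium values differ because the constraints differ; nothing about the rule itself guarantees the old shape carries over.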
If you anticipate this general class of problems and try to resolve them, that’s great! I’m not saying nobody should do neuroscience. It’s just that I don’t think it’s an “entirely scientific approach, requiring minimal philosophical deconfusion,” nor does it lead to safe AIs that are just emulations of humans except smarter.
I think that even if something is lost in that galaxy-brain, for most people it would be hard to lose the drives not to kill your friends and family for meaningless squiggles. But maybe that’s exactly what would happen. Either way, I don’t think you need radical philosophical deconfusion in order to solve this problem. You just need to understand what does and doesn’t determine what values you have. What you describe is determining the boundary conditions for a complicated process, not figuring out what “justice” is. It has the potential to be a hard technical problem, but technical it still is.