I believe it's possible for AI values to align with ours as much as the least-aligned human individuals are aligned with each other. And in my book, if this could be guaranteed, it would already constitute a heroic achievement, perhaps the greatest accomplishment of mankind up to that point.
Any greater alignment would be a pleasant fantasy, hopefully realizable if AGIs were to come into existence, but it doesn't seem to have any more solid justification than many other pleasant fantasies.
I think a lot of human “alignment” isn’t encoded in our brains, it’s encoded only interpersonally, in the fact that we need to negotiate with other humans of similar power. Once a human gets a lot of power, often the brakes come off. To the extent that’s true, alignment inspired by typical human architecture won’t work well for a stronger-than-human AI, and some other approach is needed.
I didn't mean to suggest that any future approach has to rely on 'typical human architecture'. I also believe the least-aligned humans are less aligned with each other than the least-aligned dolphins, elephants, whales, etc. are with each other. Treating AGI as a new species, at least as distant from us as dolphins, for example, would be a good starting point.
Well, I would answer, but the answer would be recursive. I cannot know the true values and alignment of such a superhuman intellect without being one myself. And if I were, I wouldn't be able to communicate such thoughts with their full strength without you also being at least equally superhuman enough to understand them. And if we both were, then you would already know.
And if neither of us is, then we can at best speculate with some half-baked ideas that might sound convincing to us but unconvincing to said superhuman intellects. At best we can hope that any seeming alignment of values, perceived to the best of our abilities, is actual. Additionally, said supers may consider themselves 'humans' or not, on criteria possibly also beyond our understanding.
Alternatively, if we could raise ourselves to that level, then super-super affairs would become the new baseline, leading us to speculate on hyper-superhuman topics on super-LessWrong, ad infinitum.
In this case we would be the monkeys gazing at the strange, awkwardly tall, hairless monkeys, pondering them in terms of monkey affairs. Maybe I would understand alignment in terms of whose territory is whose, who is the alpha and omega among the human tribe(s), which banana trees are the best, where the nearest clean water source is, what kind of sticks and stones make the best weapons, etc.
I probably won't understand why the human tribe(s) commit such vast efforts to creating, securing, and moving around those funny-looking giant metal cylinders with lots of gizmos at the top, bigger than any tree I've seen. Why every mention of them elicits dread, why only a few of the biggest human tribes are allowed to have them, why they need to be kept on constant alert, why several need to be put in even bigger metal cylinders to roam around underwater, etc.; surely nothing can be that important, right?
If the AGI is moderately above us, then we could probably find arguments convincing to both sides, but we would never be certain of them.
If the AGI becomes as far above us as humans are above monkeys, then I believe the chances are about as good as the chances of us finding arguments that could convince monkeys of the necessity of ballistic missile submarines.
Are you pondering what arguments a future AGI will need to convince humans? That’s well covered on LW.
Otherwise my point is that we will almost certainly not convince monkeys that 'we're one of them' if they can use their eyes and see that, instead of spending resources on bananas and the like, we're spending them on ballistic missiles and the like.
Unless you mean that we could do so by deception, such as denying that we spend resources along those lines; in that case I'm not sure how that relates to future AGI/human scenarios.