I think you’ve got it the wrong way around. In fact, that’s probably my biggest issue with the whole field of alignment. I think it’s probably easier to solve the problem of human institution alignment than AI alignment, and that doing so would help solve the AI alignment problem as well.
I say this because individual humans are already aligned with human values, and it should be possible, by some means, to preserve this fact even while scaling up to entire organizations. There is no a priori reason that this would be more difficult than literally reverse engineering the entire human mind from scratch! That is, it doesn’t actually matter what human values are, so long as you believe that humans can actually be trusted to have them; all that matters is that you can be certain those values are preserved by the individual-to-organization transition. So my position can be summarized as “designing an organization which preserves the human values already present is easier than figuring out what those values are to begin with and injecting them into something with no humanity already in it.”
As a matter of fact, this firm belief is the basis of my whole theory of what we ought to be doing as a species: I think AI should not be allowed to gain general intelligence, and that we should instead focus on creating an aligned superintelligence out of humans (with narrow AIs as “mortar”, mere extensions of human capacities), first in the form of an organization, and later as a “hive mind” using brain-computer interfaces to achieve varying degrees of voluntary mind-to-mind communication.
That might not necessarily be required for AGI, though it does seem to be what figuring out how to program values amounts to.
The latter is more what I was pointing to.