how humans are aligned
This is a description of how I think humans are kept somewhat aligned towards genetically-specified goals. It’s not an argument, just my views.
It works well enough, but not perfectly. South Korea isn’t having kids anymore. Sometimes you get serial killers or Dick Cheney. So anything less than what humans have seems likely to be inadequate.
previously: AI self-improvement is possible
Don’t allow systems to modify lower-level systems, and strongly limit self-modification at the same system level. When it’s done at all, reduce the amount allowed after an initial learning period, so that children have more flexibility than old people.
Don’t have long chains of systems generating systems. Limit things to 2 steps of systems generating higher-level systems.
Even if you try to restrict self-modification, some “leakage” will happen anyway. A physical time limit that can’t be extended past a maximum keeps that from becoming excessive. (Perhaps that’s why some octopus species are “hardcoded” to die at specific points in their reproductive cycle.)
More-capable misaligned systems are more dangerous, so degrading capabilities of systems as they approach their time limit makes them safer. I don’t actually think this is the limiting factor for human senility; I think that’s mainly due to uncontrolled covalent modification of DNA. What I do think is that humans have some low-level system that, when they get old, reduces influence of or shuts down certain mid-level systems.
Humans have monitoring systems with a lesser degree of self-modification than what they monitor. They have access to the internal opinions of what they monitor, but can be deceived. The net benefit is actually somewhat questionable.
If alignment drift is somewhat random, then making many separate agents that act according to consensus reduces the net drift. Hermits and dictators do weird stuff.
Like democracy, but for multiple agents inside a single individual. Obviously there has to be some way to prevent agents from coalescing into a single blob, but that could be managed by lower-level hardcoded systems that blindly bottleneck the bandwidth of some connection patterns and force some of that limited bandwidth through low-level systems.
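The claim that consensus reduces random drift can be checked with a toy simulation. This is only a sketch under assumptions that are mine, not part of the argument above: each agent's drift is modeled as a sum of independent Gaussian nudges, "consensus" is modeled as the plain average of the agents' drifts, and the names `final_drift` and `drift_spread` are made up for illustration. If the drifts are correlated (for example, because the agents have coalesced into a single blob), the averaging benefit shrinks or disappears.

```python
import random
import statistics


def final_drift(steps, rng):
    """One agent's accumulated drift: a sum of independent Gaussian nudges."""
    return sum(rng.gauss(0.0, 1.0) for _ in range(steps))


def drift_spread(n_agents, steps=50, trials=500, seed=0):
    """Spread (std. dev. across trials) of the consensus drift of n_agents.

    Assumption: consensus is the simple mean of the agents' individual drifts,
    and the agents drift independently of each other.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        drifts = [final_drift(steps, rng) for _ in range(n_agents)]
        outcomes.append(statistics.fmean(drifts))
    return statistics.pstdev(outcomes)


single = drift_spread(1)   # a lone agent: the hermit or dictator case
group = drift_spread(25)   # 25 independent agents acting by consensus

print(f"spread of drift, 1 agent:   {single:.2f}")
print(f"spread of drift, 25 agents: {group:.2f}")
```

Under these assumptions the spread of the consensus should shrink roughly with the square root of the number of agents, so 25 agents drift about a fifth as far as one. The benefit depends entirely on independence, which is why something has to keep the agents from merging.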
Individual humans are not aligned at all; see “power corrupts”. Human societies are somewhat aligned with individual humans, in the sense that they need humans to exist and keep the society going, and societies that are “unaligned” in this sense disappear pretty quickly. I do not see any alignment difference between totalitarian and democratic regimes, if you measure alignment by the average happiness of society.
I don’t disagree that human misalignment has only moderate effects because of various limits on their power.
Good categorizations! Perhaps this fits in with your “limited self-modification” point, but another big reason why humans seem “aligned” with each other is that our capability spectrum is rather narrow. The gap in capability (if we include both mental intelligence and physical capabilities) between the median human and the most capable human is not so big that ~5 median humans can’t outmatch/outperform the most capable human. Contrary to what silly 1980s action movies might suggest where goons attack the hero one at a time, 5 median humans could probably subdue prime-age Arnold Schwarzenegger in a dark alley if need be. This tends to force humans to play iterated prisoners’ dilemma games with each other.
The times in history when humans have been the most misaligned were when some of them became much more capable by leveraging their social intelligence / charisma stats to get millions of other humans to do their bidding. But even there, those dictators still found themselves in iterated prisoners’ dilemmas with other dictators. We won’t really have tested just how misaligned humans can get until we empower a dictator with unquestioned authority over a total world government. Then we would find out just how intrinsically aligned humans really are to other humans when unshackled by iterated prisoners’ dilemmas.
To me it isn’t clear what alignment you are talking about.
You say that the list is about “alignment towards genetically-specified goals”, which I read as “humans are aligned with inclusive genetic fitness”, but then you talk about what I would describe as “humans aligned with each other” as in “humans want humans to be happy and have fun”. Are you confusing the two?
Here the first one shows misalignment towards IGF, while the second shows misalignment towards other humans, no?