how humans are aligned

This is a description of how I think humans are kept somewhat aligned towards genetically-specified goals. It’s not an argument, just my views.

It works well enough, but not perfectly. South Korea isn’t having kids anymore. Sometimes you get serial killers or Dick Cheney. So any alignment scheme that does less than what’s described here seems likely to be inadequate.

previously: AI self-improvement is possible


limited self-modification

Don’t allow systems to modify lower-level systems, and strongly limit self-modification at the same system level. When it’s done at all, reduce the amount allowed after an initial learning period, so that children have more flexibility than old people.
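As a rough illustration of the two rules above, here is a minimal sketch. Everything in it is an assumption made up for the example: the numeric levels, the function names, and the specific schedule (full flexibility during a learning period, then exponential decay down to a small floor). It is not a claim about how brains actually implement this.

```python
def may_modify(modifier_level: int, target_level: int) -> bool:
    """Hypothetical rule: a system may never modify a lower-level system."""
    return target_level >= modifier_level


def allowed_change(age: float, learning_period: float = 20.0,
                   half_life: float = 15.0, floor: float = 0.05) -> float:
    """Largest same-level self-modification allowed at a given age.

    Full flexibility (1.0) during the learning period, then halving every
    `half_life` time units, never dropping below `floor`.
    All parameter values are illustrative assumptions.
    """
    if age <= learning_period:
        return 1.0
    decayed = 0.5 ** ((age - learning_period) / half_life)
    return max(floor, decayed)


def apply_update(value: float, proposed: float, age: float) -> float:
    """Move `value` toward `proposed`, clipped to the age-dependent limit."""
    limit = allowed_change(age)
    delta = max(-limit, min(limit, proposed - value))
    return value + delta
```

With these made-up numbers, a system at age 10 can change a value by up to 1.0 per update, at age 35 by up to 0.5, and in very old age only by the floor of 0.05.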

limited length

Don’t have long chains of systems generating systems. Limit things to 2 steps of systems generating higher-level systems.
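One way to picture the chain limit is as a hard cap on how many generation steps separate any system from the bottom-level one. The sketch below assumes a hypothetical `System` class and placeholder names; only the limit of 2 steps comes from the text.

```python
from dataclasses import dataclass

MAX_GENERATION_STEPS = 2  # the 2-step limit from the text


@dataclass(frozen=True)
class System:
    """Hypothetical system that records how many steps it is from the root."""
    name: str
    steps_from_root: int = 0

    def generate(self, name: str) -> "System":
        """Create a higher-level system, refusing once the chain is too long."""
        if self.steps_from_root >= MAX_GENERATION_STEPS:
            raise PermissionError(
                f"{self.name} is already {self.steps_from_root} steps from "
                "the root and may not generate further systems"
            )
        return System(name, self.steps_from_root + 1)


low = System("low-level")
mid = low.generate("mid-level")
high = mid.generate("high-level")
# high.generate("anything") would raise PermissionError
```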

lifespan

Even if you try to restrict self-modification, some “leakage” will happen anyway. A physical time limit that can’t be extended past a maximum keeps that from becoming excessive. (Perhaps that’s why some octopus species are “hardcoded” to die at specific points in their reproductive cycle.)

old age

More-capable misaligned systems are more dangerous, so degrading the capabilities of systems as they approach their time limit makes them safer. I don’t actually think this is the limiting factor for human senility; I think that’s mainly due to uncontrolled covalent modification of DNA. What I do think is that humans have some low-level system that, when they get old, reduces the influence of certain mid-level systems or shuts them down.

monitoring

Humans have monitoring systems that are allowed less self-modification than the systems they monitor. The monitors have access to the internal opinions of what they monitor, but can still be deceived. The net benefit is actually somewhat questionable.

democracy

If alignment drift is somewhat random, then making many separate agents that act according to consensus reduces the net drift. Hermits and dictators do weird stuff.
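The averaging claim can be checked with a small simulation. This is a sketch under strong assumptions that are mine, not the post’s: each agent’s drift is an independent zero-mean Gaussian, and "consensus" is just the mean of the agents’ drifts. Under those assumptions the spread of the consensus shrinks roughly as 1/sqrt(N); drift that is correlated across agents would not average out this way.

```python
import random
import statistics


def consensus_drift_spread(n_agents: int, trials: int = 2000,
                           sigma: float = 1.0, seed: int = 0) -> float:
    """Standard deviation of the group's consensus drift over many trials.

    Assumes independent Gaussian drift per agent and consensus = mean.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        drifts = [rng.gauss(0.0, sigma) for _ in range(n_agents)]
        outcomes.append(statistics.fmean(drifts))
    return statistics.pstdev(outcomes)


hermit = consensus_drift_spread(1)   # one agent acting alone
group = consensus_drift_spread(25)   # 25 agents acting by consensus
print(f"hermit spread: {hermit:.2f}, group spread: {group:.2f}")
```

With 25 agents the spread should come out at roughly a fifth of the single-agent spread, which is the sense in which hermits and dictators drift further than groups do.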

internal democracy

Like democracy, but for multiple agents inside a single individual. Obviously there has to be some way to prevent agents from coalescing into a single blob, but that could be managed by lower-level hardcoded systems that blindly bottleneck the bandwidth of some connection patterns and force some of that limited bandwidth through low-level systems.