It could be the case both that there is a catastrophic inner alignment failure between humans and evolution and that humans don’t regularly experience catastrophic inner alignment failures internally.
In practice I do suspect humans regularly experience internal (within-brain) inner alignment failures, but given that suspicion I feel surprised by how functional humans manage to be. That is, I notice expecting that regular inner alignment failures would cause far more mayhem than I observe, which makes me wonder whether brains are implementing some sort of alignment-relevant tech.
I don’t know why you expect an inner alignment failure to look dysfunctional. Instrumental convergence suggests that it would look functional. What the world looks like if there are inner alignment failures inside the human brain is (in part) that humans pursue a greater diversity of terminal goals than can be accounted for by genetics.
What would inner alignment failures even look like? Overdosing on meth sure makes the dopamine system happy. Perhaps human values reside in the prefrontal cortex, and all of humanity is a catastrophic alignment failure of the dopamine system (except a small minority of drug addicts) on top of being a catastrophic alignment failure of natural selection.