Actually, this makes me think of something.
We sometimes see rationalists and utilitarian EAs do something like the same thing we worry about with AI: unaligned optimization that produces outcomes we don’t like. Unfortunately, because humans disagree on norms/ethics/values, it’s kind of hard to know the difference between “going off the rails” and “correcting a massive oversight or collective moral failing”, especially from the inside.
I’m gonna add an even more pessimistic hypothesis: that the disagreements around values are fundamentally irresolvable because there is no truth at the end of the tunnel.
Or, one man’s “going off the rails” is another man’s “correcting a massive oversight or collective moral failing”, and these perspectives can’t be reconciled.
No. Because “going off the rails” often involves doing things that are observably irrational even by your own worldview. Like killing your parents and your landlord.
You can say: “this might make sense from their worldview! (soy)”
And the obvious response is: Yes. Because they’re crazy. Because they went off the rails.
You can also say: “But we’ll never know! Who can know? Nobody knows! Truth is subjective blargh”
And again, the obvious response is: Yes, but we can observe patterns. And if you can’t update on this evidence and use some basic sense when this sort of thing repeats, you are not thinking clearly.