I’m familiar with (and tend to enjoy, though not always agree with) Wei Dai’s writing, but was unable to find any that addressed the issues of human values becoming technologically far more malleable and that combining with Value Learning or corrigibility to produce an unstable feedback loop: can you point me to which one you mean? I looked through all of them in Section 3 that you directed me to, and none of them address it: the nearest I could find is Intentional and unintentional manipulation of / adversarial attacks on humans by AI, but that doesn’t actually address the same issue.
As for his Six Plausible Meta-Ethical Alternatives — as far as I’m concerned he’s as confused about meta-ethics as every other current philosopher who still hasn’t noticed that there has, for the last half-century, been a scientific theory of how human moral intuitions evolved, and that its predictions don’t match any of the six clean meta-ethical alternatives that he thought covered all the possibilities. Briefly: humans can only comfortably use ethical systems fairly compatible with human moral intuitions; those intuitions are evolved strategies, the product of human evolutionary circumstances; and some of them would, for game-theoretic reasons, generalize somewhat to other social sapient species evolved to live in large non-kin groups, but not all. None of his six alternatives match that: the scientific evidence is that reality is a fuzzy blend of 3–4 of them. But post-ASI, a lot of the evolutionary constraints go away by default, and we could rebuild ourselves to have whatever moral instincts we wanted, whether that’s truly devout Christians of unshakable faith, or something far, far WEIRDer.
I don’t see how the Intelligence Curse is relevant to this: that appears to be about what happens if we get AI that is aligned only to the heads of the companies that make it, or at least only to people with a great deal of capital, and not to anyone else. Yes, of course that’s bad — but I was assuming for the sake of argument that we’d succeeded in aligning ASI, not messed that up in the most Moloch way possible.
I enjoyed Buck’s take, thanks for the link. He has a different flavor of WEIRD than me: he’s concerned about society fragmenting and balkanizing, and some of the fragments having value lock-in — I suspect ASI would just step in and quietly fix that, because most people outside that small fragment would want it to. But it’s at least in the same area: human values get very malleable, weirdness ensues. I agree a little more with Matosczi’s comment, and some of the other comment threads on that post.
Just looking at the summaries, the closest to WEIRD in the Post-AGI talks you mentioned is a discussion of the opposite problem, value lock-in (which has been discussed at great length for years):
Tianyi Qui of Peking University spoke on “LLM-Mediated Cultural Feedback Loops”. They empirically studied “culture lock-in”, where LLM output affects human output and causes a feedback loop which locks in a certain belief, value or practice.
— so which of these should I be watching?
You say:
However, I find it hard to understand what causes values to mutate. Suppose that changes are only due to finding inconsistencies in the existing moral framework
Why would changes in human values only be due to finding “inconsistencies” (whatever that means — ways in which we’re not well adapted to our current environment, perhaps)? The cultural parts of human values change endlessly, sometimes for necessary and adaptive reasons (two thousand years ago the Roman economy was based on slavery, now that’s extremely out), but often for reasons no more well-thought-out than fads or fashions: one year skirts are short, another year they’re long, and similarly politics has pendulum swings. Currently, the genetic parts of human values drag us back from these cultural gyrations. But once you have genetic engineering, that sort of change becomes potentially permanent. Suppose at some point society decides, for no very good reason (to anyone not French), that blue cheese is really important and everyone should be into it? You think it’s stinky, but now you can keep up with the Joneses: get gene edited to become a true blue-cheese aficionado who innately adores the stuff, and then pass that tendency on to your kids. And that lasts until overwritten with something even less like humanity 1.0. Ditto for your level of Machiavellianism, your moral intuitions on animal welfare, the innate default size of your moral circle, your inclinations on individualism vs. collectivism, your innate degree of romantic fidelity vs. infidelity (yes, that has a genetic basis), and whether you find flowers pretty or prefer moss. At that point, what about human values is still fixed? Anything can be edited. So why, sooner or later, wouldn’t everything have been edited?
I’m afraid I don’t get why this would be hard to understand. Apparently I didn’t explain that part well enough. (Possibly I’ve read more post-Singularity science fiction than some of my readership — a situation where human nature and values are very mutable isn’t a new concept to me.)