For the monkeys example, I mean that I expect that in practice there will be activists who will actually save the monkeys if they are wealthy enough to succeed in doing so on a whim. There are already expensive rainforest conservation efforts costing hundreds of millions of dollars. Imagine that saving the monkeys instead cost $10 and anyone could pay that cost without needing to coordinate with others. Then, I claim, someone would.
By analogy, the same should happen with humanity instead of monkeys, if AGIs reason in a sufficiently human-like way. I don’t currently find it likely that most AGIs would normatively accept some decision theory that rules this out. It’s obviously possible in principle to construct AGIs that follow some such decision theory (or value paperclips), but that’s not the same thing as those properties of AGI behavior being convergent and likely.
I think the default shape of a misaligned AGI is a sufficiently capable simulacrum, a human-like alien thing that faces the same value extrapolation issues as humanity, in a closely analogous way. (That is, unless an AGI alignment project makes something clever instead, which becomes much more alien and dangerous as a result.) And the default aligned AGI is the same, but not that alien, more of a generalized human.