Honesty, too, arose that way. So I’m not sure whether (say) a system trained to answer questions in such a way that the humans watching it give reward would be more or less likely to be deceptive.
I think it is mistaken. (Or perhaps I don’t understand a key claim / assumption.)
Honesty evolved as a group dynamic: it was beneficial for the group to have ways for individuals to commit honestly, or to make lying expensive in some way. That cooperative pressure does not exist when a single agent is “evolving” on its own in an effectively static environment of humans. It does exist in a co-evolutionary multi-agent dynamic, so there is at least some reason for optimism about honesty within a multi-agent group, rather than between computational agents and humans, but the conditions for cooperation versus competition seem at least somewhat fragile.
I’m confused, because the stuff you wrote in that paragraph seems like an expanded version of what I think. In other words, it supports what I said rather than objecting to it.
My point was that deception will almost certainly outperform honesty/cooperation when AI is interacting with humans, and on reflection, seems likely to do so even when interacting with other AIs by default, because there is no group selection pressure.
I think I was thinking that in multi-agent training environments there might actually be group selection pressure for honesty. (Or at least, there might be whatever selection pressures produced honesty in humans, even if that turns out to be something other than group selection.)
Selection in humans acts on genes, which arise via mutation and are shared with close relatives, so closely related organisms can benefit from cooperating, even at the cost of not personally replicating. As J.B.S. Haldane put it, “I would gladly give up my life for two brothers, or eight cousins.” Continuing from that paper, which explains it better than I could: “What is more interesting, it is only in such small populations that natural selection would favour the spread of genes making for certain kinds of altruistic behaviour. Let us suppose that you carry a rare gene which affects your behaviour so that you jump into a river and save a child, but you have one chance in ten of being drowned, while I do not possess the gene, and stand on the bank and watch the child drown.
If the child is your own child or your brother or sister, there is an even chance that the child will also have the gene, so five such genes will be saved in children for one lost in an adult. If you save a grandchild or nephew the advantage is only two and a half to one. If you only save a first cousin, the effect is very slight. If you try to save your first cousin once removed the population is more likely to lose this valuable gene than to gain it.”
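Haldane’s arithmetic can be checked directly: a rescue costs an expected 0.1 gene copies (the rescuer’s single copy times a 1-in-10 chance of drowning) and saves an expected r copies, where r is the coefficient of relatedness to the rescued relative. A minimal sketch of that calculation (the function name and structure are mine, not Haldane’s):

```python
# Haldane's thought experiment: a rescuer carrying one copy of a rare
# "altruism gene" risks death with probability 0.1 to save a relative
# who carries the same gene with probability r (coefficient of relatedness).

DROWNING_RISK = 0.1  # rescuer's chance of dying during the rescue

def benefit_cost_ratio(relatedness: float) -> float:
    """Expected gene copies saved per expected gene copy lost."""
    expected_saved = relatedness           # relative carries the gene w.p. r
    expected_lost = 1 * DROWNING_RISK      # rescuer's one copy, 10% risk
    return expected_saved / expected_lost

relatives = {
    "child or sibling": 1 / 2,            # ratio 5:1  -> gene spreads
    "grandchild or nephew": 1 / 4,        # ratio 2.5:1 -> gene spreads
    "first cousin": 1 / 8,                # ratio 1.25:1 -> very slight gain
    "first cousin once removed": 1 / 16,  # ratio < 1 -> gene is lost
}

for name, r in relatives.items():
    ratio = benefit_cost_ratio(r)
    verdict = "spreads" if ratio > 1 else "is lost"
    print(f"{name}: ratio {ratio:g} -> gene {verdict}")
```

The ratios reproduce Haldane’s figures: five to one for a sibling, two and a half to one for a nephew, and below one for a first cousin once removed, where rescuing is more likely to lose the gene than to spread it.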
Right, so… we need to make sure selection in AIs also has that property? Or is the thought that even if AIs evolve to be honest, it’ll only be with other AIs and not with humans?
As an aside, I’m interested to see more explanations for altruism lined up side by side and compared. I just finished reading a book that gave a memetic/cultural explanation rather than a genetic one.