Very interesting, thanks.
A few points on the examples of humans with the capacity to succeed through deception ("tricking" in the transcript):
It’s natural that we don’t observe anyone successfully doing this, since success entails not being identified as deceptive. This could involve secrecy, but more likely relies on things like charisma and the leveraging of existing biases.
When making comparisons with very-smart-humans, I think it’s important to consider very-smart-across-all-mental-dimensions-humans (including charisma etc).
It may be that people have paths to high utility (which may entail happiness, enlightenment, meaning, contentment… rather than world domination) that don’t involve the risks of a deceptive strategy. If human utility were, e.g., linear in material resources, things might look different.
Human deception is often kept in check by the cost of punishment outweighing the benefit of potential success. With AI agents, the space of meaningful punishments will likely look quite different.