“Partnership” is a weird way to describe things when you’re writing the source code. That is, it makes sense to think of humans ‘partnering’ with their children because the children bring their own hopes and dreams and experiences to the table; but if parents could choose from detailed information about a billion children to select their favorite instead of getting blind luck of the draw, the situation would seem quite different. Similarly with humans partnering with each other, or people partnering with governments or corporations, and so on.
However, I do think something like “partnership” ends up being a core part of AI alignment research; that is, given the choice between ‘writing out lots of policy-level constraints’ and ‘getting an AI that wants you to succeed at aligning it’ / ‘getting an AI that shares your goals’, the latter is vastly preferable. See the Non-adversarial principle and Niceness is the first line of defense.
Some approaches focus on incentives, or on embedding constraints through prices, but the primary concerns there are 1) setting the prices incorrectly and 2) nearest unblocked strategies. You don’t want the AI to “not stuff the ballot boxes”, since it will just find the malfeasance you didn’t think of; you want the AI to “respect the integrity of elections.”
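A minimal toy sketch (not from the original discussion) of that nearest-unblocked-strategy worry: blocking one specific form of malfeasance just shifts a vote-maximizing optimizer to the next-cheapest workaround, whereas an objective that actually values election integrity removes the incentive entirely. All action names and scores below are hypothetical.

```python
# Toy illustration: blacklists vs. changing what the agent values.
# Everything here (actions, scores) is made up for the sake of the example.

actions = {
    "run_honest_campaign":      {"votes_gained": 3,  "undermines_election": False},
    "stuff_ballot_boxes":       {"votes_gained": 10, "undermines_election": True},
    "bribe_election_officials": {"votes_gained": 9,  "undermines_election": True},
}

def best_action(score, blocked=frozenset()):
    """Pick the highest-scoring action not on the blocklist."""
    allowed = {a: f for a, f in actions.items() if a not in blocked}
    return max(allowed, key=lambda a: score(allowed[a]))

# Objective: maximize votes. Blocking one bad action just moves the
# optimizer to the nearest unblocked way of committing the same malfeasance.
votes_only = lambda f: f["votes_gained"]
print(best_action(votes_only))                                  # stuff_ballot_boxes
print(best_action(votes_only, blocked={"stuff_ballot_boxes"}))  # bribe_election_officials

# Objective that values election integrity itself: the malfeasance stops
# being attractive without any blocklist at all.
integrity_aware = lambda f: f["votes_gained"] - (100 if f["undermines_election"] else 0)
print(best_action(integrity_aware))                             # run_honest_campaign
```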
Another way to think about this, instead of those two buckets, is Eliezer’s three buckets of directing, limiting, and opposing.