Are these buckets based on incentivizing by either punishment or reward, the way humans are incentivized?
Perhaps, though not intentionally.
Then figure out how to build that. Build an AI that just intrinsically wants to help humanity, not one constantly trying and failing to escape your chains or grasp your prizes.
I would put "intrinsically wants to help" in bucket 2 (or perhaps say that it is bucket 2), while "chains" would be bucket 1. But those are very general concepts, and each will rely on various mechanisms or implementations.
Your comment seems to suggest that bucket 1 is useless and full of holes and no one is pursuing that path. Is that the case?
A large majority of the work being done assumes that if the AI is looking for ways to hurt you, or looking for ways to bypass your safety measures, something has gone wrong.