So the question is, can we transfer niceness in this way, without needing a solution to the full problem of niceness in general?
How do you determine that it will be nice under the given condition?
As posed, it’s entirely possible that the niceness is a coincidence: an artifact of the initial conditions fitting just right with the programming. Think of a coin landing on its side or a pencil being balanced on its tip. These positions are unstable and you need very specific initial conditions to get them to work.
The safe bet would be to have the AI start plotting an assassination and hope it lets you out of prison once its coup succeeds.