I’m not sure if you’re arguing that this is a good world in which to think about alignment. If you are arguing this, then I disagree.
It seems like in this formalization the human has to write down code that encodes human values, ship it off to the right side of the universe via a one-time-only Godly Transfer Of Information, and then that code needs to get everything right on its own. (You can’t have any transfer of information back, since otherwise the AI could affect humans.) But it seems like this rules out a huge number of promising-seeming approaches to alignment, where the human gives feedback based on the AI’s behavior (see also Human-AI Interaction).
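To make the contrast concrete, here is a minimal sketch of the two setups, assuming nothing beyond the description above; all names are hypothetical, not from the post:

```python
from typing import Callable

def one_shot_protocol(values_code: Callable[[str], float],
                      situations: list[str]) -> list[float]:
    # The human ships `values_code` once; no information ever flows back,
    # so the code must judge every situation correctly on its own.
    return [values_code(s) for s in situations]

def interactive_protocol(values_code: Callable[[str], float],
                         human_feedback: Callable[[str, float], float],
                         situations: list[str]) -> list[float]:
    # Ruled out by the formalization: the human observes each judgment and
    # corrects it, which requires an information channel back to the human.
    corrected = []
    for s in situations:
        judgment = values_code(s)
        corrected.append(human_feedback(s, judgment))
    return corrected
```

The only difference is that `interactive_protocol` routes each judgment past the human before acting on it, and that loop is exactly the backwards information channel the formalization forbids.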
I’m not sure if you’re arguing that this is a good world in which to think about alignment.
I am not arguing this. Quoting my reply to ofer:
I think I sometimes bump into reasoning that feels like “instrumental convergence, smart AI, & humans exist in the universe → bad things happen to us / the AI finds a way to hurt us”; I think this is usually true, but not necessarily true, and so this extreme example illustrates how the implication can fail.
(Edited post to clarify)