(3) seems slippery. The AIs are as nice as your friends “under normal conditions”? Does running a giant collective of them at 100x speed count as “normal conditions”?
If some of that niceness-in-practice required a process where it was interacting with humans, what happens when each instance interacts with a human on average 1000x less often, and in a very different context?
Like, I agree something like this could work in principle, that the tweaks to how the AI uses human feedback needed to get more robust niceness aren’t too complicated, that the tweaks to the RL needed to make internal communication not collapse into self-hacking without disrupting niceness aren’t too complicated either, etc. It’s just that most things aren’t that complicated once you know them, and it still takes lots of work to figure them out.
I agree that running the giant collective at 100x speed is not “normal conditions”. That’s why I have two different steps, (3) for making the human level AIs nice under normal conditions, and (6) for the niceness generalizing to the giant collective. I agree that the generalization step in (6) is not obviously going to go well, but I’m fairly optimistic, see my response to J Bostock on the question.