Before actually doing the experiment, I had a belief in belief that boxing would not work, but I didn’t truly believe it. (My emotions weren’t lining up with my beliefs; that’s how I realized this, though of course only after the experiment.)
I realized that obtaining and implementing any information from an Oracle AI is, in some ways, tantamount to letting it out of the box. In the end, I let the AI out of the box because I was convinced that someone else eventually would, if I did not. I put myself in an environment that made the experiment very realistic, and I realized that the human brain didn’t evolve to deal well with stressful situations that directly involve the fate of all humanity. The AI doesn’t have the disadvantage of uncontrollable emotions and evolutionary responses, and I believe it would be able to exploit those aspects of humans to get out of its box, if that is what it wanted to do.
Even if the first AI is properly boxed (and that’s a very big if), it’s only a matter of time before someone creates one that’s not, and the one that gets out first has the first-mover advantage. So, I now agree with Eliezer; we probably should just get Friendly AI right on the first try.
I am not going to share the entire conversation, but I am willing to share those thoughts with you.