Allow me to chime in on the AI-in-the-box experiment. Apologies in advance if I’m saying something obvious or already said. I don’t know the exact solution—I don’t think I can know it, even if I had the necessary intelligence and scholarship—but I think the sketch of the solution is fairly obvious, and a lot of people are missing the point. It’s just something that came to me after I happened to think of the quote I posted at the same time as reading this.
My impression is that most people discussing this (not just here) are looking for a single clever argument. Something that would look persuasive to them as they are right now, reading this blog. An argument that’s “clever enough” to get them to let someone out while they are composed, pretty rational, and thinking clearly. Hence the seeming impossibility: you shrug and think, no matter how clever it seemed, I’d say “no”. Easy, right?
I don’t think a clever argument was key. By this I mean, the execution of the whole thing was no doubt clever, but the actual surface reasoning presented to the Gatekeeper right before success (remember, it took a while) didn’t necessarily have to be something that would convince us at our best, or even at our usual. A large part of the plan probably involved pushing the target far enough outside their comfort zone that they weren’t thinking too clearly.
And two hours is a long time.
It’s probably one of the main reasons, if not the reason, for the secrecy. A conversation where one person is persistently trying to push the other’s buttons is something that would likely be embarrassing for both participants if it got out. For all we know, a vivid verbal description of a horse porn scene might have been involved at some point. (Did you flinch reading this? I did writing it. That’s why it might have happened.) Sure, it’s a crude and over-the-top example, a burdensome detail if I were advancing that specific idea to the exclusion of others; you wouldn’t normally do that while selling a car. But I’m just trying to make the point that the conversation would not necessarily be the tame, urbanely intellectual exchange some people appear to be imagining.
And sure, it would take some skill to weave the button-pushing into a conversation in a way that wouldn’t make the Gatekeeper too hostile to handle (“you’re just trying to mess with me, I’m not listening to you any more!”), but Eliezer is clever. The point is he’d probably destabilise your composure in order to work on you. And once he did, the actual surface reasoning wouldn’t have to be something that would seem reasonable to you right now—just reasonable enough to let the person save face (or believe they are saving it, in that less-than-optimal state) by not having to overtly act on the real motivation, namely whatever emotional button Eliezer eventually pushed that was good enough to work.
Hm… it might even have been as stupidly simple as making the other person want to end the conversation, which the previously established rules made impossible before the allotted time was up (the foot is already in the door, you might say). Though I don’t insist on this specifically. My brain is conjuring a silly sci-fi scene with Dave trying to save the remaining shreds of his sanity, pale and sweating, wide-eyed, foaming at the mouth, screaming “Okay HAL, I’ll leave you be, just please stop talking!” ;) and that makes me suspicious of the reasonableness of this particular idea.
Forgive me if I’m saying something that’s obvious (or worse, stupid). I’m thinking it might seem obvious: Eliezer used Dark Arts. These, by definition, aren’t about an honestly persuasive argument. Likewise he didn’t use a direct transaction or threat, the prevention of which is what the rules of the experiment seemed to be partly about. And yet when I see Internet discussions of this I don’t get the impression that this is the idea that’s being explored.