I have found that the more I use my simulation of HPMOR!Quirrell for advice, the harder it is to shut him up. As with any mental discipline, thinking in particular modes wears thought-grooves into your brain’s hardware, and before you know it you’ve performed an irreversible self-modification. Consequently, I would definitely recommend that anybody attempting to supplant their own personality (for lack of a better phrasing) with a model of some idealized reasoner try to make sure that the idealized reasoner shares your values as thoroughly as possible.
I’ve now got this horrifying idea that this has been Quirrell’s plan all along: to escape from HPMOR to the real world by tempting you to simulate him until he takes over your mind.
Hmm, so the Fanfiction.net website is his horcrux?
In retrospect, I’m kind of glad that my plan to make a Quirrell-tulpa never got off the ground.
afaict the quirrell tulpa is one of the more common types of tulpas. if you have one, do not use it. it is secretly voldemort and will destroy your soul.
But Quirrell didn’t cause Eliezer to write HPMOR...
It’s to Quirrell’s advantage that you believe that, of course.
Beware acausal trade! Once Eliezer imagined Quirrell, he had to write HPMOR to stop Quirrell from counterfactually simulating 3^^^3 dustspeckings.
Rational agents cannot be successfully blackmailed by other agents that simulate them accurately, and especially not by figments of their own imagination.
Are you implying that rational agents can be successfully blackmailed by other agents that simulate them inaccurately? (This does seem plausible to me, and is an interesting rare example of accurate knowledge posing a hazard.)
Well, that’s quite obvious. Just imagine the blackmailer is a really stupid human with a big gun who’d fall for blackmail in a variety of awful ways and has a bad case of typical mind fallacy; if anything goes contrary to their expectations, they get angry and just shoot the victim before thinking through the consequences.
It’s kinda obvious, but deeply counter-intuitive: I mean, it’s a situation where stupidity is a decisive advantage!
Not quite stupidity, but irrationality. And it is well-known that (credible) irrationality can be a big advantage in negotiations and other game theory scenarios. Essentially, if I’m irrational then you cannot simulate me accurately and cannot predict what I will do, which means that your risk aversion pushes you towards safe choices that limit your downside at the cost of your upside. And if it’s a zero-sum game, I get this upside.
Of course, I need to be credible in showing my irrationality.
The reason such a strategy is not used more often is that (a) often there is the option to walk away, which many people take when faced with an irrational counterparty; and (b) when two irrational counterparties meet, bad things happen :-)
There are instances where (arguably) irrationality confers a big game-theoretic advantage even though you’re predictable.
For instance, suppose you’re leading a nuclear superpower. If you can make it credibly clear that you really truly would be happy to launch World War Three if the other guys don’t back down, then they probably will. Not because they can’t predict your actions, but because they can.
In this sort of case it’s either debatable whether it’s really irrationality, or debatable whether it’s really a game-theoretic advantage. If you can really be sure that the other guys will back down, then maybe it’s not irrationality because you never have to blow up the world. If you can’t, then maybe you don’t have a game-theoretic advantage after all because if you play this game often enough then the other guys call your bluff, you push the big red button, and everyone dies.
[EDITED to add: I think this sort of case is nearer to the example discussed upthread than the sort where unpredictability is key.]
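The "play this game often enough" point can be made concrete with a small sketch. It assumes, purely for illustration, that each standoff is independent and that the other side calls the bluff with a fixed probability per round; the function name and the numbers are hypothetical, not anything from the thread.

```python
def survival_probability(p_called: float, rounds: int) -> float:
    """Chance that no standoff ends in catastrophe.

    Assumes (hypothetically) independent rounds, each with the same
    probability `p_called` that the other side refuses to back down
    and the committed player therefore pushes the button.
    """
    if not 0.0 <= p_called <= 1.0:
        raise ValueError("p_called must be a probability")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return (1.0 - p_called) ** rounds


if __name__ == "__main__":
    # Even a small per-round risk compounds over repeated standoffs.
    for n in (1, 10, 100):
        print(n, survival_probability(0.01, n))
```

Under these assumptions the commitment only looks like a pure advantage when the per-round risk is exactly zero; any positive risk drives the survival probability towards zero as the number of rounds grows.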
That’s more like sheer bloodymindedness X-) not irrationality.
Yeah, it’s called the game of chicken and that’s a slightly different thing.
I think you mean that rational agents cannot be successfully blackmailed by other agents when it is common knowledge that those agents can simulate them accurately and will only use blackmail if they predict it to be successful. All of this, of course, holds in the absence of mitigating circumstances (including, for example, the theoretical likelihood of other agents that reward you for counterfactually giving in to blackmail under these circumstances).
That doesn’t seem true. How can the victim know for sure that the blackmailer is simulating them accurately or being rational?
Suppose you get mugged in an alley by random thugs. Which of these outcomes seems most likely:
You give them the money, they leave.
You lecture them about counterfactual reasoning, they leave.
You lecture them about counterfactual reasoning, they stab you.
Any agent capable of appearing irrational to a rational agent can blackmail that rational agent. This decreases the probability of agents which appear irrational being irrational, but not necessarily to the point that you can dismiss them.
Why not? Are rational agents generally immune to blackmail, or is it not strictly advantageous to be able to simulate another agent accurately?
I think it basically comes down to this: if the rational agent recognizes that the rational thing to do is to NOT buckle under blackmail, regardless of what the rational agent simulating them threatens, then the blackmailer’s simulation of the blackmailee will also not respond to that pressure, and so it’s pointless to go to the effort of pressuring them in the first place. However, if the blackmailer is irrational, their simulation of the blackmailee will be irrational, and thus they will carry through with the threat. This means that the blackmailee’s simulation of the blackmailer as rational is itself inaccurate, as the simulation does not correspond to reality. If the blackmailee is irrational, their simulation of the blackmailer will be irrational, and thus they will concede to their demands. Yet each party acts as if their simulation of the other were correct, until actual, photon-transmitted information about the world can impress itself into their cognitive function. So, no-one gets what they want. The best choice for a rational agent here is just to ignore the good professor. On the other hand, you can’t argue with results. And there’s a simulation of Quirrell s-quirreled away in your brain, whispering.
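A toy model may help separate the two cases argued over in this thread. It is only a sketch under loudly stated assumptions: the payoff numbers are made up, the "accurate" blackmailer threatens only when it predicts the victim will pay, and the "inaccurate" one threatens regardless and always carries out the threat on refusal. None of these names or values come from the thread.

```python
from typing import Callable

# Hypothetical payoffs for the victim (not from the thread).
PAY_COST = 10       # loss from giving in to the demand
THREAT_HARM = 100   # loss if the threat is carried out

# A policy answers: "if threatened, do I pay?"
Policy = Callable[[], bool]


def always_pay() -> bool:
    return True


def never_pay() -> bool:
    return False


def victim_payoff(policy: Policy, simulates_accurately: bool) -> int:
    """Victim's payoff against one of two stylized blackmailers."""
    pays = policy()
    if simulates_accurately:
        # Accurate simulator: threatens only if it predicts payment,
        # so a committed refuser is never threatened at all.
        return -PAY_COST if pays else 0
    # Inaccurate simulator: threatens anyway and follows through
    # whenever the victim refuses.
    return -PAY_COST if pays else -THREAT_HARM


if __name__ == "__main__":
    for name, policy in (("always_pay", always_pay), ("never_pay", never_pay)):
        for accurate in (True, False):
            print(name, "accurate" if accurate else "inaccurate",
                  victim_payoff(policy, accurate))
```

In this toy setup refusing is best against the accurate simulator and worst against the inaccurate one, which is the asymmetry the mugging example points at: the right policy depends on how likely the blackmailer is to be the second kind.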
It looks like you are saying that both rational and irrational agents model competitors as behaving in the same way they do.
Is that why you think that an irrational simulation of a rational agent must be wrong, and why a rational simulation of an irrational agent must be wrong? I suggest that an irrational agent can correctly model even a perfectly rational one.
sorry
Worryingly, this sounds like a good deal—getting skills for faster power/control increase, keeping continuity of consciousness, and increasing the odds of escaping from this reality into the next higher one...
Possibly valuable to talk with Robin Hanson and me for revision to HPMOR!Quirrell decision procedures from the source?
I would give a finger from my wand hand for such an opportunity.
I bid two.
This whole comment thread is utterly delightful.