I’ve thought about this before too, and I no longer feel confused about it. It helps to reduce this into a decision problem. The decision problem could ‘be about’ programs deciding anything, in principle; it doesn’t need to be ‘agents deciding whether to blackmail’.
I’ll show decision structures symmetric to your examples, then give some more examples that might help.
Language of post's examples → Language for decision problems
Crime boss → Program C
Mayor → Program M
Crime boss does not blackmail mayor → C outputs 0
Crime boss blackmails mayor → C outputs 1
Mayor does not give in to blackmail → M outputs 0
Mayor gives in to blackmail → M outputs 1
Your first example: M is a more advanced conditioner
C runs: if [M outputs 1 if C outputs 1], output 1; else, output 0
M runs: if C runs "If [M outputs 1 if C outputs 1], output 1; else, output 0", output 0; else, <doesn't occur, unspecified>
Outcome: both output 0
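(As an aside, here is example 1 rendered as a minimal Python sketch, under my own simplifying assumption that each program conditions only on the other's source text, so "M outputs 1 if C outputs 1" reduces to "M outputs 1 when run on C's source". The names and the source-string convention are my illustration, not part of the original example.)

```python
# Minimal sketch of example 1 (assumption: each program conditions only on
# the other program's source text).

C_SOURCE = 'if [M outputs 1 if C outputs 1], output 1; else, output 0'

def M(c_source: str) -> int:
    # The more advanced conditioner: refuse (0) iff C is the known conditioner;
    # the other branch is left unspecified in the example.
    if c_source == C_SOURCE:
        return 0
    raise NotImplementedError("unspecified in the example")

def C() -> int:
    # Blackmail (1) only if M would give in (1) given C's actual source.
    return 1 if M(C_SOURCE) == 1 else 0

print(C(), M(C_SOURCE))  # prints "0 0": both output 0
```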
Your second example
C runs: output 1[1]
M runs: <unspecified>
Outcome: unspecified
When put like this, it seems clear to me that there’s no paradox here.
Below are examples not from the post. The last one, where both try to condition, is the most interesting.
3. C is commit-rock[2], M is conditioner
C runs: output 1
M runs: if C runs "If [M outputs 1 if C outputs 1], output 1; else, output 0", output 0; else, output 1
Outcome: both output 1
4. Both are commit-rocks
C runs: output 1
M runs: output 0
Outcome: C outputs 1, M outputs 0
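(Examples 3 and 4 under the same convention as the sketch above; again, the function names and the source-matching are my own illustration.)

```python
# Minimal sketch of examples 3 and 4, same source-string convention as above.

CONDITIONER_SOURCE = 'if [M outputs 1 if C outputs 1], output 1; else, output 0'
ROCK_C_SOURCE = 'output 1'

def rock_C() -> int:
    return 1  # commit-rock: always blackmail

def rock_M() -> int:
    return 0  # commit-rock: never give in

def conditioner_M(c_source: str) -> int:
    # Give in (1) unless C is the known conditioner program.
    return 0 if c_source == CONDITIONER_SOURCE else 1

print(rock_C(), conditioner_M(ROCK_C_SOURCE))  # example 3: prints "1 1"
print(rock_C(), rock_M())                      # example 4: prints "1 0"
```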
5. Both condition
C runs: run M. if M outputs 1 when C outputs 1, output 1; else, output 0
M runs: run C. if C outputs 0 when M outputs 0, output 0; else, output 1
Outcome: The programs recursively run each other and never halt, as coded.
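(Example 5 as a sketch, with my own simplification of collapsing each counterfactual condition to just running the other program, which is what the outcome line describes. In Python the "never halt" shows up as a RecursionError rather than a literal infinite loop.)

```python
# Minimal sketch of example 5: mutual simulation with no base case.

def C() -> int:
    # Run M; blackmail (1) iff M gives in (1).
    return 1 if M() == 1 else 0

def M() -> int:
    # Run C; refuse (0) iff C doesn't blackmail (0).
    return 0 if C() == 0 else 1

try:
    C()
except RecursionError:
    print("C and M keep simulating each other and never return an output")
```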
Again, there is no paradox here.
To directly answer the question in the title, I think a commitment “to not give in to blackmail” and a commitment “to blackmail” are logically symmetric, because what a decision problem is about (what the 0s and 1s correspond to in real life) is arbitrary. (Also, separately, there is no “commitment” primitive.)
[1] I know in your second example you want the Crime boss’s decision to be conditional on the Mayor in some way, but it’s not specified how, so I’m going to just leave it like this, with this footnote.
[2] In some posts about decision dilemmas, the example of “a rock with the word ‘defect’ written on it” is used to make it clear that the decision to defect was not conditional on the other player.
Thanks, that’s an interesting way to think about pre-commitments.
However, I’m not sure I understand what your conclusion is. Do you believe that actors cannot protect themselves from blackmail with pre-commitments?
Do you believe that actors cannot protect themselves from blackmail with pre-commitments?
I don’t believe that. If I could prove that, I could also prove the opposite (i.e. replace ‘cannot’ with ‘can always’), because what a decision problem is about is arbitrary. The arbitrariness means any abstract solution has to be symmetric. In example 1, an actor protects themself from blackmail. We can also imagine an inverted example 1, where the more sophisticated conditioner instead represents the blackmailer.
I think that what happens when both agents are advanced enough to fully understand this kind of problem is most similar to example 5. But in reality, they wouldn’t recursively simulate each other forever, because they’d think that would be a waste of resources. They’d have to make some choice eventually. They’d recognize that there is no asymmetric solution to the abstract problem, before making that choice. I don’t know what their choice would be.
I can give a guess, with much less confidence than what I wrote about the logic. Given they’re both maximally advanced, they’d know they’ll perform similar reasoning; it’s similar to the prisoner’s-dilemma-with-a-clone situation. They could converge on a compromise policy-about-blackmail-in-general, if any such compromise is available for their values in their universe. I’m finding it hard to predict what such a ‘compromise’ could be when they’re not on relatively equal footing, though, e.g. when one can blackmail the other, and the other can’t do it back. When they are on equal footing, e.g. have equal incentive to blackmail each other, maybe they would do this: “give each other the things the other wants, in cases where this increases our average value” (which is like normal acausal trade).
After thinking about it more (38 minutes more, compared to when I first posted this comment; I’ve been heavily editing/expanding it), it does feel like a game of ‘mutually’ choosing where-they-end-up-in-the-logical-space, and not one of ‘committing’. Of course, to the extent the decisions are symmetric, they could choose to lock in “I commit to not give in to blackmail, you commit to make and follow through on blackmail”; they just both wouldn’t want that.
I don’t quite know what else there is to do in that situation other than “symmetrically converge to the mid-point”, even though I dislike where that leads in “unequal” cases like the one I described two paragraphs up (the better-situated superintelligence makes half the blackmail, and the worse-situated superintelligence gives in every time). Logic doesn’t care what I dislike. If this is true, I’ll just have to hope the side of good wins situationally and can prevent this from manifesting in the cases it cares about.
Disclaimer: the above is about two superintelligences in isolation, not humans.