James Payor

Karma: 1,359

I think about AI alignment; send help.

I’m also on Twitter. More links on my homepage payor.io.

James Payor 3 Jul 2026 1:48 UTC
LW: 5 AF: 2
0
AF
in reply to: James Payor’s comment on: Payorian cooperation is easy with Kripke frames
Man, okay I’m now wondering if the following works for a PrudentBot implementation? Calling our PrudentBot and our opponent in the encounter, let or something that grows a bit bigger than :
- Let be the length of the shortest proof that
- Let be the length of the shortest proof that
- If and then cooperate, otherwise defect
In symbols,

This is derived from the limit of an algorithm that searches for increasingly long proofs that we can obtain the opponent’s cooperation one way or another, with some budget for looking harder for a (defect, cooperate) proof should we find a (cooperate, cooperate) proof first.

The point of looking harder for the (defect, cooperate) option is that you don’t want to cooperate with a CooperateBot (or some complicated code version of CooperateBot) just because the mutual-cooperation proof was the easiest to find. And the point of having some budget that is not unlimited for searching harder for the (defect, cooperate) outcome is that once your opponent can see that you can find a (cooperate, cooperate) proof, they then get an upper limit on how much further they need to search to check you won’t throw that away in favor of a (defect, cooperate) proof.
And fwiw I haven’t properly vetted this idea yet; it could easily break or be unsatisfying in a number of ways!
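As a rough sketch of the search described above, here is the decision it converges to. It assumes we are handed the shortest proof lengths directly (`None` meaning no proof exists at any length), so it describes the limit rather than running a real proof search. The names `prudent_decision`, `shortest_cc`, `shortest_dc`, and `budget` are hypothetical; `budget` stands in for the "grows a bit bigger" function and is clamped so it never drops below the (cooperate, cooperate) proof length.

```python
from typing import Callable, Optional


def prudent_decision(shortest_cc: Optional[int],
                     shortest_dc: Optional[int],
                     budget: Callable[[int], int]) -> str:
    """Limit of the proof search sketched above (hypothetical helper).

    shortest_cc: length of the shortest proof that our cooperating gets their cooperation.
    shortest_dc: length of the shortest proof that our defecting gets their cooperation.
    budget: how far to keep looking for a (D, C) proof after finding a (C, C) proof of length n.
    """
    # No (C, C) proof at any length: the cooperate branch is never reached.
    if shortest_cc is None:
        return "D"
    # After a (C, C) proof of length n, keep looking for a (D, C) proof up to budget(n).
    # Clamping at n means a (D, C) proof found first also leads to defection.
    limit = max(shortest_cc, budget(shortest_cc))
    if shortest_dc is not None and shortest_dc <= limit:
        return "D"
    return "C"


def double(n: int) -> int:
    # an example budget that grows a bit faster than n
    return 2 * n


print(prudent_decision(10, 12, double))      # D: (D, C) proof within budget, as vs CooperateBot
print(prudent_decision(10, None, double))    # C: only the (C, C) proof exists
print(prudent_decision(None, None, double))  # D: no proof that they cooperate at all
```

The finite budget is the design point from the paragraph above: an opponent that can see our (C, C) proof of length n only has to rule out (D, C) proofs up to `budget(n)`, not at every length.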

James Payor 3 Jul 2026 1:33 UTC
LW: 14 AF: 6
0
AF
in reply to: transhumanist_atom_understander’s comment on: Payorian cooperation is easy with Kripke frames
Writing to say that I quite like and agree with the perspective here! I have been continuing to do my own puzzling about what a more wholesome instantiation would be than the “exists finite length proof” modality, as part of a project to ground things out better. Even absent that, though, I feel confident that the intuition stands up and that the processes you’re outlining above in Kripke terms should be realizable.

I also have two rambly thoughts to offer here, which I’ve included below if they’re of interest:
1. Fwiw, I originally arrived at “” by thinking about what the algorithm steps should be when modelling my Prisoner’s Dilemma opponent. I was very bothered by the part where we get to “I notice they are checking ” and then fail to shortcut! That’s a quantity directly under our control, so why not counterfact on it? I was then kinda surprised to find that this encodes directly in modal provability logic with reasonable behaviour, even though “” feels very much like a poor imitation of a counterfactual.
  
  ...I thought that then obviously we should be able to do the “real” PrudentBot. The naive encoding for that would be like “defect if , otherwise cooperate if , otherwise defect”. (This is a little different from your setup, I think. I’m trying to ask “does defecting lead to (defect, cooperate)? If so, defect; otherwise, if cooperating leads to (cooperate, cooperate), then cooperate; otherwise defect”.) But yep, this runs directly afoul of the consistency problem, and inner “counterfactuals” do nothing to save it. To be clear, the problem is that it’s hard for the bot itself and for other agents to model that it won’t find some clever reason to activate the defect branch. And the OG PrudentBot is… uh… carefully skirting around this. I still expect there’s a nice answer waiting here; provability should be expressive enough to capture the limit of a relevant algorithm in a clean way. The translation is still eluding me.
2. When I think about the ontology that underlies the way the boxes work in the provability bot setting, I get a picture like the following:
  - Every box like in refers to one day earlier in time
  - From the perspective at the beginning of time, everything is “true”, i.e. holds
  - Things yesterday’s perspective considers true must also be considered true on the day before, by internal necessitation (à la ). The contrapositive () tells us that anything that becomes false stays false forever.
  - Löb’s theorem says that , which I read as “the things that our perspective says must remain true are exactly the things that our perspective considers true”, so the things that end up provable/“considered true” are exactly those we can see must remain true.
  
  So this suggests a picture where the modal agents, when interacting with each other, are popping open a dimension of “discursive time” (thanks Sam Eisenstat for that term) along which to have a dialogue about what actions they’ll take. The particular chosen dimension is this peculiar provability box, with the property that on a given day you “might” have , or just one or the other, or none. Then the agents are set up with rules like “I will cooperate today if you cooperated with me yesterday” (FairBot) or “I will cooperate today if you cooperated with me yesterday and, if yesterday wasn’t the beginning of time, you didn’t cooperate with DefectBot” (OG PrudentBot).
  
  Then the stability machinery runs to determine the actual output. And while my story is a bit incomplete it still clarifies some of the mechanics for me.
  
  In all likelihood, though, you may recognize the stability-over-time picture as performing the same/isomorphic movements as the Kripke analysis! Which, yeah, I was not familiar with Kripke frames at the time. But still.
  
  My main takeaway from this is that it’s cool that the provability agents are able to pop open a dimension along which they can discuss and potentially bargain, without needing a physical clock, with its own peculiar lattice dynamics (you keep shrinking the space until it’s stable) that are handy in some ways but abysmal in others lol. So there are wide open questions about what other alternatives there are, whether those alternatives compose with each other (e.g. does your alternative discursive time behave nicely if put up against a proof-search algorithm?), and what those alternatives look like described in logic.
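The day-by-day picture above can be run directly. Below is a minimal Python sketch, assuming the standard semantics on a finite linear Kripke frame (□p holds on day n iff p held on every earlier day, so every box is vacuously true on day 0) and the standard FairBot and PrudentBot definitions, with PrudentBot's one-level-of-soundness guard on its DefectBot clause. The helpers `box`, `coop`, and `outcome` and the cutoff `DAYS` are hypothetical names chosen for illustration.

```python
from functools import lru_cache

DAYS = 12  # enough days for these agents' values to settle


def box(pred, n):
    """Box on day n: pred held on every earlier day k < n (vacuously true on day 0)."""
    return all(pred(k) for k in range(n))


@lru_cache(maxsize=None)
def coop(me, opp, n):
    """Does agent `me` cooperate with agent `opp` on day n?"""
    if me == "CooperateBot":
        return True
    if me == "DefectBot":
        return False
    if me == "FairBot":
        # cooperate iff box(opp cooperates with me)
        return box(lambda k: coop(opp, me, k), n)
    if me == "PrudentBot":
        # box(opp cooperates with me) and box(not box-false -> opp defects against DefectBot);
        # "not box-false" (one level of soundness) holds exactly on days k > 0
        return box(lambda k: coop(opp, me, k), n) and box(
            lambda k: k == 0 or not coop(opp, "DefectBot", k), n
        )
    raise ValueError(f"unknown agent: {me}")


def outcome(a, b):
    """Settled actions (True = cooperate), read off the last simulated day."""
    last = DAYS - 1
    return coop(a, b, last), coop(b, a, last)


for pair in [("FairBot", "FairBot"), ("FairBot", "DefectBot"),
             ("PrudentBot", "CooperateBot"), ("PrudentBot", "PrudentBot")]:
    print(pair, outcome(*pair))
```

Reading off the last day: FairBot cooperates with itself, FairBot and DefectBot mutually defect, PrudentBot defects against CooperateBot, and PrudentBot cooperates with itself. Every mutual reference only looks at strictly earlier days, which is what lets the memoised recursion terminate.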

James Payor 23 Jun 2026 6:15 UTC
2 points
0
in reply to: Sharada Mohanty’s comment on: Announcing the ARC White-Box Estimation Challenge
Thank you for this timing fix btw! It’s been a great quality of life improvement :)

James Payor 19 Jun 2026 19:42 UTC
2 points
0
in reply to: Sharada Mohanty’s comment on: Announcing the ARC White-Box Estimation Challenge
Thanks for the updates, appreciate it!
Fwiw on numpy, I don’t find particular need for it; it is convenient for some precomputation or numeric helpers, but nothing that can’t be done with lists, and we should use flopscope for heavy lifting anyway.

It might help to have a pared-down uv env file that more closely tracks the grader, with instructions for validation / local smoke test to use that? But again not a big deal, grader smoke test reveals any compatibility issues pretty quickly.

James Payor 17 Jun 2026 16:51 UTC
LW: 2 AF: 1
0
AF
in reply to: paulfchristiano’s comment on: Announcing the ARC White-Box Estimation Challenge
Great, thank you! Timing seems updated, but also fwiw flops.symmetrize is still not there: https://www.aicrowd.com/challenges/arc-white-box-estimation-challenge-2026/submissions/310914

James Payor 12 Jun 2026 15:47 UTC
LW: 2 AF: 1
0
AF
in reply to: James Payor’s comment on: Announcing the ARC White-Box Estimation Challenge
Okay another thing (let me know if there’s a better spot for reports like this?), the “symmetrize” functionality seems unavailable on the grader. For that matter, the grader doesn’t have stock numpy and some other deps that are included in the starterkit env. See https://www.aicrowd.com/challenges/arc-white-box-estimation-challenge-2026/submissions/310565 re symmetrize not being there.

James Payor 11 Jun 2026 6:47 UTC
LW: 2 AF: 1
1
AF
in reply to: James Payor’s comment on: Announcing the ARC White-Box Estimation Challenge
Another point coming up: it seems like remote-dispatch (and perhaps array-copying) are currently being billed as “excess wall time” rather than flopscope time. This shows up when one dispatches ops with a lot of state, which is getting penalized excessively (I think).

If I have this right, I’d vote for flopscope time to be counted on the op boundaries (which would absorb time spent dispatching to the remote) rather than time the remote actually spends on array crunching.

(Also, if the arrays are being copied back and forth, that seems excessive and would make the grader way slower than necessary! I’d try to give the remote endpoint durable custody of arrays within a session, and only transfer data on creation and on demand.)
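To make the two suggestions concrete, here is a toy, in-process sketch: a stand-in "remote" that keeps arrays by handle for the whole session, plus a wrapper that bills wall time at the op boundary so dispatch overhead is counted as op time. Everything here (`ToyRemote`, `OpBoundaryTimer`, `upload`, `apply`, `fetch`) is hypothetical and is not flopscope's or the grader's actual API; plain lists stand in for arrays.

```python
import time
from typing import Callable, Dict, List


class ToyRemote:
    """In-process stand-in for a remote array server that keeps custody of arrays by handle."""

    def __init__(self) -> None:
        self._arrays: Dict[int, List[float]] = {}
        self._next_handle = 0
        self.elements_transferred = 0  # counts only uploads and explicit fetches

    def _store(self, data: List[float]) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._arrays[handle] = data
        return handle

    def upload(self, data: List[float]) -> int:
        # data crosses the "wire" once, on creation
        self.elements_transferred += len(data)
        return self._store(list(data))

    def apply(self, op: Callable[..., List[float]], *handles: int) -> int:
        # inputs and result stay on the "remote"; only a handle comes back
        return self._store(op(*(self._arrays[h] for h in handles)))

    def fetch(self, handle: int) -> List[float]:
        # data crosses the "wire" again only when the client demands it
        data = self._arrays[handle]
        self.elements_transferred += len(data)
        return list(data)


class OpBoundaryTimer:
    """Bills the client-side wall time of each op call, so dispatch overhead is included."""

    def __init__(self, remote: ToyRemote) -> None:
        self.remote = remote
        self.op_seconds = 0.0
        self.op_calls = 0

    def apply(self, op: Callable[..., List[float]], *handles: int) -> int:
        start = time.perf_counter()
        try:
            return self.remote.apply(op, *handles)
        finally:
            self.op_seconds += time.perf_counter() - start
            self.op_calls += 1


remote = ToyRemote()
timer = OpBoundaryTimer(remote)
x = remote.upload([1.0, 2.0, 3.0])
y = timer.apply(lambda a: [v * 2 for v in a], x)
z = timer.apply(lambda a, b: [p + q for p, q in zip(a, b)], x, y)
print(remote.fetch(z))              # [3.0, 6.0, 9.0]
print(remote.elements_transferred)  # 6: the intermediate y never left the "remote"
```

In a real deployment the time between `start` and the returned handle would include network dispatch; here it is negligible, so the sketch only shows where the billing boundary sits.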

EDIT: This issue I just found seems related and probably complementary.

James Payor 4 Jun 2026 17:44 UTC
LW: 2 AF: 1
0
AF
on: Announcing the ARC White-Box Estimation Challenge
It looks to me like the pre-grading “smoke-test” has a flop cap well below 6.8e10, perhaps at 1e10? For me, submissions at 0.1x the cap are getting through but larger flop counts are failing.

(Update: it’s now fixed, thanks!)

James Payor 22 May 2026 4:34 UTC
2 points
0
in reply to: leogao’s comment on: leogao’s Shortform
This helps me appreciate the mood of where you are coming from, thanks! But uh, I also have objections, mostly due to our spot in the thread.
I would second CronoDas’ point that the mechanics of change aren’t quite that simple. And I’d like to complain that this is not an example of a thing that is helped by people taking actions they don’t feel hope in!
I acknowledge that the secret police setup seems to do well at bringing in the “you can’t communicate and build plans together” aspect that “coordination problem”/game theory typically seems to evoke. I’ll note, though, that you still have a lot of communication/observation channels (including costly ones like protesting and being taken away or killed).
More importantly it seems like the robust way out of the situation is to try to build more infrastructure for being able to act with a coalition of peers in a constructive manner. Game theory as typically thrown around seems a poor model for this imo.

James Payor 18 May 2026 15:16 UTC
5 points
3
in reply to: lc’s comment on: leogao’s Shortform
...what sort of “coordination problems” does one “solve” by doing things you don’t have hope in? I really don’t get it and am perplexed. This photo is swellingly full of hope, and presumably we got there through people that had hope in their actions. Perhaps there’s detail in the history you’re referencing that’s going over my head.

James Payor 17 May 2026 19:33 UTC
54 points
−1
in reply to: leogao’s comment on: leogao’s Shortform
I do find it poetic, but in seriousness I think if folks don’t actually feel hopeful about what they’re doing then they should do something else—leave the work / research direction / engineering / comms / whatnot to whoever actually feels hope about it...
To elaborate, the thing that’s poetic for me about “our hopeless cause” is that I have hope that is not cleanly legible to the outside and is easy to write off as “hopeless”. And it’s important to stay in tune with your own knowings about this stuff. I think there are very deleterious effects from throwing energy into things one doesn’t have hope in.

(...And to elaborate further, mostly I think the bad stuff happens by lending support to corrupt things. And imo being pushed to work on X while you lack hope in X is a solid flag of corruption.)

James Payor 12 Mar 2026 9:25 UTC
2 points
0
in reply to: programjames’s comment on: Payorian cooperation is easy with Kripke frames
Thanks, this is a solid point that choosing the defect-cooperate outcome should really be based on constructive knowledge, and the fallback shouldn’t be “definitely defect!” if you can’t obtain that.
So that makes me think more that is the sort of term that is legit in a PrudentBot, since “we can’t find a proof that if we provably defect then they cooperate” is what we actually wanted. And sure, the naive translation into provability logic won’t like this; I think this means we should look for a nicer translation, probably starting with getting a bounded proof-search bot correct.
I’m wondering about what the SharkBot term is doing, compared to my (probably broken) idea of using . If I convert this one to diamonds I get:
- (“it’s possible that I necessarily defect while my opponent defects”)
- (“it’s possible that my opponent defects while it is not possible that I cooperate”)
This seems potentially more restrictive than it needs to be? Well, restrictive can be good here, since what we are targeting is defecting as often as we can while guaranteeing a defect-cooperate outcome. But also as pointed out elsethread this PrudentBot approach may have trouble cooperating with itself.
Anyway this leaves me more ready to unpack the SharkBot condition, which is . This is a bit different than what I landed at, reading “it’s possible that my opponent defects or it is possible that I cooperate”.
Converting the SharkBot condition back to squares I get:
Ah okay, so I think this is cashing out the version that’s like “I defect if it is stable that my opponent cooperates while I provably defect”. (I think this may be equivalent in strength to my condition under Löb’s theorem?)
Btw on the theory that “” was a bit of a mistake relative to “” as a basic approach, we could simplify a bunch of the language above to “it’s possible that my opponent defects while I defect” (my one) or “it’s possible that my opponent defects or I defect” (SharkBot). I’m taking this as some evidence that it would be better to go with the forms.

James Payor 12 Mar 2026 8:34 UTC
LW: 6 AF: 3
0
AF
in reply to: transhumanist_atom_understander’s comment on: Payorian cooperation is easy with Kripke frames
(Thanks for the post btw! My comments on this comment below.)
This PrudentBot def does feel in-spirit to me. I also agree with your analysis that it doesn’t fall for the pitfall of “I assumed in my hypothetical that you thought I would unconditionally cooperate with you, but then no fair you defected on me in my hypothetical! I defect!”
For instance this happens if you try, as I was inclined to try, . This quickly becomes “false” i.e. “defect” if you have a lying around.
With your definition, we have , and assuming we have a around I think that simplifies to . (I’m not sure I have that right!)
That’s… better behaved! It will run into the same Löbian difficulties as usual if we directly translate this as a provability bot. So imo we still need some better answer as to how we should interpret these as programs that more faithfully represent what we want in terms of their reasoning.
Also fwiw, I run with the model that the simulated opponent has access to both a description of your full behaviour and a shortcut proof that you cooperate with just them. (In the cooperation branch hypothetical.) So when your simulations have the cooperation proof in scope, you’re just saying that the behaviour is equivalent to CooperateBot in this very matchup. And it happens that this is often a sufficient condition for cooperation. (And if your opponent runs you on a different bot like DefectBot, the cooperation proof doesn’t apply there.)
So there’s some question here like “well what’s the point of simulating under the assumption of the narrow cooperation proof if we’re keeping the full description as a fallback?”
Part of what your post points at is an answer which is that, well, it works out more cleanly if you do! Specifically that reasoning is easier if, for the purposes of deciding to cooperate, you assume the decision is made and check that it’s stable/good.
As for why that is the case, I have a more philosophical take that it’s about encoding choice, and the better mechanics we see are downstream of encoding this better. I’ll go into more detail in this collapsed section (you may well wish to gather your own thoughts first).
on encoding choice
We can think of the bot as containing some mechanisms that lead to certain actions, with conditions for firing.
The FairBot variant described in this post is a simple instance, with a default action of defection and a single mechanism that leads to cooperation when it fires.
I have some idea that the choice-y mechanisms are of the form “if I activate, will the outcome satisfy some property? if so I activate”.
So for a given mechanism we can have a widget with a goal target . We want a way of saying “if were to fire this would lead to ” as our condition for . And there’s a question of how to encode this in a program or in our modal logic setting.
Straight up trying won’t work, since and you won’t be able to form a statement like this. This was omitting the part where is using some model of consequences to make its choice.
So next up is , in which fires if our modelling shows that if fires this will lead to . You can in fact show that here!
The other option is ; this doesn’t assume ” fires” but rather “someone has a model that fires”, and is actually a bit more brittle. The reason this form can be preferable in the modal logic bots is that you are sidestepping the complexity of showing that you will need to check your opponent can tell that has fired. But I think this may be a bit of a hack relative to the version.
My reasoning here suggests that if you have multiple conditions for action, you may factor out // that are each self-referential, and build your high-level agent as trying each in sequence. I haven’t fully cashed out yet what this looks like, and whether it e.g. gives you a meaningfully different PrudentBot.
I note that this picture doesn’t provide a convenient answer for the problem where in provability land your action “defect” may cause someone to need to prove . But as I guess I keep mentioning I view this as a bug in the provability encoding.

James Payor 10 Mar 2026 14:39 UTC
LW: 11 AF: 3
0
AF
in reply to: joseph_c’s comment on: Payorian cooperation is easy with Kripke frames
(EDIT: focusing here on PrudentBot. Fwiw I like the idea you have in SharkBot to use the weaker diamond when evaluating whether to defect! I’m less well equipped to analyze it at this time, still just grappling with how to handle diamonds at all in my usual proof search ontology.)

I’m less familiar with the diamond modality; but if I can correctly translate “exists some world that satisfies X” as “not every world satisfies not-X”, we get the following:
This will run into some trouble in the unbounded-proof-length provability logic model, since generally you can’t prove in some fixed length that there will be no proofs of any length of , on pain of unsoundness via a Löbian fixed point. So the version in the paper runs (I believe) as follows:
This is handling the part where might be FairBot and require “proving that there are no proofs” by checking the defection part under the assumption of one level of soundness. (I remember this gotcha because when I try to write down the PrudentBot that lives in my heart I end up with something closer to your version, but then the paper was doing this carefully different thing...)
Insofar as it’s really the proof lengths fighting that is the problem, I think something like the following might just work instead:
Anyway this is all to say that:
1. I agree that the original PrudentBot definition seems hacky, both in the DefectBot aspect (we can replace this with “doesn’t cooperate with every bot” if that helps), and also in the assuming exactly one level of soundness regard. It is ofc a nice POC given that it does manage to cooperate with itself with this machinery.
2. In terms of cashing the diamond definitions out into provability logic, some care seems to be required, and I’m interested in a well-behaved translation. The “not box not” translation doesn’t actually stay very faithful to the intent.
3. It might be that messing around with proof bounds is sufficient to get there, so you don’t end up with an inner proof length being able to diagonalize an outer proof length.
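The “not box not” reading of the diamond can be checked on a finite linear Kripke frame. This minimal sketch assumes the usual semantics (□p on day n means p held on every earlier day) and shows that ◇⊤ coincides with ¬□⊥, the one-level-of-soundness assumption, which fails only on day 0. The helper names `box`, `diamond`, `always_true`, and `always_false` are hypothetical.

```python
DAYS = 5


def box(pred, n):
    """Box on day n: pred held on every earlier day k < n (vacuously true on day 0)."""
    return all(pred(k) for k in range(n))


def diamond(pred, n):
    """Diamond as 'not box not': some earlier day satisfies pred."""
    return not box(lambda k: not pred(k), n)


def always_true(k):
    return True


def always_false(k):
    return False


# Diamond-true versus not-box-false on each day.
soundness = [diamond(always_true, n) for n in range(DAYS)]
not_box_false = [not box(always_false, n) for n in range(DAYS)]
print(soundness)      # [False, True, True, True, True]
print(not_box_false)  # [False, True, True, True, True]
```

So any condition phrased with a diamond is automatically false on the first day, which is the same spot where the guard in the paper's PrudentBot switches off.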

James Payor 28 Feb 2026 21:23 UTC
2 points
2
in reply to: lc’s comment on: Sam Altman says OpenAI shares Anthropic’s red lines in Pentagon fight
Yep, on my read the supposed “redlines” are not actually in the contract language they have shared, e.g. consider whether this part in fact names a “redline”:
https://x.com/i/status/2027846481021980914

James Payor 1 Feb 2026 14:34 UTC
4 points
2
in reply to: Buck’s comment on: Buck’s Shortform
What interesting things do y’all think are up with AI lab politics these days? Also why is everyone (or just many people in these circles) going to Anthropic now?
Any changes in how things seem for control plans based on vibes and awareness present in more recent models? (GPT-5 series may not count here; I’m mostly interested in visibility on the next generation that are coming, of which I think Opus 4.5 is a preview but I’m fairly unsure.)
Anything generally striking about how things look in the landscape and models versus a year ago?

James Payor 31 Jan 2026 0:43 UTC
4 points
0
in reply to: Thomas Kwa’s comment on: Thomas Kwa’s Shortform
I would contest the frame here. In particular I think it won’t hold up because things won’t stay as capital-bound as they are now, and that seriously messes with the continuity required for today’s investments to maintain their relative portion of the pie. What do you think of this part?
(EDIT: Okay I think you are referencing this sort of thing with “most people can’t invest in assets that grow at the average rate”, but I still take issue with some picture in which everything is apportioned to assets that grow or something like that.)
To expand: I think today’s capital earns its gains mostly because it is a required input, which thereby gives it a lot of negotiating power. And I think this falls off pretty sharply in time, in the limit of technological development.
(I should say, so long as it remains the case that we need lots of coordinated work to build compute engines to run intelligence, “capital” seems to remain meaningful in the old ways. But a lot of things seem possible with a nanofactory and the right information about how to use it, and at a point like that eld-capital isn’t a relevant bottleneck.)

And while we can imagine some way that neo-capital continues to project its force in a profit-grabbing way in the future, I think the mechanism is pretty different than today, and probably has to involve more literal force, and is unlikely to have solid continuity with today-capital.

James Payor 3 Jan 2026 20:56 UTC
11 points
6
on: What’s going on at CFAR? (Updates and Fundraiser)
I would like to thank you Anna for this write-up! I love the thoughtfulness and ideas on institutional design, relationship to donors and EA, and others. Much of this post was quite viscerally relaxing for me to read, in a “finally it feels possible to jointly understand X and Y and Z” kind of way, which I’m quite glad for.

James Payor 3 Jan 2026 19:43 UTC
120 points
67
on: The Weirdness of Dating/Mating: Deep Nonconsent Preference
On the overall note of the post:
I claim that most women have a “deep” preference for nonconsent in dating/mating. It’s not just a kink; from the first approach to a date to sex, women typically want to not have to consent to what’s happening.

...I would like to say that, in my capacity as just-some-guy, this has not been my lived experience interacting with women, and I disagree with a lot of the framing in this post. (I don’t mean to contest what you’ve experienced; I would interpret these events pretty differently though.)
I’m writing this mostly because I’m finding it horrifying that it appears the LessWrong consensus reality is treating “nonconsent” stuff as a cornerstone of how it views women. This somehow seems increasingly to be the case, seems wrong to me, and contributes to the erasure of some things I consider important.

James Payor 19 May 2025 3:15 UTC
2 points
0
in reply to: RogerDearnaley’s comment on: Working through a small tiling result
Meta note: Thanks for your comment! I failed to reply to this for a number of days, since I was confused about how to do that in the context of this post. Still, I think it’s relevant regarding probabilistic reasoning, and I’ve now offered my thoughts in the other replies.