I agree, though if we’re defining rationality as a preference for better methods, I think we ought to further disambiguate between “a decision theory that will dissolve apparent conflicts between what we currently want our future selves to do and what those future selves actually want to do” and “practical strategies for aligning our future incentives with our current ones.”
Suppose someone tells you that they’ll offer you $100 tomorrow, plus $10,000 today if you make a good-faith effort to prevent yourself from accepting the $100 tomorrow. The best outcome would be to make a genuine attempt to disincentivize yourself from accepting the money tomorrow, fail, and accept it anyway, walking away with $10,100 rather than $10,000. However, you can’t actually aim for that outcome without violating the terms of the deal.
If your effort to constrain your future self on day one does fail, I don’t think there’s any reasonable decision theory that would argue you should reject the money anyway. On day one, you’re being paid to temporarily adopt preferences misaligned with the preferences you’ll have on day two. You can try to make that change in preferences permanent, or build an incentive structure to enforce it, or maybe even strike an acausal bargain with your day-two self, but if all of that fails, you ought to go ahead and accept the $100.
I think coordination problems are a lot like that. They reward you for adopting preferences genuinely at odds with those you may have later on. And what’s rational according to one set of preferences will be irrational according to another.
In the spirit of posting more on-the-ground impressions of capability: in my fairly simple front-end coding job, I’ve gone in the past year from writing maybe 50% of my code with AI to maybe 90%.
My job for the past couple of months has looked like this: attending meetings to work out project requirements; breaking those requirements into a more specific sequence of tasks for the AI (often just three or four prompts with a couple of paragraphs of explanation each); running through those in Cursor; reviewing the changes and making usually pretty minor edits; testing, which in recent weeks has almost never revealed errors introduced by the AI itself; and finally pushing the code out to the repos.
Most of the edits I make have to do with the models’ reluctance to delete code. For example, if a block of code in function A needs to be moved into its own function so that functions B and C can call it, the AI will often just repeat the code block in B and C so that it doesn’t have to delete anything in A, roughly as in the sketch below. It also sometimes comes up with strange excuses to avoid deleting code that’s become superfluous.
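A minimal, made-up illustration of the pattern (the function names and the price-formatting logic are hypothetical, not from my actual codebase):

```typescript
// Intended refactor: move this formatting logic from function A into its own
// helper so that B and C can call it. What the model often produces instead
// is below: A is left as-is, and the block is just copied into B and C.

function renderCartRow(priceCents: number): string {    // function A, untouched
  const price = `$${(priceCents / 100).toFixed(2)}`;
  return `<td>${price}</td>`;
}

function renderInvoiceRow(priceCents: number): string { // function B
  const price = `$${(priceCents / 100).toFixed(2)}`;    // copied from A
  return `<td>${price}</td>`;
}

function renderReceiptRow(priceCents: number): string { // function C
  const price = `$${(priceCents / 100).toFixed(2)}`;    // copied again
  return `<td>${price}</td>`;
}
```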
The models also occasionally add fallbacks that stop functions from ever returning an error, even in cases where an error is exactly what should happen, such as when a critical API call returns bad data.
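Another made-up sketch of the shape this usually takes (the endpoint and function names are hypothetical):

```typescript
// The kind of fallback the model likes to add: if a critical call fails or
// returns malformed data, quietly hand back an empty default.
async function fetchUserSettings(userId: string): Promise<Record<string, unknown>> {
  const res = await fetch(`/api/users/${userId}/settings`);
  const data = await res.json().catch(() => null);
  if (!res.ok || data === null) {
    return {}; // silently swallows a failure the caller should know about
  }
  return data;
}

// What I actually want: bad data from a critical call should surface as an error.
async function fetchUserSettingsStrict(userId: string): Promise<Record<string, unknown>> {
  const res = await fetch(`/api/users/${userId}/settings`);
  if (!res.ok) {
    throw new Error(`Settings request failed with status ${res.status}`);
  }
  return res.json();
}
```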
So, in a way, the main bottleneck to the AI doing everything one-shot at this point seems to be alignment rather than capability: the models were trained to avoid errors and avoid deleting code, and they care more about those goals than about producing a good codebase. That said, these issues almost never actually produce bugs, and dealing with them is arguably more stylistic than functional.
In my department, I think all of the other developers are using AI in the same way (judging by how the style of the code they’ve been deploying has changed recently), but nobody talks about it. It’s treated almost like an embarrassing open secret, like people watching YouTube videos while on the clock, and I think everyone’s afraid that if the project managers ever get a clear picture of how much the developers are acting like PMs for AI, the business will start cutting jobs.