Your economics are wrong for a few reasons. Let’s grant the hypothetical where all humans supply homogeneous labor at a uniform wage.
If AI is slightly cheaper than humans, what happens is that wages fall slightly. At the new, lower wages, the quantity of labor demanded rises (and more humans drop out of the labor force). At the same time, capital costs are bid up slightly. Eventually the prices of AI and human labor are equal, and the quantity demanded equals the quantity supplied.
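The adjustment above can be sketched with a toy supply-and-demand model. All curves and numbers here are my own illustration (linear supply and demand, AI treated as a perfectly elastic substitute at a fixed cost), not anything from the discussion:

```python
# Toy model: humans supply labor along an upward-sloping curve, buyers
# demand "effective labor" along a downward-sloping curve, and AI can
# supply unlimited effective labor at a fixed cost. All numbers are
# made up for illustration.

def labor_supply(w):
    """Hours humans are willing to supply at wage w."""
    return 100 * w

def labor_demand(p):
    """Effective-labor hours demanded at price p."""
    return 1000 - 50 * p

# Without AI: equilibrium wage where supply = demand
# 100w = 1000 - 50w  ->  w = 1000/150 ≈ 6.67
w_human_only = 1000 / 150

# With AI slightly cheaper than that wage, the price of effective labor
# is pinned at the AI cost: wages fall slightly, some humans drop out,
# and AI fills the gap between demand and human supply.
ai_cost = 6.0
human_hours = labor_supply(ai_cost)    # 600.0 (down from ~666.7)
total_demand = labor_demand(ai_cost)   # 700.0 (up from ~666.7)
ai_hours = total_demand - human_hours  # 100.0 supplied by AI
```

The point of the toy is just that total output rises (700 vs ~667 units of effective labor) even as wages fall, matching the claim that the shift is distributional rather than a loss of productive capacity.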
At the same time, you are increasing demand for labor to build the AI (right now labor is ultimately the main input to building all the stuff that goes into datacenters). If the social value of the AI is near zero, then the net increase in demand is almost the same as the net increase in supply. Lowering wages and raising capital costs doesn’t offset the benefits of extra productive capacity; it just shifts value from laborers to capitalists.
The real fiscal issue in this scenario is that you are shifting output from labor to capital, and the tax rate on capital is lower than the tax rate on labor. (Moreover, as you automate the economy there are further corporate reorganizations that would drive effective tax rates well below the on-paper capital gains rate.) You’re doing that at the same time that you are potentially increasing spending, which is tough unless you are willing to adjust the tax code.
I’m inclined to agree with other commenters though that none of this seems like the most important issue. The fiscal issues can be overcome if the state cares, and my best guess is that growth will accelerate enough that it would be OK even if there was no political change.
People should have much bigger concerns about being completely materially disempowered: (i) the state may not continue to support them, either because they are politically disempowered or because the state itself is disempowered, and (ii) even if they are able to survive they will have no say over what the world looks like and that sucks in its own way.
Once you are talking about scaling to arbitrary inference costs, I think the relevant notion of time horizon is closer to: “For what T could you solve this task using an arbitrary supply of humans each of whom only works on the project for T hours?”
On this framing I think that lots of easily-checkable tasks have much shorter horizons than “amount of time they would take a human to do.” That is, I think that you could successfully solve them by delegating to a giant team of low-context humans each of whom makes marginal improvements, and the work from those humans would look quite similar to the AI work.
This is particularly true for reimplementation tasks with predefined tests + a reference implementation. In that setting a human can pick a specific failing test, use the reference implementation to understand the desired behavior, and then implement it while using tests to make sure they don’t break anything else. For the projects you are describing it seems plausible that a human with a few days could use that workflow to reliably make forward progress and get the project done in a moderate number of steps.
(I would expect a giant human team using that methodology to make a huge number of mistakes for anything not covered by the tests, and to write code that is low quality and hard to maintain. But I think we’re currently seeing roughly that pattern of failures from AI reimplementation.)
It’s also very true of vulnerability discovery, since different humans can look in different places and then iteratively zoom in on whatever leads are most promising.
So I’m not sure how much of the phenomenon you’re observing is “there is a way longer horizon for these tasks” rather than “a more careful definition of horizon is more important for these tasks.” Probably some of both, but it seems quite possible that for the tasks you are describing the horizon length is really more like a few days or a week than months or a year.
I think it’s helpful to try to pull these effects apart, particularly when reasoning about the likely effects of continuing RL schlep: I think that you are seeing good performance on these tasks partly because they are easy to decompose and delegate, and not entirely because AI developers are able to do RL on them. For the very “long horizon” tasks you cite, I think that’s the dominant effect. So I wouldn’t expect performance to be as impressive on tasks where it would be hard to delegate them to a giant team of low-context humans.
I still agree that recent public developments seem like evidence for shorter timelines, and I’m not saying my bottom-line numbers differ much from yours. Even a more conservative extrapolation of time horizons would suggest more like 2-4 years until AI can strictly outperform humans on virtual projects (with human parity somewhat before that).