I think the problems of asking questions and coordination bottlenecks apply equally to new human employees.
I don’t believe it’s impossible, and I know that differences do exist, which is why I introduced a fictional character who disagrees with me to represent that view.
What the differences are, how wide they are, and how likely it is that we can make progress on those specific differences: these are the interesting questions, and I feel this post didn’t go into them enough, beyond gesturing at the author’s intuition about context windows, which I don’t necessarily share.
To get me to share that intuition, it would take (e.g.) an example of a software engineering project where breaking tasks or goals down into discrete 1M-token context windows is too severe a bottleneck, even for a hypothetical, much smarter claude-mythos-8 model.
I agree on 1 and 2. My comment was to suggest that I think 3 has lots of the interesting detail in it, and is under-discussed in the original.
Why does 10-100M feel sufficient to you, where 1M does not?
My weakly held intuition is that, as a general intelligence myself, if I had my long-term memory reset every day, with the ability to store ‘only’ 1 million words of notes between days, I’d still be able to make a lot of progress on large, ambiguous tasks, like my current software engineering job.
I think this is an illustrative scenario:
Let’s say I’ve got a team of a dozen experienced software engineers, a product, and a codebase. Then all but one of my experienced SWEs leave the company. I’d like to hire replacements, ask the remaining employee to help bring them up to speed, and have those replacements take their time learning the system.
Then someone sends me this post to explain why this is an impossible task. They say I’d never be able to get my remaining employee to teach their unknown knowns; that the new employees would never have access to product decisions made in old in-person meetings; that developers often complain about ~~coding agents~~ new hires just assuming things and running with those assumptions (the alternative being to overload the remaining employee with requests for clarification); and that I’d be causing cognitive decline in the last senior SWE, since they now ask the new employees to do the hands-on work.
Is the person who sent me the article wrong? If so, why?
If it’s ‘the limited size of context windows, and how context windows are filled’, do you have, say, a guess for how much larger the context window would have to be to mitigate that difference?
Scaling laws imply that we need exponentially more compute to achieve linear AI performance improvements
In the usual scaling law definitions of performance, it’s polynomially more compute, not exponentially more.
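A worked version of the distinction, assuming the common power-law form with loss $L$ as the performance measure ($A$ and $\alpha$ are fitted constants, assumed here for illustration rather than taken from the post). Under a power law the compute needed grows polynomially in $1/L$; it would only be exponential if loss were linear in log-compute.

```latex
% Assumed power-law fit: A > 0, \alpha > 0
\[ L(C) = A\,C^{-\alpha} \quad\Longrightarrow\quad C(L) = \left(\frac{A}{L}\right)^{1/\alpha} \]

% Polynomial in 1/L: halving the loss costs a constant multiplicative factor in compute
\[ \frac{C(L/2)}{C(L)} = 2^{1/\alpha} \]

% Contrast: if instead L(C) = a - b \ln C with b > 0, then
\[ C(L) = \exp\!\left(\frac{a - L}{b}\right) \]
% and each fixed additive drop \Delta in loss multiplies compute by e^{\Delta / b}
```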
I think it’s worth trying to be much more specific about the mechanism and timeline of this “default outcome”. Does it mean “imminently, standards of living will go down, as we run out of resources and overpopulate and compete”? That seems unlikely to me. Acquiring new resources, or using those resources more efficiently to make our lives better, happens on much shorter time scales than making new humans.
Does it mean “in a few hundred thousand years, space mormons will overpopulate the galaxy and run out of resources, because they had a more effective meme to encourage reproduction”? That is a much less concerning problem to me, and it presupposes that the space mormons can’t intelligently change course to protect their standard of living. Which brings us to another question:
Does the “nihilistic optimization” process under scarce resources bottleneck the well-being of general intelligences? Don’t we think that conscious cooperation lets us forecast, understand, and communicate to solve coordination problems? Hasn’t it done so over and over in the past, even when resources have been scarce?
AI x-risk can be expressed as a darwinian competition / gradual disempowerment problem, as can the risks of a permanent authoritarian state (like the one chickens live under), but I don’t think it’s the best framing for those issues. For example, chickens’ standard of living could be significantly improved with the right political coalition, and that coalition won’t be prevented from forming by fundamental darwinian aspects of reality.
Proof-of-stake and proof-of-work are both often implemented cryptographically, because in cryptographic domains, verification can be easier than generation. I think another option is to apply that principle to the problem directly, where possible. The best example: formalizing a math theorem in Lean makes it much easier to verify than to read. CS and ML papers can sometimes (if making software is now much easier) be implemented as toy examples, sized appropriately for reviewers (and reviewers’ AI instances) to check for hardcoding or cheating. Tough data analysis domains could be turned into raw data from a reputable source plus a minimal, non-steering prompt for reviewers’ models to re-discover what the author wanted to publish. Eventually, I think automated or simulated biology/chemistry labs could be funded to attempt to reproduce new papers’ results, putting their reputations on the line.
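The verification-versus-generation asymmetry can be sketched with a toy hashcash-style proof of work. This illustrates the principle only, not a proposal for how papers would be checked: SHA-256, the 8-byte nonce encoding, and the `difficulty` parameter (required leading zero bits) are assumptions chosen for the example. Finding a valid nonce takes about 2^difficulty hashes on average, while checking one takes a single hash.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros before the first set bit
        break
    return bits


def generate(data: bytes, difficulty: int) -> tuple[int, int]:
    """Expensive side: try nonces until the hash has `difficulty` leading zero bits.

    Returns (nonce, number_of_hashes_tried).
    """
    nonce = 0
    while True:
        digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce, nonce + 1
        nonce += 1


def verify(data: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap side: check a claimed nonce with a single hash."""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    paper = b"toy paper contents"
    nonce, tries = generate(paper, difficulty=16)
    print(f"generation took {tries} hashes; verification takes 1")
    print(verify(paper, nonce, difficulty=16))
```

Raising `difficulty` by one doubles the expected generation cost while verification stays at one hash.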
This is not sufficient to protect against motivated bullshitters, especially not in all domains, and I think in the near future institutions may be forced to fall back more to reputation. But I think it’s workable. Academic proof-of-work is only, at best, a proxy to avoid being overloaded with verification work, but I think we can make verification easier in other ways.
I feel that intuitively as well, but the hard question for me is how to square “the maximal utility of existence is related to diversity and uniqueness” with “the utility of a probability distribution is the probability times the outcome, even when the existence(s) within that distribution aren’t diverse or unique”.
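For reference, the second principle written out as the standard expected-utility formula (outcomes $o_i$ with probabilities $p_i$; the notation is assumed here, not taken from the thread). Each outcome is scored independently and weighted linearly by its probability, so nothing in the expression depends on whether the $o_i$ differ from one another, which is where the tension with a diversity-based view of utility shows up.

```latex
% Standard expected utility over a discrete distribution of outcomes o_i
\[ \mathbb{E}[U] = \sum_i p_i \, U(o_i) \]

% If every outcome is the same o, the sum collapses to U(o):
\[ \sum_i p_i \, U(o) = U(o) \sum_i p_i = U(o) \]
```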
Either way, see source:
Because the two individuals created by transporter duplication are identical to the person who existed prior to beaming, the term “transporter clone” could apply to either of them
(I am confused about) Non-linear utilitarian scaling
Was so surprised that nobody’s raised this point that I made an account just to make it.
Large organizations and highly placed individuals who solve coordination problems can make lots of money for reasons other than market efficiency. Most obviously, because they are in the best position to be rent-seeking. Coordinators take advantage of network effects to make themselves indispensable, then have every incentive to enshittify—to use their position as a coordinator to extract rent and dictate what activities can be coordinated (picking up rideshare customers) and which cannot (coordinating contract negotiation, selling products the coordinator disapproves of, etc).
This isn’t so relevant to your point about freelancers, and I agree with the general point that coordination is enormously valuable to society. But coordination would also be well-paid and ‘taut’ in a world where the incentives of coordinators push them toward rent extraction that is net-negative for economic efficiency.
LLMs are able to curate their notes, though? Compaction of conversations and multi-agent hierarchies already work. They might not work well enough by whatever standard of performance you have in mind, but it’s incorrect to frame the comparison as though I can curate notes and the LLM can’t.
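A minimal sketch of conversation compaction, assuming a fixed token budget: when the history exceeds the budget, everything except the newest message is folded into a single summary note. Token counting here is a whitespace word count, and `naive_summarize` is a hypothetical stand-in for a call to the model itself; neither reflects any particular product’s implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: whitespace-separated words.
    return len(text.split())


def naive_summarize(messages: List[str]) -> str:
    # Hypothetical placeholder for an LLM summarization call:
    # keeps only the first five words of each message.
    return "NOTES: " + " | ".join(" ".join(m.split()[:5]) for m in messages)


@dataclass
class CompactingContext:
    budget: int  # maximum tokens we try to keep in the window
    summarize: Callable[[List[str]], str] = naive_summarize
    messages: List[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if self.total_tokens() > self.budget and len(self.messages) > 1:
            # Fold all but the newest message into one curated note.
            note = self.summarize(self.messages[:-1])
            self.messages = [note, self.messages[-1]]


if __name__ == "__main__":
    ctx = CompactingContext(budget=25)
    for i in range(10):
        ctx.add(f"message {i} " + "word " * 8)
    print(ctx.messages)
```

Compacting once per `add` keeps the logic simple; a single message larger than the budget will still leave the window over budget.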