Figure 3: A future project might take ~42 days of wall-clock time, with ~8 hours of agent work (not counting running the evals) and 1000 serial hours of human IC work, evals execution, and review.
From experience I’m still skeptical that it won’t end up being a proxy in practice once you actually implement the scalar virtuosity score and fine-tune LLMs against it, but instead of belaboring my skepticism I’ll wait for your preliminary results. I’d be keen to see something roughly as thorough as this, though that’s probably a bit much to ask if you’re the sole person working on this.
The most common tell. A fake profundity wrapped in a neat contrast: “We’re not a company, we’re a movement.” “It’s not just a tool, it’s a journey.” Humans use this sparingly; AI uses it compulsively.
The Punchline Em-Dash
Every section feels like it’s waiting for a big reveal—until the reveal is obvious or hollow.
The Three-Item List
AI loves the rhythm of threes: “clarity, precision, and impact.” It’s a pattern baked deep into training data and reinforced in feedback.
Mirrored Metaphors & Faux Gravitas
“We don’t chase trends — trends chase us.” They sound like aphorisms, but they’re cosplay; form without experience.
Adverbial Bloat
“Importantly,” “remarkably,” “fundamentally,” “clearly.” Empty intensifiers meant to simulate significance.
Mechanical Rhythm
Sentences marching in lockstep, each about the same length. Humans sprawl, stumble, cut themselves off. AI taps its digital foot to a metronome.
Hedged Authority
The “at its core,” “in many ways,” “arguably.” A way of sounding wise without taking a stand.
Latin Sidebar Syndrome
AI’s compulsive use of e.g. and i.e. often comes with a giveaway glitch: the period-comma doublet (“e.g.,” or “i.e.,”). Almost no human would punctuate this way. Once you’ve seen the “.,” pattern, you can’t unsee it.
Closing Tautologies
Ending sections with empty recaps: “This shows why innovation matters.” It looks like a conclusion, but it’s just filler.
Please note that these are not foolproof. After all, AI detectors claim AI wrote not only the Declaration of Independence, but also my early, pre-ChatGPT writing.
I expect LLM writing to move past these tics while still being weirdly slippery for a while for increasingly hard-to-legibilize reasons.
Let’s say I know how to build / train a human-level (more specifically, John von Neumann level) AGI. And let’s say that we (and/or the AGI itself) have already spent a few years[1] on making the algorithm work better and more efficiently.
Question: How much compute will it take to run this AGI?
(NB: I said “running” an AGI, not training / programming an AGI. I’ll talk a bit about “training compute” at the very end.)
Answer: I don’t know. But that doesn’t seem to be stopping me from writing this post. ¯\_(ツ)_/¯ My current feeling—which I can easily imagine changing after discussion (which is a major reason I’m writing this!)—seems to be:
75%: One current (Jan 2023) high-end retail gaming PC (with an Nvidia GeForce RTX 4090 GPU) will be enough (or more than enough) for human-level human-speed AGI,
85%: One future high-end retail gaming PC, that will be on sale in a decade (2033)[2], will be enough for human-level AGI, at ≥20% human speed.
The first commander of Talpiot, Dan Sharon, a former IDF paratrooper, recruited his friend, Felix Dothan, who had just completed his PhD dissertation at Hebrew University on the topic of “the development of thinking and how one could improve his or her own thinking.” Dothan’s aim for Talpiot wasn’t merely to confer technical knowledge or select the most intelligent; it aspired to teach people how to think and learn fast.
Together they wrote a memo that describes what they were looking for from recruits: “We need applicants with a high IQ. We are looking at the top 5 percent when it comes to intelligence, creative ability, the ability to focus, stable and pleasant personalities.” Furthermore, applicants must have “dedication to their homeland and the strong will to survive in the unit.” They wanted the smartest men (and in later years women) they could find at the age when they still believed anything is possible. But they wanted more than raw intelligence, which created a difficult selection problem.
Dothan and others began working on a selection process that would isolate candidates who fit their criteria. Talpiot launched in 1979 and was initially designed for a cohort of 25 people, selected from a pool of up to 10,000 test takers, who would complete a bachelor’s degree in physics and mathematics—computer science was added in 1983—from Hebrew University, with four years of content stuffed into three years. With the help of academic consultants, Talpiot’s leaders designed psychometric tests that would assess candidates’ cognitive ability and creativity. The two hundred or so candidates who were shortlisted underwent a taxing interview where they were subjected to logical puzzles designed to test their creativity and critical thinking skills and asked to explain physical phenomena that went well beyond what they had studied at school.
A flaw in the selection process was quickly discovered; highly creative technical minds often do not have “stable and pleasant personalities” and members of the first classes of Talpiot were finding it difficult to be team players. Additional personality tests were implemented where prospective recruits were put through intense simulations to determine their leadership and teamwork capability as well as motivation and “moral value.” Once selected for the program, Talpiot students spent nearly all their time together, which fostered an intense bond.
The training was designed to push students to their limits. A favored technique was to identify the problem-solving strengths of students and play against them, forcing them to learn and think in different and original ways. As David Kutasov, a theoretical physicist at the University of Chicago and a Talpiot alumnus, put it: “A lot of kids these days, even at top American universities, are too conventional and not original. At Talpiot they beat it out of you and push you towards originality.” The intensity of the program results in a high attrition rate; nearly 25% of those originally selected fail to complete the program. …
Talpiot alumni have been successful in the private sector as well, founding over one hundred companies worth over $50 billion. … Combined, the aerospace, defense, and cybersecurity industries account for roughly 15% of Israel’s annual exports. A large portion of Israel’s economic strength stems from industries whose leadership and technical expertise are drawn from Unit 8200 [Israel’s version of the NSA] and Talpiot alumni. Venture capital firms have been started by their alumni, including Glilot Capital Partners and Axon, whose strategy is to fund startups whose founding teams are fellow veterans of these programs. These alumni are increasingly venturing beyond their traditional domains and have founded companies in markets as diverse as transportation, healthcare, construction, agriculture, media, and more.
Talpiot alumni have also become some of the most renowned researchers in the world in fields ranging from theoretical physics to systems biology. Examples include Yoav Freund, who won the Gödel Prize for his work in machine learning, and Elon Lindenstrauss, who received the Fields Medal—the “Nobel Prize of mathematics”—for his work in the area of dynamics. One former member received a technical Grammy for inventing an audio mixing technology. Unlike many other academically selective programs, Talpiot manages to avoid selecting for the excessive conformity that impedes intellectual exploration. …
The success of Talpiot’s alumni is made more remarkable by their relatively small number; roughly 2000 people have graduated from the program in its lifetime, equivalent in size to the average annual freshman class at Harvard. The success of Talpiot has spurred the development of a similar program in South Korea, which explicitly referenced Talpiot as an inspiration, and China has developed a similar program as well.
This essay argues that rational people don’t have goals, and that rational AIs shouldn’t have goals. Human actions are rational not because we direct them at some final ‘goals,’ but because we align actions to practices[1]: networks of actions, action-dispositions, action-evaluation criteria, and action-resources that structure, clarify, develop, and promote themselves. If we want AIs that can genuinely support, collaborate with, or even comply with human agency, AI agents’ deliberations must share a “type signature” with the practices-based logic we use to reflect and act.
I argue that these issues matter not just for aligning AI to grand ethical ideals like human flourishing, but also for aligning AI to core safety-properties like transparency, helpfulness, harmlessness, or corrigibility. Concepts like ‘harmlessness’ or ‘corrigibility’ are unnatural—brittle, unstable, arbitrary—for agents who’d interpret them in terms of goals or rules, but natural for agents who’d interpret them as dynamics in networks of actions, action-dispositions, action-evaluation criteria, and action-resources.
While the issues this essay tackles tend to sprawl, one theme that reappears over and over is the relevance of the formula ‘promote X X-ingly.’ I argue that this formula captures something important about both meaningful human life-activity (art is the artistic promotion of art, romance is the romantic promotion of romance) and real human morality (to care about kindness is to promote kindness kindly, to care about honesty is to promote honesty honestly).
I start by asking: What follows for AI alignment if we take the concept of eudaimonia—active, rational human flourishing—seriously? I argue that the concept of eudaimonia doesn’t simply point to a desired state or trajectory of the world that we should set as an AI’s optimization target, but rather points to a structure of deliberation different from standard consequentialist[2] rationality. I then argue that this form of rational activity and valuing, which I call eudaimonic rationality[3], is a useful or even necessary framework for the agency and values of human-aligned AIs.
These arguments are based both on the dangers of a “type mismatch” between human flourishing as an optimization target and consequentialist optimization as a form, and on certain material advantages that eudaimonic rationality plausibly possesses in comparison to deontological and consequentialist agency with regard to stability and safety.
The concept of eudaimonia, I argue, suggests a form of rational activity without a strict distinction between means and ends, or between ‘instrumental’ and ‘terminal’ values. In this model of rational activity, a rational action is an element of a valued practice in roughly the same sense that a note is an element of a melody, a time-step is an element of a computation, and a moment in an organism’s cellular life is an element of that organism’s self-subsistence and self-development.[4]
My central claim is that our intuitions about the nature of human flourishing are implicitly intuitions that eudaimonic rationality can be functionally robust in a sense highly critical to AI alignment. More specifically, I argue that in light of our best intuitions about the nature of human flourishing it’s plausible that eudaimonic rationality is a natural form of agency, and that eudaimonic rationality is effective even by the light of certain consequentialist approximations of its values. I then argue that if our goal is to align AI in support of human flourishing, and if it is furthermore plausible that eudaimonic rationality is natural and efficacious, then many classical AI safety considerations and ‘paradoxes’ of AI alignment speak in favor of trying to instill AIs with eudaimonic rationality.
TL;DR - [When trying to casually inform oneself in areas one isn’t an expert in, via reading books (and often other pieces) directed at a general audience] I think the value of reading a book once (without active engagement) is awkwardly small, and the value of big time investments like reading a book several times—or actively engaging with even part of it—is awkwardly large compared to that. Also, the maximum amount of understanding you can get is awkwardly small.
That’s the summary; here’s Holden’s argument:
Let’s say you’re interested in a 500-page serious nonfiction book, and you’re trying to decide whether to read it. I think most people imagine their choice something like this:
I see things more like this:
I’ve recently noticed this essay might have been somewhat of a bad influence on me. When I first saw it in 2021 I thought “yup, seems correct”, and since then have regularly had the 2nd table come to mind to dissuade me when I was on the fence about reading a particular long nonfiction book, to the point where I now no longer have much patience for the doorstoppers I used to read with relish. So over the 4-ish years since, I’ve probably engaged substantively with fewer free thinkers’ worldviews than I could have, content as I was with shallow engagement with more of them. I’ve done more of Holden’s last row if I replace “the book” with “a topic I care about / need to make a decision on”, which seems robustly good, but that’s not really attributable to this essay.
Not mine but Scott!2011’s, in case it’s also of wider interest:
I’ve gained most from reading Eliezer, Mencius Moldbug, Aleister Crowley, and G.K. Chesterton (links go to writing samples from each I consider particularly good); I’m currently making my way through Chesterton’s collected works pretty much with the sole aim of imprinting his writing style into my brain.
Stepping from the sublime to the ridiculous, I took a lot from reading Dave Barry when I was a child. He has a very observational sense of humor, the sort where instead of going out looking for jokes, he just writes about a topic and it ends up funny. It’s not hard to copy if you’re familiar enough with it.
Saving this exchange between Tyler Cowen and Peter Singer for my own future reference:
COWEN: Well, take the Bernard Williams question, which I think you’ve written about. Let’s say that aliens are coming to Earth, and they may do away with us, and we may have reason to believe they could be happier here on Earth than what we can do with Earth. I don’t think I know any utilitarians who would sign up to fight with the aliens, no matter what their moral theory would be.
SINGER: Okay, you’ve just met one.
COWEN: I’ve just met one. So, you would sign up to fight with the aliens?
SINGER: If the hypothesis is like that, that the aliens are wiser than we are, they know how to make the world a better place for everyone, they’re giving full weight to human interests, but they say, “Even though we’re giving full weight to human interests, not discounting your interests because you’re not a member of our species, as you do with animals, but unfortunately, it just works out that to produce a better world, you have to go,” I’ll say, “Okay, if your calculations are right, if that’s all right, I’m on your side.”
COWEN: You’re making them a little nicer. You’re calling them wise. They may or may not be wise. They’re just happier than we are. They have less stress, less depression. If they could rule over Earth, they would do a better go of it than we would. I would still side with the humans.
SINGER: I would not. What you’ve shown now is that their interest happens to coincide with the universal good. That’s the way to produce more happiness, full stop, not just more happiness for them. And if that’s the case, I’m on their side.
COWEN: How do we know there is a universal good? You’re selling out your fellow humans based on this belief in a universal good, which is quite abstract, right? The other smart humans you know mostly don’t agree with you, I think, I hope.
SINGER: But you’re using the kind of language that Bernard Williams used when he says, “Whose side are you on?” You said, “You’re selling out your fellow humans,” as if I owe loyalty to members of my species above loyalty to good in general, that is, to maximizing happiness and well-being for all of those affected by it. I don’t claim to have any particular loyalty for my species rather than the general good.
COWEN: If there’s not this common metric between us and the aliens, but you just measure — you hook people up to a scale, you measure. They have more of it than we do. Let them come in. If that doesn’t exist, what is the common good or universal good in this setting?
SINGER: I don’t know if that doesn’t exist, but you said they’re happier than we are, which suggests that there is a common metric of happiness, and that was the basis on which I answered your question. If there’s no common metric, I don’t really have an answer, or I would try to use the metric of overall happiness. I’m not sure why I wouldn’t be able to use that, but if we assume that I couldn’t, then I would just not know what to do.
COWEN: So you wouldn’t fight for our side. Even then, you’d throw up your hands or just not be sure what to do.
SINGER: No, this is not about a football team. You can give your loyalty to a football team and support them, even though you don’t really think that they’re somehow more morally worthy of winning than their opponents. But this is not a game like this. There’s everything at stake.
COWEN: To what extent for you is utilitarianism not only a good theory of outcomes but also a theory of obligation? I’m sure you know the Donald Regan literature, this “Oh, you prefer the outcome with more utility,” but “What should I do?” can still be a complex question.
SINGER: Well, it can be a complex question in the sense that it may be that we don’t want to directly aim at utility because we’re likely to get things wrong. If we can’t be confident in our calculations that we are doing the right thing, then I think the obligations that we have are to maximize utility. But it’s been argued that we’re more likely to make mistakes if we do that, and rather that our obligation should be to conform to certain principles or rules. I think that depends on how confident you are in your ability.
I certainly think we should follow rules of thumb sometimes, when we can’t be sure of what’s the right outcome, and we should do what generally is accepted. You go back to Sam Bankman-Fried. Obviously, I think that was his mistake. He was too confident that he could get things right and fix things and didn’t follow basic rules, or at least it’s alleged that he didn’t follow basic rules, like “Don’t steal your clients’ money.”
COWEN: Isn’t there a dilemma above and beyond the epistemic dilemma? Say, you, Peter Singer, you’re programming a driverless car and you’re in charge. Ideally, you would like to program the car to be a utilitarian and Benthamite car, that if it has to swerve, it would sooner kill one older person than two younger people, and so on.
Let’s say you also knew that if you programmed the driverless car to be Benthamite, basically, the law would shut it down, public opinion would rebel, you’d get in trouble, the automaker would get in trouble. How then would you program the car?
SINGER: Yes, I would program it to produce the best consequences that would not be prohibited by the government or the manufacturer. I’m all in favor of making compromises if you have to, to produce the most good that you possibly can in the circumstances in which you are.
COWEN: Doesn’t that then mean individuals should hold onto some moral theory that may be quite far from utilitarianism? It’s not just a compromise. You need to be very intuition driven, nonutilitarian just to get people to trust you, to work with you, to cooperate. In that sense, at the obligation level, you’re not so utilitarian at all.
SINGER: You may be. That will depend on your own nature, as to whether you think you’re going to be led astray if you’re not intuition driven. Or you may think that you can be self-aware about the risks that you’re going to go wrong. You’re not exactly intuition driven, but you’re driven by the thought that “I could be mistaken here, and it’s probably going to have more value if I don’t just directly think about how to produce the most utility.”
You already get arbitrarily high upper bounds with reversible computation, and waiting until the universe gets cooler can yield 10^30x additional computation, no need for weirder physics. Bostrom mentioned both above; he was pretty explicit about the 10^85 ops cosmic endowment being a conservative lower bound. Krauss & Starkman’s 1.35e120 ops bound derived from the observed acceleration of the universe is the non-weird physics upper bound AFAIK (h/t Wei Dai).
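As a rough sanity check on how far apart those two figures are (this is just dividing the two numbers quoted above, nothing more):

$$\frac{1.35 \times 10^{120}\ \text{ops}}{10^{85}\ \text{ops}} \approx 1.35 \times 10^{35},$$

so the Krauss & Starkman bound sits roughly 35 orders of magnitude above Bostrom’s conservative lower bound.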
In addition to verifying obviousness by posting, as Gwern mentioned (which in my experience is a frequent source of surprise), there’s the advanced version of this that e.g. Toby Ord has, by his own lights, practiced for most of his career. This does require taste at picking topics “at the border of the trivial and the profound”, to quote him, but taste is pretty clearly something you have aplenty.
Great post, I wish more longtermist grantmakers wrote about this.
To copy over some of my comments:
My general reaction is that while I agree that prioritization is more important between buckets (especially the “future-improvement” conversion) than within them, the BOTEC maximalism framework as illustrated with these examples is much less convincing between buckets than within, which seems like a mismatch. Founders Pledge’s climate grantmaking team elides this entirely while remaining quantitative, by exploiting shared structure in uncertainties across interventions to make robust-ish judgments about relative comparisons without necessarily committing to any unit conversions and such; that seems worth contrasting. (You did note that you do this too when cheap.)
You also do mention having a bunch of cached takes on how good various intermediate desiderata are and how those all cash out to future-improvement, but given that you can’t publish them I have to take your word for it, which I guess is fine. I would be keen to see others publish it though, since Nuno Sempere’s experiment to elicit valuations of research from Fin Moorhouse, Gavin Leech, Jaime Sevilla, Linch Zhang, Misha Yagudin, and Ozzie Gooen using a pairwise comparison-based utility function extractor app led to both inconsistent preferences and wide-ranging disagreements, which makes me think people should compare notes on this more:
I also find it kind of wild that you can find many decent opportunities in the 1-20x range on your bar, since I’d anchored to Linch’s old $100M per bp (0.2-0.9x) being a bar at which he’d feel bullish about funding, which I guess means I still haven’t internalised how wide the between-buckets range can get. Or it could mean you’re more like Gavin above (widest valuation range) than Fin (narrowest), and I don’t really know where to land on this valuation range spectrum.
On the BOTEC maximalism and your bar, can you say more? I guess I’ve been a bit cluster-pilled, especially in practice given how bad the thinking I’ve seen is in many BOTECs, so if anyone else said this I’d be skeptical, but I respect your thinking and I thought Eric’s CEA of donating $1k to Alex Bores was good, so I’m intrigued.
The recent METR research note “We spent 2 hours working in the future” (quick take) gave a neat visual for this: