Weird frame. I can worry about more than one thing. Admittedly a lot of others can’t.
Eliezer Yudkowsky
Any bureaucracy can incentivize the creation of a 169-page report with equations, graphs, and modeling assumptions, and somehow, go figure, end up assigning a person to do it who will very sincerely have that 169-page report arrive at a conclusion that doesn’t shock the bureaucracy.
There’s an old story about a retired engineer who gets called back to fix a critical machine that’s endlessly failing, silently walks around observing for an hour, and then leaves a chalk mark in one place; the younger engineers unscrew that plate and find the root problem. The retired engineer submits a bill for $500, which back then was a lot of money; and the company thinks that’s excessive and asks him to itemize. The resulting bill reads:
Making one chalk mark: $5.
Knowing where to make one chalk mark: $495.
Ordaining a 169-page report is $5.
Having it arrive at a correct conclusion is $495.
And the kind of knowledge that goes into that part is different, far more tacit and informal, than the sort of equations, graphs, and modeling that will pad out the impressive 169-page report reaching whichever conclusion.
If you were trying to convey the $495 knowledge—though that’s sort of a doomed project in the first place—you might write a dialogue with your younger self about that, trying to show where previous people had gone wrong and what lessons were learned there. There are other possible formats—usually apprenticeships, in real life, if you want the knowledge transfer to actually work. But definitely, it couldn’t look like a report with 169 pages and graphs; that would just sound crazy, if you knew what shape an instance of $495 knowledge usually has.
The mind-shape that viscerally feels the 169-page report to be more impressive than the dialogue is part of the central problem here. And it’s very bad if even after seeing the report be wrong and the dialogue be right, the $5 report still just feels more impressive than that wacky dialogue trying to convey a piece of the subtle hidden art of chalk mark location. You have really got to get yourself into the mindset where 169 pages counts for nothing good and a dialogue format counts for nothing bad and the only only only thing is stepwise validity and final accuracy, if you ever want to end up accurate yourself someday. Prize anything other than final accuracy and you will attain that other thing instead.
Poultry and pork rinds will get you a bunch of linoleic acid, which is a whole separate dietary villain.
Dropping a quick note that, depending on how it is being interpreted, I disagree with this attempted helpful restatement of logical decision theory’s central principle.
Dispute over decision theory generally, in every case, can be said to be a dispute over what algorithm is normative. CDTers think that a CDT algorithm is normative, although of course CDT itself endorses a different algorithm, Son-of-CDT, as being most useful. EDTers think an EDT algorithm is normative and LDTers think an LDT algorithm is normative. Outside of the smallest toy examples with very tame, fully described tiny environments, none of them can of course exhibit the true source code of a sapient being; but a CDT proponent thinks that for this unspecified sapient being to have some CDT algorithm, rather than an EDT or LDT algorithm, would make it most rational. As for the algorithm that a CDT proponent thinks would be most useful to its own goals to possess, that is of course the different entity Son-of-CDT—or at least, it is Son-of-CDT across most ordinary cases where your payoff in decision problems depends only on your disposition toward different kinds of decisions. If instead of Omega you are about to face Alpha, who will look at your source code rather than your decisions, who will reward you with Heaven only for source code that chooses on the basis of computing the first option in alphabetical order, and who will punish other agents who arrive at the same output and choice but by a different computation, then the most useful algorithm to have in the face of Alpha is Alphabetizing Decision Theory.
LDT thinks that the algorithm you could have that would make you most rational is an LDT one. As for the most LDT-useful algorithm, this will itself be LDT across a much wider range of problems, those that treat you entirely on the basis of your dispositions. Still, when facing Alpha rather than Omega, LDT will agree that ADT is the useful, irrational algorithm to have. LDT is self-endorsing for the class of problems where your payoffs depend on which choices you are disposed to make, and not on which exact algorithm makes them.
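A minimal toy sketch of the Alpha case, since the distinction matters: the reward tracks how the choice was computed, not which choice came out. The function names and the crude source-inspection trick below are my own illustrative assumptions, nothing canonical.

```python
# Toy sketch (illustrative assumptions only): Alpha rewards the computation
# that alphabetizes, not the output it happens to produce.
import inspect

def alphabetizing_agent(options):
    # Chooses the first option in alphabetical order: the computation Alpha rewards.
    return sorted(options)[0]

def other_agent(options):
    # May land on the very same choice, but by a different computation
    # (a stand-in for any expected-payoff style of reasoning).
    return max(options, key=lambda o: 0)  # all keys tie, so the first option wins

def alpha_judges(agent):
    # Alpha looks at the agent's source code, not merely at its decision.
    alphabetizes = "sorted(options)[0]" in inspect.getsource(agent)
    return "Heaven" if alphabetizes else "punishment"

options = ["apple", "banana"]
print(alphabetizing_agent(options), alpha_judges(alphabetizing_agent))  # apple Heaven
print(other_agent(options), alpha_judges(other_agent))                  # apple punishment
```

Both agents output the same choice; only the one whose source code literally alphabetizes gets Heaven, which is the sense in which the payoff here depends on the algorithm rather than on the disposition.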
LDT does not in the moment choose which algorithm to have. It chooses which choices are the output of its own, fixed algorithm. It potentially views this as controlling, but not changing or physically causing, events that have already happened. For example, suppose Omega surprises you with the information that it will reward you with $1M if the state of the universe one minute ago had the property of logically implying that you raise your hand right now, and separately, somebody will be physically caused to pay you $1000 if you keep your hand still. LDT thinks it possesses the power to make it be the case that the state of the universe one minute ago logically implied, by way of the regular and reliable outputs of the algorithm it is now running, that it will raise its hand; and chooses to raise its hand, thereby controlling and determining, but not changing, a complicated logical fact that was physically true about the universe one minute earlier. CDT keeps its hand still for the $1000, unless of course it has before that one-minute time horizon been given a chance to change its algorithm to Son-of-CDT; afterwards CDT thinks it is too late.
LDT does not see itself as changing its algorithm. Its source code remains the same. Its decision rule remains the same. It is determining this logical fact about the fixed physical past by way of its algorithm running to determine what its own output will be, in a cognitive process that takes into account the promised payoffs along the way to computing for different choices what payoffs would result from which logical facts ending up true, and so finally ends up determining that it will raise its hand—as was then, of course, true all along. But it is true logically-because that is what gives the LDT agent the highest payoff, among all the logical facts that could’ve been. LDT does not change what is physically true about the past, it does not change logical facts between one time and another, it is just that logical facts with a dependence on the outputs of the LDT algorithm end up being determined by what is best for LDT, under a fixed and unchanging LDT algorithm.
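A similarly minimal sketch of the hand-raising example, with all payoffs and names being illustrative assumptions: the only difference between the two evaluations below is whether the logical fact about the past is treated as determined by the agent’s own output, or held fixed independently of it.

```python
# Toy sketch (illustrative assumptions only) of the hand-raising problem above.
OMEGA_REWARD = 1_000_000  # paid iff the state one minute ago logically implied "raise"
STILL_PAYMENT = 1_000     # paid iff the hand stays still

def payoff(action, past_implied_raise):
    total = OMEGA_REWARD if past_implied_raise else 0
    if action == "still":
        total += STILL_PAYMENT
    return total

def ldt_choice():
    # LDT treats "the past state implied I raise my hand" as determined by the
    # output of its own fixed algorithm, i.e. by whichever action it settles on.
    return max(["raise", "still"],
               key=lambda a: payoff(a, past_implied_raise=(a == "raise")))

def cdt_choice(past_implied_raise):
    # CDT holds the already-settled past fixed; only the $1000 varies with the
    # action, so "still" dominates whichever way the past turned out.
    return max(["raise", "still"],
               key=lambda a: payoff(a, past_implied_raise))

print(ldt_choice())       # raise  (1,000,000 beats 1,000)
print(cdt_choice(True))   # still
print(cdt_choice(False))  # still
```

The fixed algorithm never changes between the two evaluations; what differs is whether the past logical fact is modeled as depending on the algorithm’s output.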
I’ve already tried, when I was younger and better able to learn and less tired. I have no reason to believe things go better on the 13th try.
Using gradient descent was figured out a couple of decades after Perceptrons were invented, according to my memories and according to asking Gemini.
Noted. I think you are overlooking some of the dynamics of the weird dance that a bureaucratic institution does around pretending to be daring while their opinions are in fact insufficiently extreme; e.g., why, when OpenPhil ran a “change our views” contest, they predictably awarded all of the money to critiques arguing for longer timelines and lower risk, even though reality lay on the opposite side of their own opinions from that. Just like OpenPhil predictably gave all the money to “we need two Stalins” critiques of them in the contest, OpenPhil might have managed to communicate to the ‘superforecasters’ or their institutions that the demanded apparent disagreement with OpenPhil’s overt forecast was in the “we need two Stalins” direction of longer timelines and lower risks.
Or to rephrase: If I can look at the organizational dynamics and see it as obvious in advance that OpenPhil’s “challenge our worldviews” contest would award all the money to people arguing for longer timelines and lower risk (despite reality lying in the opposite direction, even according to those people’s own later updates), then maybe the people advertising themselves as producing superforecaster reports can successfully read OpenPhil’s mind about what direction of superforecaster disagreement is being secretly demanded.
But, sure, fair enough, I should also update somewhat in favor of the average superforecaster being even worse at AI than OpenPhil and them delivering an honest terrible report. I guess it’s just surprising to me because I would’ve expected the key maneuver here to be saying “I dunno” and not throwing around extreme opinions or numbers, and I would’ve thought superforecasters able to do that better than OpenPhil… but eh, idk, maybe they just straight up couldn’t tell the difference between the usually good rule “nothing ever happens” and “AGI in particular never happens”, and also didn’t know themselves for overconfident or incompetent at being able to apply the rule.
If so, it would speak correspondingly poorly of those EAs who stood around gesturing at the superforecasters and saying, “Why believe MIRI when you could believe these great certified experts?”
I guess it seems pretty weird to me that superforecasters would do that much worse than prediction markets without some selection or bias, but I’ll mark it down as a reasonable alternative hypothesis. (“Actually superforecasting just generalizes really poorly to this admittedly special domain, and random superforecasters do way worse in it than prediction markets by default.”)
It cannot be answered that simply to the Earthlings, because if you answer “Because I don’t expect that to actually work or help”, some of them, and especially the more evil ones, will pounce in reply, “Aha, so you’re not replying, ‘I’d never do that because it would be wrong and against the law’, what a terrible person you must be!”
Super upvoted.
With that said, why is the optimal amount of woo not zero?
Also I think nonaccommodationist vegans have tended to be among the crazier people, so maybe you want enough vegetables for the accommodationists but also beef from moderately less tortured cows.
I just saw one recently on the EA Forum to the effect that EAs who shortened their timelines only after ChatGPT had the intelligence of a houseplant.
Somebody asked if people got credit for <30 year timelines posted in 2025. I replied that this only demonstrated more intelligence than a potted plant.
If you do not understand how this is drastically different from the thing you said I said, ask an LLM to explain it to you; they’re now okay at LSAT-style questions if provided sufficient context.
In reply to your larger question, being very polite about the house burning down wasn’t working. Possibly being less polite doesn’t work either, of course, but it takes less time. In any case, as several commenters have noted, the main plan is to have people who aren’t me do the talking to those sorts of audiences. As several other commenters have noted, there’s a plausible benefit to having one person say it straight. As further commenters have noted, I’m tired, so you don’t really have an option of continuing to hear from a polite Eliezer; I’d just stop talking instead.
Noted as a possible error on my part.
I looked at “AI 2027” as a title and shook my head about how that was sacrificing credibility come 2027 on the altar of pretending to be a prophet and picking up some short-term gains at the expense of more cooperative actors. I didn’t bother pushing back because I didn’t expect that to have any effect. I have been yelling at people to shut up about trading their stupid little timelines as if they were astrological signs for as long as that’s been a practice (it has now been replaced by trading made-up numbers for p(doom)).
When somebody at least pretending to humility says, “Well, I think this here estimator is the best thing we have for anchoring a median estimate”, and I stroll over and proclaim, “Well I think that’s invalid”, I do think there is a certain justice in them demanding of me, “Well, would you at least like to say then in what direction my expectation seems to you to be predictably mistaken?”
If you can get that or 2050 equally well by yelling “Biological Anchoring”, why not admit that the intuition comes first and then you hunt around for parameters you like? This doesn’t sound like good methodology to me.
Is your take “Use these different parameters and you get AGI in 2028 with the current methods”?
I think OpenPhil was guided by Cotra’s estimate and promoted that estimate. If they’d labeled it: “Epistemic status: Obviously wrong but maybe somebody builds on it someday” then it would have had a different impact and probably not one I found objectionable.
Separately, I can’t imagine how you could build something not-BS on that foundation and if people are using it to advocate for short timelines then I probably regard that argument as BS and invalid as well.
Will MacAskill could serve as exemplar. More broadly I’m thinking of people who might have called themselves ‘longtermists’ or who hybridized Bostrom with Peter Singer.
I again don’t consider this a helpful thing to say on a sinking ship when somebody is trying to organize passengers getting to the lifeboats.
Especially if your definition of “AI takeover” is such as to include lots of good possibilities as well as bad ones; maybe the iceberg rockets your ship to the destination sooner and provides all the passengers with free iced drinks, who can say?
You could say the same about a trained immortal dog implementing an LLM. If so, the LLM’s state is what has understood the subject matter, not the dog.
At the very least, there can be thoughts too large to fit inside any human brain.
It’s an open question, and one I’m reluctant to fight about for Overton reasons, whether there’s any concept that John von Neumann’s brain can write into itself, such that no amount of time or experience will ever write that concept into an IQ 90 brain. (The case for ‘no’ being that maybe all human brains use the same knowledge representation, and if so the IQ 90 brain will eventually write the same concept into storage.)
I rather expect there are plenty of concepts that human brains don’t represent for reasons other than being too large—maybe, like, equivalents of spatial concepts in 20 dimensions, where sure you can deal with them using pen and paper but you’ll never see them in your head. But these of course are harder to exhibit to you.
But can every intellectual accomplishment finally be made by an average person with a 90 IQ and paper? Sure, in the limit. They just need to simulate training a large-enough LLM and turn the problem over to the LLM. Of course they might have to invent LLMs first, and the concept of Turing computability. Are we allowing them to start with that, or supposing that an immortal hunter-gatherer gets there eventually, or are we saying that IQ 90 people only become general intelligences after they’ve learned enough background knowledge that an immortal version of them will someday build up to calculus and gradient descent and transformers?