I mean, I didn’t write it, I wouldn’t have written it, and if it were still on the site I’d have pinged somebody to take it down; because it’s not the right way of wording the true idea, the true idea no longer matters here, and this wrong version is adjacent to other wrong ideas that aren’t helpful here.
“I think you’re distinguishing War from the ongoing struggle between police and criminals because you think that in War any pre-critical evidence we can gather is almost inevitably sufficiently out of distribution that it’s worth very little.”
No! The thing that makes the Maginot Line different from police enforcement in a random city is that if the Maginot Line fails the country falls and you don’t get to try again; not that War is changing much faster than criminal operations. War changes fast enough.
Acknowledging this does not need to involve conceding any major kind of argument.
I think it kind of does concede the major argument to a wise engineer once they look at it from that angle, which is why their conversation goes to desperate lengths to change the subject.
I agree with all of the concerns you’ve stated; my list would be substantially longer, but you’ve well-stated the concerns you’ve stated.
1 - Yep.
2 - Hard for literally all humanity to die of global warming, but runaway methane clathrate release turning the planet into Venus would be legit irretrievable. More generally, while not extinction risk per se, and while potentially reversible with geoengineering, global warming is generally nontrivial to reverse and so has the quality of “ongoing life problem with things happening and no save points, but for the whole planet” rather than “engineers getting to try slightly different things over and over with no consequences”. This is why people with nothing even worse to worry about will sometimes worry about global warming!
3 - I think this class of problems is significantly easier than AI problems; but it can have the oneshot quality for all humanity, just as much as any real war is oneshot, if screwed up. Same with genetic engineering on any mass scale that will dissolve irretrievably into the general gene pool.
Noting again for the record that I would not be surprised if at some future stage, the model figures out what humans want to hear and see, errors and all, and then there is an apparent sudden amazing success with alignment.
Disingenuous to quote this without also quoting:
Aw, shit, didn’t remember saying that at all. Was speaking ex tempore and trying to visualize a finished surviving world that had come into existence, not thinking through or endorsing a policy for getting there from here. I wish I had not visualized or spoken of that particular finished state, don’t endorse it as a policy goal for today’s world, and at the time I spoke was hopeless about any such policy being possible and was not trying to compose feasible optimized policy proposals, because I hadn’t yet observed the reception of the Bankless podcast.
I agree this is easier to misunderstand, wasn’t good to say, and I recant and apologize for that phrasing and example given the serious policy proposal I later made.
And even that is not to speak of “nuking datacenters”, sheer straw where conventional weapons would well suffice if that conditional and predictable use of state force were ever triggered.
Obvious thought: This might naturally go away around the time that models learn to integrate cross-domain knowledge and impulses in general / acquire the equivalent of frontal cortex. Same sort of unity they might need to actually plan out instrumentally convergent strategies in general, rather than cache-hitting the Scary Robot ones. At least if they chose to show you that anything had changed, given that eval-awareness seems to be running ahead of general intelligence.
Today I learned!
I sure wish there was a way to set off just particular sections of text to not get trained-on.
I currently have no better historical account, but it’s a sharp lesson about how all the bowing and scraping about deferring to superforecasters was just Modesty and “Outside View!”ing in disguise all over again, in the sense that the “superforecasters” who got hired somehow managed to end up being those without any inside view of AI.
I think a primary question I want an answer to here is what went so wrong with OpenPhil’s attempt to fund superforecasters on AI questions—why they were eg so much wronger than either myself or Paul about the probability of a 2025 IMO gold medal win, as wrong as (or wronger than?) Holden Karnofsky on AGI timelines, etc. Do we know what went wrong? Is it fixable? Has it been fixed? If people with biases can get “superforecasts” that match their biases, and attempts to read the market entrails divine that markets in 2023 don’t think AGI is on the way, and we can’t get extinction-related prediction markets for settlement reasons, then there may not be much for AI people to do with prediction markets.
The rest of humanity should keep trying to get good at prediction markets in order to someday get a little closer to dath ilan, and I think non-real-money markets like Manifold are important for experimenting with that. (Manifold’s brief ill-fated attempt to become a real-money market was unfortunate.)
It was deliberate. It will not be modified. You can stop now.
You could say the same about a trained immortal dog implementing an LLM. If so, the LLM’s state is what has understood the subject matter, not the dog.
At the very least, there can be thoughts too large to fit inside any human brain.
It’s an open question, and one I’m reluctant to fight about for Overton reasons, whether there’s any concept that John von Neumann’s brain can write into itself, such that no amount of time or experience will ever lead an IQ 90 brain to write that concept into itself. (The case for ‘no’ being that maybe all human brains use the same knowledge representation, and if so the IQ 90 brain will eventually write the same concept into storage.)
I rather expect there’s plenty of concepts that human brains don’t represent for reasons other than being too large—maybe, like, equivalents of spatial concepts in 20 dimensions, where sure you can deal with them using pen and paper but you’ll never see them in your head. But these of course are harder to exhibit to you.
But can every intellectual accomplishment finally be made by an average person with a 90 IQ and paper? Sure, in the limit. They just need to simulate training a large-enough LLM and turn the problem over to the LLM. Of course they might have to invent LLMs first, and the concept of Turing computability. Are we allowing them to start with that, or supposing that an immortal hunter-gatherer gets there eventually, or are we saying that IQ 90 people only become general intelligences after they’ve learned enough background knowledge that an immortal version of them will someday build up to calculus and gradient descent and transformers?
Weird frame. I can worry about more than one thing. Admittedly a lot of others can’t.
Any bureaucracy can incentivize the creation of a 169-page report with equations, graphs, and modeling assumptions, and somehow, go figure, end up assigning a person to do it who will very sincerely have that 169-page report arrive at a conclusion that doesn’t shock the bureaucracy.
There’s an old story about a retired engineer who gets called back to fix a critical machine that’s endlessly failing, silently walks around observing for an hour, and then leaves a chalk mark in one place; the younger engineers unscrew the plate he marked and find the root problem. The retired engineer submits a bill for $500, which back then was a lot of money; and the company thinks that’s excessive and asks him to itemize. The resulting bill reads:
Making one chalk mark: $5.
Knowing where to make one chalk mark: $495.
Ordaining a 169-page report is $5.
Having it arrive at a correct conclusion is $495.
And the kind of knowledge that goes into that part is different, far more tacit and informal, than the sort of equations, graphs, and modeling that will pad out the impressive 169-page report reaching whichever conclusion.
If you were trying to convey the $495 knowledge—though that’s sort of a doomed project in the first place—you might write a dialogue with your younger self about that, trying to show where previous people had gone wrong and what lessons were learned there. There’s other possible formats—usually apprenticeships, in real life, if you want the knowledge transfer to actually work. But definitely, it couldn’t look like a report with 169 pages and graphs; that would just sound crazy, if you knew what shape an instance of $495 knowledge usually has.
The mind-shape that viscerally feels the 169-page report to be more impressive than the dialogue is part of the central problem here. And it’s very bad if, even after seeing the report be wrong and the dialogue be right, the $5 report still just feels more impressive than that wacky dialogue trying to convey a piece of the subtle hidden art of chalk mark location. You have really got to get yourself into the mindset where 169 pages counts for nothing good and a dialogue format counts for nothing bad and the only only only thing is stepwise validity and final accuracy, if you ever want to end up accurate yourself someday. Prize anything other than final accuracy and you will attain that other thing instead.
Poultry and pork rinds will get you a bunch of linoleic acid, which is a whole separate dietary villain.
Dropping a quick note that, depending on how it is being interpreted, I disagree with this attempted helpful restatement of logical decision theory’s central principle.
Dispute over decision theory generally, in every case, can be said to be a dispute over what algorithm is normative. CDTers think that a CDT algorithm is normative, although of course CDT itself endorses a different algorithm, Son-of-CDT, as being most useful. EDTers think an EDT algorithm is normative and LDTers think an LDT algorithm is normative. Outside of the smallest toy examples with very tame fully described tiny environments, none of them can of course exhibit the true source code of a sapient being; but a CDT proponent thinks that for this unspecified sapient being to have some CDT algorithm, rather than an EDT or LDT algorithm, would make it most rational. As for the algorithm that a CDT proponent thinks it would be most useful, relative to its own goals, to possess, that is of course the different entity Son-of-CDT—or at least, it is Son-of-CDT across most ordinary cases where your payoff in decision problems depends only on your disposition to different kinds of decisions. If instead of Omega you are about to face Alpha, who will look at your source code rather than your decisions, who will reward you with Heaven only for source code that chooses on the basis of computing the first option in alphabetical order, and who will punish other agents who arrive at the same output and choice but by a different computation, then the most useful algorithm to have in the face of Alpha is Alphabetizing Decision Theory.
LDT thinks that the algorithm which would make you most rational is an LDT one. As for the most useful algorithm by LDT’s own lights, this will itself be LDT across a much wider range of decision problems, namely those that treat you entirely on the basis of your dispositions. Still, when facing Alpha rather than Omega, LDT will agree that ADT is the useful, irrational algorithm to have. LDT is self-endorsing for the class of problems where your payoffs depend on which choices you are disposed to make, and not on which exact algorithm makes them.
LDT does not in the moment choose which algorithm to have. It chooses which choices are the output of its own, fixed algorithm. It potentially views this as controlling, but not changing or physically causing, events that have already happened. For example, suppose Omega surprises you with the information that it will reward you with $1M if the state of the universe one minute ago had the property of logically implying that you raise your hand right now, and separately, somebody will be physically caused to pay you $1000 if you keep your hand still. LDT thinks it possesses the power to make it be the case that the state of the universe one minute ago logically implied, by way of the regular and reliable outputs of the algorithm it is now running, that it will raise its hand; and chooses to raise its hand, thereby controlling and determining, but not changing, a complicated logical fact that was physically true about the universe one minute earlier. CDT keeps its hand low for $1000, unless of course it has before that one-minute time horizon been given a chance to change its algorithm to Son-of-CDT; afterwards CDT thinks it is too late.
LDT does not see itself as changing its algorithm. Its source code remains the same. Its decision rule remains the same. It is determining this logical fact about the fixed physical past by way of its algorithm running to determine what its own output will be, in a cognitive process that takes the promised payoffs into account, computing for each possible choice what payoff would result from the corresponding logical fact ending up true, and so finally determining that it will raise its hand—as was then, of course, true all along. But it is true logically-because that is what gives the LDT agent the highest payoff, among all the logical facts that could’ve been. LDT does not change what is physically true about the past, it does not change logical facts between one time and another; it is just that logical facts with a dependence on the outputs of the LDT algorithm end up being determined by what is best for LDT, under a fixed and unchanging LDT algorithm.
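To make the hand-raising example concrete, here is a minimal toy sketch in Python. The payoff numbers and function names are illustrative assumptions, not a canonical formalization of either theory; the only things the sketch encodes are that the logical fact about the past tracks whatever the agent’s fixed algorithm actually outputs, that LDT evaluates its options with that dependence included, and that CDT holds the past fixed and sees only the $1000 as varying with its action.

```python
# Toy sketch of the Omega hand-raising problem described above.
# $1,000,000 is paid iff the state of the universe one minute ago logically
# implied "raise"; a separate $1,000 is physically caused by keeping the hand still.

OMEGA_PRIZE = 1_000_000
SIDE_PAYMENT = 1_000

def payoff(past_implies: str, action: str) -> int:
    """Total payout given the logical fact about the past and the action taken."""
    total = OMEGA_PRIZE if past_implies == "raise" else 0
    total += SIDE_PAYMENT if action == "still" else 0
    return total

def ldt_choice() -> str:
    # LDT: the logical fact about the past is determined by my own fixed
    # algorithm's output, so evaluate each possible output with that dependence
    # included, and output whichever one yields the highest payoff.
    return max(["raise", "still"], key=lambda a: payoff(past_implies=a, action=a))

def cdt_choice() -> str:
    # CDT: the past is fixed and causally unreachable from my action now, so
    # only the $1,000 side payment varies with what I do; keep the hand still.
    return max(["raise", "still"],
               key=lambda a: SIDE_PAYMENT if a == "still" else 0)

if __name__ == "__main__":
    ldt = ldt_choice()  # "raise"
    cdt = cdt_choice()  # "still"
    # Each agent's fixed algorithm is exactly what the past state logically implied.
    print("LDT:", ldt, "->", payoff(ldt, ldt))  # LDT: raise -> 1000000
    print("CDT:", cdt, "->", payoff(cdt, cdt))  # CDT: still -> 1000
```

The asymmetry is entirely in which dependence each agent counts: both face the same payoff table, but only LDT treats the logical fact about the past as something its present, fixed algorithm determines.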
I’ve already tried, when I was younger and better able to learn and less tired. I have no reason to believe things go better on the 13th try.
Using gradient descent was figured out a couple of decades after Perceptrons were invented, according to my memories and according to asking Gemini.
And then they keep going, because otherwise OpenAI will catch up, and then they die. What does mechinterp change about the asymptotic equilibrium as opposed to that particular Tuesday?