I have not read the book but I think this is exactly wrong, in that what happens after the ??? step is that shareholder value is not maximized.
I think you misinterpreted the book review: Caroline was almost surely making an Underpants Gnomes reference, which is used to indicate that the final step does not follow in any way from the preceding ones.
This is honestly some of the most significant alignment work I’ve seen in recent years (for reasons I plan to post on shortly), thank you for going to all this length!
Typo: “Thoughout this process test loss remains low—even a partial memorising solution still performs extremely badly on unseen data!”, ‘low’ should be ‘high’ (and ‘throughout’ is misspelled too).
So I would argue that all of the main contenders are very training data efficient compared to artificial neural nets. I’m not going to go into detail on that argument, unless people let me know that that seems cruxy to them and they’d like more detail.
I’m not sure I get this enough for it to even be a crux, but what’s the intuition behind this?
My guess for your argument is that you see it as analogous to the way a CNN beats out a fully-connected network at image recognition, because it massively cuts down the space of possible models in a way that's compatible with the known structure of the problem.
But that raises the question, why are these biology-inspired networks more likely to be better representations of general intelligence than something like transformers? Genuinely curious what you’ll say here.
(Wisdom of evolution only carries so much weight for me, because the human brain is under constraints, like the physical co-location of neurons, that prevent evolution from building things that artificial architectures can do.)
Has any serious AI Safety research org thought about situating themselves so that they could continue to function after a nuclear war?
Wait, hear me out.
A global thermonuclear war would set AI timelines back by at least a decade, for all of the obvious reasons. So an AI Safety org that survived would have additional precious years to work on the alignment problem, compared to orgs in the worlds where we avoid that war.
So it seems to me that at least one org with short timelines ought to move to New Zealand or at least move farther away from cities.
(Yes, I know MIRI was pondering leaving the Bay Area for underspecified reasons. I’d love to know what their thinking was regarding this effect, but I don’t expect they’d reveal it.)
The distinction between your post and Eliezer’s is more or less that he doesn’t trust anyone to identify or think sanely about [plans that they admit have negative expected value in terms of log odds but believe possess a compensatory advantage in probability of success conditional on some assumption].
Such plans are very likely to hurt the remaining opportunities in the worlds where the assumption doesn’t hold, which makes it especially bad if different actors are committing to different plans. And he thinks that even if a plan’s assumptions hold, the odds of its success are far lower than the planner envisioned.
Eliezer’s preferred strategy at this point is to continue doing the kind of AI Safety work that doesn’t blow up if assumptions aren’t met, and if enough of that work is complete and there’s an unexpected affordance for applying that kind of work to realistic AIs, then there’s a theoretical possibility of capitalizing on it. (But, well, you see how pessimistic he’s become if he thinks that’s both the best shot we have and also probability ~0.)
And he wanted to put a roadblock in front of this specific well-intentioned framing, not least because it is way too easy for some readers to round into support for Leeroy Jenkins strategies.
In principle, I was imagining talking about two AIs.
In practice, there are quite a few preferences I feel confident a random person would have, even if the details differ between people and even though there’s no canonical way to rectify our preferences into a utility function. I believe that the argument carries through practically with a decent amount of noise; I certainly treat it as some evidence for X when a thinker I respect believes X.
Ah, that makes more sense.
Identifying someone else’s beliefs requires you to separate a person’s value function from their beliefs, which is impossible.
I think it’s unfair to raise this objection here while treating beliefs about probability as fundamental throughout the remainder of the post.
If you instead want to talk about the probability/utility mix that can be extracted from observing another agent's actions while treating them as a black box: two Bayesian utility-maximizers with relatively simple utility functions in a rich environment will indeed start inferring Bayesian structure in each other's actions (via things like the absence of Dutch-bookable behavior with respect to instrumental resources). They will therefore start treating each other's actions as a source of evidence about the world, even without being confident about each other's exact belief/value split.
If you want to argue their beliefs won’t converge, you’ll have to give a good example.
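To make the Dutch-booking step concrete, here's a minimal toy example (my own illustration, with made-up numbers, not anything from the original exchange): an agent whose credences in A and not-A sum to more than 1 will pay for a pair of bets that loses money however A turns out, which is the kind of exploitable incoherence the argument says these agents will learn to avoid.

```python
# Toy Dutch book against incoherent credences (illustrative numbers are
# hypothetical). The agent prices a $1 bet on an event at its credence in
# that event, so incoherent credences let a bookie sell it a sure loss.
p_a, p_not_a = 0.6, 0.6            # incoherent: the two credences sum to 1.2

stake = 1.0                         # each bet pays $1 if it wins
cost = stake * (p_a + p_not_a)      # agent pays its stated price for both bets

profits = {}
for a_is_true in (True, False):
    payout = stake                  # exactly one of the two bets pays out
    profits[a_is_true] = payout - cost

# profits is about -0.2 in both outcomes: a guaranteed loss for the agent.
```

An agent that can notice this pattern in another agent's betting behavior has evidence that the other agent's credences are coherent, without ever seeing the belief/value split directly.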
This fails to engage with Eli’s above comment, which focuses on Elon Musk, and is a counterargument to the very thing you’re saying.
Probable typos: the Qs switch from Q4 to Q5 before the bolded Q5 question.
and they, I’m afraid, will be PrudentBot, not FairBot.
This shouldn’t matter for anyone besides me, but there’s something personally heartbreaking about seeing the one bit of research for which I feel comfortable claiming a fraction of a point of dignity, being mentioned validly to argue why decision theory won’t save us.
(Modal bargaining agents didn’t turn out to be helpful, but given the state of knowledge at that time, it was worth doing.)
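For readers without the context: FairBot cooperates iff it can prove its opponent cooperates with it, while PrudentBot additionally requires that the opponent would defect against DefectBot. Here is a toy, fuel-bounded simulation version (my own sketch; the real agents in the modal-combat literature use proof search and Löb's theorem, and the base cases below only crudely stand in for that):

```python
def defectbot(opp, fuel):
    return 'D'  # always defects

def cooperatebot(opp, fuel):
    return 'C'  # always cooperates

def fairbot(opp, fuel):
    # Cooperate iff a fuel-limited simulation of the opponent cooperates
    # with FairBot. The optimistic base case crudely stands in for the
    # Löbian proof step in the real, proof-based agent.
    if fuel == 0:
        return 'C'
    return 'C' if opp(fairbot, fuel - 1) == 'C' else 'D'

def prudentbot(opp, fuel):
    # Cooperate iff the opponent cooperates with PrudentBot AND defects
    # against DefectBot (i.e. it isn't exploitable the way CooperateBot is).
    if fuel == 0:
        return 'C'
    coops_with_us = opp(prudentbot, fuel - 1) == 'C'
    punishes_suckers = opp(defectbot, max(fuel - 1, 1)) == 'D'
    return 'C' if (coops_with_us and punishes_suckers) else 'D'
```

In this sketch, FairBot and PrudentBot cooperate with each other, but FairBot also cooperates with CooperateBot while PrudentBot exploits it; that difference is the sense in which PrudentBot is the less merciful agent.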
I see Biden as having cogent things he intends to communicate but sometimes failing to speak them coherently, while Trump is a pure stream of consciousness sometimes, stringing together loosely related concepts like a GPT.
(This isn’t the same as cognitive capacity, mind you. Trump is certainly more intelligent than many people who speak more legibly.)
I haven’t seen a “word salad” from Biden where I can’t go “okay, here’s the content he intended to communicate”, but there are plenty from Trump where I can’t reconstruct anything more than sentiment and gestures at disconnected facts.
Oh huh, and it also pairs well with my later Choosing the Zero Point post!
“How” questions are less amenable to lucky guesses than “what” questions. Especially planning questions, e.g. “how would you make a good hat out of food?”
As Anisha said, GPT can pick something workable from a top-100-most-common menu with just a bit of luck, but engineering a plan for a nonstandard task seems beyond its capacity.
Thanks for drawing distinctions—I mean #1 only.
Is there already a concept handle for the notion of a Problem Where The Intuitive Solution Actually Makes It Worse But Makes You Want To Use Even More Dakka On It?
My most salient example is the way that political progressives in the Bay Area tried using restrictive zoning and rent control in order to prevent displacement… but this made for a housing shortage and made the existing housing stock skyrocket in value… which led to displacement happening by other (often cruel and/or backhanded) methods… which led to progressives concluding that their rules weren’t restrictive enough.
Another example is that treating a chunk of the population with contempt makes a good number of people in that chunk become even more opposed to you, which makes you want to show even more contempt for them, etc. (Which is not to say their ideas are correct or even worthy of serious consideration—but the people are always worthy of respect.)
That sort of dynamic is how you can get an absolutely fucked-up self-reinforcing situation, an inadequate quasi-equilibrium that’s not even a Nash equilibrium, but exists because at least one party is completely wrong about its incentives.
(And before you get cynical, of course there are disingenuous people whose preferences are perfectly well served in that quasi-equilibrium. But most activists do care about the outcomes, and would change their actions if they were genuinely convinced the outcomes would be different.)
You can see my other reviews from this and past years, and check that I don’t generally say this sort of thing:
This was the best post I’ve written in years. I think it distilled an idea that’s perennially sorely needed in the EA community, and presented it well. I fully endorse it word-for-word today.
The only edit I’d consider making is to have the “Denial” reaction explicitly say “that pit over there doesn’t really exist”.
(Yeah, I know, not an especially informative review—just that the upvote to my past self is an exceptionally strong one.)
Re: your second paragraph, I was (and am) of the opinion that, given the first sentence, readers were in danger of being sucked down into their thoughts on the object-level topic before they would even reach the meta-level point. So I gave a hard disclaimer then and there.
Your mileage varied, of course, but I model more people as having been saved by the warning lights than blinded by them.
There are some posts with perennial value, and some which depend heavily on their surrounding context. This post is of the latter type. I think it was pretty worthwhile in its day (and in particular, the analogy between GPT upgrades and developmental stages is one I still find interesting), but I leave it to you whether the book should include time capsules like this.
It’s also worth noting that, in the recent discussions, Eliezer has pointed to the GPT architecture as an example that scaling up has worked better than expected, but he diverges from the thesis of this post on a practical level:
I suspect that you cannot get this out of large amounts of gradient descent on large layered transformers, and therefore I suspect that GPT-N does not approach superintelligence before the world is ended by systems that look different, but I could be wrong about that.
I unpack this as the claim that someone will always be working on directly goal-oriented AI development, and that inner optimizers in an only-indirectly-goal-oriented architecture like GPT-N will take enough hardware that someone else will have already built an outer optimizer by the time it happens.
That sounds reasonable, it’s a consideration I’d missed at the time, and I’m sure that OpenAI-sized amounts of money will be paid into more goal-oriented natural language projects adapted to whatever paradigm is prominent at the time. But I still agree with Eliezer’s “but I could be wrong” here.
Fighting is different from trying. To fight harder for X is more externally verifiable than to try harder for X.
It’s one thing to acknowledge that the game appears to be unwinnable. It’s another thing to fight any less hard on that account.