Thanks for compiling the Metaculus predictions! Seems like on 4 of the 6 questions the community updated their timelines to be sooner. Also notable that Matthew Barnett just conceded a short-timelines bet early! He says he actually updated his timelines a few months ago, partially due to ChatGPT.
Robert_AIZI
Earlier this month, PaLM-E gave a hint of one way to incorporate vision into LLMs (statement, paper), though obviously it's a different company, so GPT-4 might have taken a different approach. Choice quote from the paper:
Inputs such as images and state estimates are embedded into the same latent embedding as language tokens and processed by the self-attention layers of a Transformer-based LLM in the same way as text
I'd object to such a [change in metric]/[change in time] calculation myself, in which case I'm still at fault for having phrased things in the terminology of speed. Maybe I should have said "is continuing without hitting a wall".
My main objection, as you describe in other comments, is that the choice of metric matters a great deal. In particular, even if log(KL divergence) continues (sub)linearly, the metrics we actually care about, like "is it smarter than a human" or "how much economic activity can be done by this AI", may be a nonlinear function of log(KL divergence) and may not be slowing down.
I think if I’m honest with myself, I made that statement based on the very non-rigorous metric “how many years do I feel like we have left until AGI”, and my estimate of that has continued to decrease rapidly.
In transformers, the compute cost at context length n grows as O(n^2) only for part of the attention mechanism, which is itself only one part of the transformer architecture, so for the transformer as a whole this is only true in the limit.
This is true, and a useful corrective. I’ll edit the post to make this clear.
In fact, I think that as models are scaled, the attention mechanism becomes an ever-smaller part of the overall compute cost (empirically, i.e. I saw a table to that effect; you could certainly scale differently), so with model scaling you get more and more leeway to increase the context length without impacting compute cost (both training and inference) too much.
I’d love to learn more about this, do you remember where you saw that table?
That’s true, but for the long run behavior, the more expensive dense attention layers should still dominate, I think.
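A back-of-the-envelope sketch of the tradeoff discussed above. The FLOP accounting conventions and model widths below are my own assumptions (multiply-accumulate counted as 2 FLOPs, 4x MLP expansion, biases/layernorm/softmax ignored), not numbers from the table mentioned in the thread:

```python
def attn_quadratic_fraction(n, d):
    """Fraction of one transformer layer's forward-pass FLOPs spent on the
    O(n^2) part of attention, for context length n and model width d."""
    attn_quadratic = 4 * n**2 * d    # QK^T scores + attention-weighted values
    attn_projections = 8 * n * d**2  # Q, K, V, and output projections
    mlp = 16 * n * d**2              # two matmuls: d -> 4d -> d
    return attn_quadratic / (attn_quadratic + attn_projections + mlp)

# Under these assumptions the fraction simplifies to n / (n + 6d): it grows
# with context length n but shrinks as width d is scaled up, which is the
# "more leeway to increase context length" point made above.
for d in (768, 1600, 12288):  # roughly GPT-2 small, GPT-2 XL, GPT-3 widths
    print(d, round(attn_quadratic_fraction(2048, d), 3))
```

For fixed n, bigger d pushes the quadratic term toward irrelevance, but for any fixed d, a long enough context still makes attention dominate, consistent with the "only true in the limit" caveat.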
Yep, exactly as you explain in your edit!
GPT-4: What we (I) know about it
This non-news seems like it might be the biggest news in the announcement? OpenAI is saying "oops, publishing everything was too open; it's gonna be more of a black box now".
I think these meet your criterion of starting solely with anti-good characters:
Cecil from FF4 starts as a literal dark knight before realizing he’s working for an evil empire, becoming a paladin, and saving the world.
John Preston from Equilibrium (the protagonist, played by Christian Bale) is a fascist secret police agent until he accidentally feels emotion, then realizes that anti-emotion fascism is bad and overthrows it.
Megamind from Megamind is a supervillain who realizes that actually he should be a hero. (Maybe this shouldn’t count because there’s initially a superhero? But the protagonist is Megamind throughout.)
Grace from Infinity Train season 3 starts as a cult leader trying to maximize the in-universe utility function (literally!), but she got the sign wrong, so she's absolutely terrible. Then she meets a small child, realizes she's terrible, and works to overcome that.
Gru from Despicable Me starts out a supervillain but eventually becomes a loving father and member of the “Anti-Villain League”.
Joel from The Last of Us is a murderer in the post-apocalypse who is redeemed by finding a surrogate daughter figure and at the end of the story…I have been advised this is not a suitable role-model for an AI, please disregard.
Some themes of such redemption stories (safety implications left to the reader):
Adopting one or more children (1, 4, 5, 6)
Having an even eviler version of yourself to oppose (2, 3, 4, 5)
So it's still not clear to me how much they delayed because they had to, versus how much (if at all) they delayed due to the forecasters and/or acceleration considerations.
Yeah, completely agree.
I think “finished training” is the next-token prediction pre-training, and what they did since August is the fine-tuning and the RLHF + other stuff.
This seems most likely? But if so, I wish OpenAI had used a different phrase; fine-tuning/RLHF/other stuff is also part of training (unless I'm badly mistaken), and we have this lovely phrase "pre-training" that they could have used instead.
On page 2 of the system card it says:
Since it [GPT-4] finished training in August of 2022, we have been evaluating, adversarially testing, and iteratively improving the model and the system-level mitigations around it.
(Emphasis added.) This coincides with the “eight months” of safety research they mention. I wasn’t aware of this when I made my original post so I’ll edit it to be fairer.
But this itself is surprising: GPT-4 was “finished training” in August 2022, before ChatGPT was even released! I am unsure of what “finished training” means here—is the released model weight-for-weight identical to the 2022 version? Did they do RLHF since then?
Gonna pull out one bit from the technical report, section 2.12:
2.12 Acceleration
OpenAI has been concerned with how development and deployment of state-of-the-art systems like GPT-4 could affect the broader AI research and development ecosystem. One concern of particular importance to OpenAI is the risk of racing dynamics leading to a decline in safety standards, the diffusion of bad norms, and accelerated AI timelines, each of which heighten societal risks associated with AI. We refer to these here as "acceleration risk." This was one of the reasons we spent eight months on safety research, risk assessment, and iteration prior to launching GPT-4. In order to specifically better understand acceleration risk from the deployment of GPT-4, we recruited expert forecasters to predict how tweaking various features of the GPT-4 deployment (e.g., timing, communication strategy, and method of commercialization) might affect (concrete indicators of) acceleration risk. Forecasters predicted several things would reduce acceleration, including delaying deployment of GPT-4 by a further six months and taking a quieter communications strategy around the GPT-4 deployment (as compared to the GPT-3 deployment). We also learned from recent deployments that the effectiveness of quiet communications strategy in mitigating acceleration risk can be limited, in particular when novel accessible capabilities are concerned.

We also conducted an evaluation to measure GPT-4's impact on international stability and to identify the structural factors that intensify AI acceleration. We found that GPT-4's international impact is most likely to materialize through an increase in demand for competitor products in other countries. Our analysis identified a lengthy list of structural factors that can be accelerants, including government innovation policies, informal state alliances, tacit knowledge transfer between scientists, and existing formal export control agreements.
Our approach to forecasting acceleration is still experimental and we are working on researching and developing more reliable acceleration estimates.
My analysis:
They’re very aware of arms races conceptually, and say they dislike arms races for all the right reasons (“One concern of particular importance to OpenAI is the risk of racing dynamics leading to a decline in safety standards, the diffusion of bad norms, and accelerated AI timelines, each of which heighten societal risks associated with AI.”)
They considered two mitigations to race dynamics with respect to releasing GPT-4:
“Quiet communications”, which they didn’t pursue because that didn’t work for ChatGPT (“We also learned from recent deployments that the effectiveness of quiet communications strategy in mitigating acceleration risk can be limited, in particular when novel accessible capabilities are concerned.”)
"Delaying deployment of GPT-4 by a further six months", which they didn't pursue because ???? [edit: I mean to say they don't explain why this option wasn't chosen, unlike the justification given for not pursuing the "quiet communications" strategy. If I had to guess, it was reasoning like "well, we already waited 8 months; the marginal returns to waiting another 6 are small."]
There’s a very obvious gap here between what they are saying they are concerned about in terms of accelerating potentially-dangerous AI capabilities, and what they are actually doing.
I do think “most people don’t understand lovecraftian mythology and are likely to be misunderstanding this meme” is totally a reasonable argument.
I think I'll retreat to this, since I haven't actually read the original Lovecraft work. But also, once enough people have a misconception, it can be a bad medium for communication. (Shoggoths are also public domain now, so don't force my hand.)
There is a “true nature” – it’s “whatever processes turn out to be predicting the next token”.
I'd agree with this in the same way that the true nature of our universe is the laws of physics. (Would you consider the laws of physics a shoggoth?) My concern is when people jump from that to "oh, so there's a God (of Physics)".
I think the crux of my issue is what our analogy says to answer the question "could a powerful LLM coexist with humanity?" (I lean towards yes). When people read shoggoths in the non-canonical way as "a weird alien", I think they conclude no. But if it's more like a physics simulator or a pile of masks, then as long as you have it simulating benign things or wearing friendly masks, the answer is yes. I'll leave it to someone who actually read Lovecraft to say whether humanity could coexist with canonical shoggoths :)
But I'm asking if an LLM even has a "true nature", if (as Yudkowsky says here) there's an "actual shoggoth" with "a motivation Z". Do we have evidence that there is such a true nature or underlying motivation? In the alternative "pile of masks" analogy, it's clear that there is no privileged identity in the LLM which is the "true nature", whereas the "shoggoth wearing a mask" analogy makes it seem like there is some singular entity behind it all.
To be clear, I think “shoggoth with a mask” is still a good analogy in that it gets you a lot of understanding in few words, I’m just trying to challenge one implication of the analogy that I don’t think people have actually debated.
I appreciate the clarification, and I'll try to keep that distinction in mind going forward! To rephrase my claim in this language, I'd say that an LLM as a whole does not have a behavioral goal except for "predict the next token", which is not sufficiently descriptive a behavioral goal to answer a lot of questions AI researchers care about (like "is the AI safe?"). In contrast, the simulacra the model produces can be much better described by more precise behavioral goals. For instance, one might say ChatGPT (with the hidden prompt we aren't shown) has a behavioral goal of being a helpful assistant, or an LLM roleplaying as a paperclip maximizer has the behavioral goal of producing a lot of paperclips. But an LLM as a whole could contain simulacra that have all those behavioral goals and many more, and because of that diversity they can't be well-described by any behavioral goal more precise than "predict the next token".
I think we are pretty much on the same page! Thanks for the example of the ball-moving AI, that was helpful. I think I only have two things to add:
Reward is not the optimization target, and in particular, just because an LLM was trained by updating it to predict the next token better doesn't mean the LLM will pursue next-token prediction as a terminal goal. During operation an LLM is completely divorced from the training-time reward function; it just does the calculations and reads out the logits. This differs from a proper "goal" because we don't need to worry about the LLM trying to wirehead by feeding itself easy predictions. In contrast, if we call up
To the extent we do say the LLM’s goal is next token prediction, that goal maps very unclearly onto human-relevant questions such as “is the AI safe?”. Next-token prediction contains multitudes, and in OP I wanted to push people towards “the LLM by itself can’t be divorced from how it’s prompted”.
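A minimal sketch of what "just does the calculations and reads out the logits" means at inference time. The logits and three-token vocabulary here are toy values I made up; the point is that nothing in the loop references a loss or reward:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Inference is just: logits -> softmax -> sample an index.
    No loss or reward function appears anywhere in this process."""
    if seed is not None:
        random.seed(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Toy logits over a 3-token vocabulary: the model "reads out" these numbers,
# and sampling proceeds with no notion of the objective that produced them.
print(sample_next_token([2.0, 0.5, -1.0], seed=0))
```

Whatever goals a simulacrum appears to have live in the pattern of logits the prompt elicits, not in this readout loop.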
On the question of whether it's really a mind, I'm not sure how to tell. I know it gets really low loss on this really weird and hard task and does it better than I do. I also know the task is fairly universal in the sense that we could represent just about any task in terms of the task it is good at. Is that an intelligence? Idk, maybe not? I'm not worried about current LLMs doing planning. It's more like I have a human connectome and I can do one forward pass through it with an input set of nerve activations. Is that an intelligence? Idk, maybe not?
I think we’re largely on the same page here because I’m also unsure of how to tell! I think I’m asking for someone to say what it means for the model itself to have a goal separate from the masks it is wearing, and show evidence that this is the case (rather than the model “fully being the mask”). For example, one could imagine an AI with the secret goal “maximize paperclips” which would pretend to be other characters but always be nudging the world towards paperclipping, or human actors who perform in a way supporting the goal “make my real self become famous/well-paid/etc” regardless of which character they play. Can someone show evidence for the LLMs having a “real self” or a “real goal” that they work towards across all the characters they play?
I think I don’t understand your last question.
I suppose I’m trying to make a hypothetical AI that would frustrate any sense of “real self” and therefore disprove the claim “all LLMs have a coherent goal that is consistent across characters”. In this case, the AI could play the “benevolent sovereign” character or the “paperclip maximizer” character, so if one claimed there was a coherent underlying goal I think the best you could say about it is “it is trying to either be a benevolent sovereign or maximize paperclips”. But if your underlying goal can cross such a wide range of behaviors it is practically meaningless! (I suppose these two characters do share some goals like gaining power, but we could always add more modes to the AI like “immediately delete itself” which shrinks the intersection of all the characters’ goals.)
I agree that’s basically what happened, I just wanted to cleave off one way in which the shoggoth analogy breaks down!
I agree that to the extent there is a shoggoth, it is very different than the characters it plays, and an attempted shoggoth character would not be “the real shoggoth”. But is it even helpful to think of the shoggoth as being an intelligence with goals and values? Some people are thinking in those terms, e.g. Eliezer Yudkowsky saying that “the actual shoggoth has a motivation Z”. To what extent is the shoggoth really a mind or an intelligence, rather than being the substrate on which intelligences can emerge? And to get back to the point I was trying to make in OP, what evidence do we have that favors the shoggoth being a separate intelligence?
To rephrase: behavior is a function of the LLM and prompt (the “mask”), and with the correct LLM and prompt together we can get an intelligence which seems to have goals and values. But is it reasonable to “average over the masks” to get the “true behavior” of the LLM alone? I don’t think that’s necessarily meaningful since it would be so dependent on the weighting of the average. For instance, if there’s an LLM-based superintelligence that becomes a benevolent sovereign (respectively, paperclips the world) if the first word of its prompt has an even (respectively, odd) number of letters, what would be the shoggoth there?
A few comments:
Thanks for writing this! I’ve had some similar ideas kicking around my head but it’s helpful to see someone write them up like this.
I think token deletion is a good thing to keep in mind, but I think it's not correct to say you're always deleting the token in position 1. In the predict-next-token loop it would be trivial to keep some prefix around, e.g. 1,2,3,4 → 1,2,4,5 → 1,2,5,6 → etc. I assume that's what ChatGPT does, since they have a hidden prompt, and if you could jailbreak it just by overflowing the dialogue box, the DAN people would presumably have found that exploit already. While this is on some level equivalent to rolling the static prefix into the next-token prediction term, I think the distinction is important because it means we could actually be dealing with a range of dynamics depending on the prefix.
Editing to provide an example: in your {F, U, N} example, add another token L (for Luigi), which is never produced by the LLM but if L is ever in the context window the AI behaves as Luigi and predicts F 100% of the time. If you trim the context window as you describe, any L will eventually fall out of the context window and the AI will then tend towards Waluigi as you describe. But if you trim from the second token, the sequence (L, F, F, F, F, …, F) is stable. Perhaps the L token could be analogous to the <|good|> token described here.
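A toy simulation of the two trimming policies. The transition probabilities and the "pinned L" behavior are my own stand-ins for the post's dynamics, not its actual numbers:

```python
import random

def step(window):
    # Hypothetical dynamics: an 'L' anywhere in the window pins the model to
    # Luigi behavior (always emit 'F'); otherwise it emits 'U' (Waluigi) 10%
    # of the time. Stand-in numbers, not from the original post.
    if "L" in window:
        return "F"
    return "F" if random.random() < 0.9 else "U"

def run(keep_prefix, steps=200, seed=0):
    random.seed(seed)
    window = ["L", "F", "F", "F"]  # 4-token context with 'L' initially present
    emitted = []
    for _ in range(steps):
        token = step(window)
        emitted.append(token)
        window.append(token)
        # Trim back to 4 tokens: either protect position 0 or drop the oldest.
        del window[1 if keep_prefix else 0]
    return emitted

print("keep prefix:", set(run(keep_prefix=True)))   # 'L' never leaves the window
print("drop oldest:", set(run(keep_prefix=False)))  # 'L' falls out on step one
```

With the prefix protected, the (L, F, F, F, …) sequence is stable forever; with naive oldest-first trimming, the 'L' falls out immediately and 'U' tokens start appearing.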
Nitpicking: Your “00000000000000000....” prompt doesn’t actually max out the prompt window because sixteen 0s can combine into a single token. You can see this at the GPT tokenizer.