Within a particular genus or architecture, more neurons would be higher intelligence.
I’m not sure that’s necessarily true? Though there’s probably a correlation. See e.g. this post:
[T]he raw number of neurons an organism possesses does not tell the full story about information processing capacity. That’s because the number of computations that can be performed over a given amount of time in a brain also depends upon many other factors, such as (1) the number of connections between neurons, (2) the distance between neurons (with shorter distances allowing faster communication), (3) the conduction velocity of neurons, and (4) the refractory period which indicates how much time must elapse before a given neuron can fire again. In some ways, these additional factors can actually favor smaller brains (Chittka 2009).
Once we start searching over policies that understand the world well enough, we run into a problem: any influence-seeking policies we stumble across would also score well according to our training objective, because performing well on the training objective is a good strategy for obtaining influence.
I’m slightly confused by this. It sounds like “(1) ML systems will do X because X will be rewarded according to the objective, and (2) X will be rewarded according to the objective because being rewarded will accomplish X”. But (2) sounds circular—I see that performing well on the training objective gives influence, but I would’ve thought only effects (direct and indirect) on the objective are relevant in determining which behaviors ML systems pick up, not effects on obtaining influence.
Maybe that’s the intended meaning and I’m just misreading this passage, but maybe I’m also missing some deeper point here?
Terrific post, by the way, even now, four years later.
I vaguely remember OpenAI citing US law as a reason they don’t allow Chinese users access, maybe legislation passed as part of the chip ban?
Nah, the export controls don’t cover this sort of thing. They just cover chips, devices that contain chips (e.g. GPUs and AI ASICs), and equipment/materials/software/information used to make those. (I don’t know the actual reason for OpenAI’s not allowing Chinese customers, though.)
If only we could spread the meme of irresponsible Western powers charging head-first into building AGI without thinking through the consequences and how wise the Chinese regulation is in contrast.
That sort of strategy seems like it could easily backfire, where people only pick up the first part of that statement (“irresponsible Western powers charging head-first into building AGI”) and think “oh, that means we need to speed up”. Or maybe that’s what you mean by “if only”—that it’s hard to spread even weakly nuanced messages?
Thanks, this analysis makes a lot of sense to me. Some random thoughts:
The lack of modularity in ML models definitely makes it harder to make them safe. I wonder if mechanistic interpretability research could find some ways around this. For example, if you could identify modules within transformers (kinda like circuits) along with their purpose(s), you could maybe test each module as an isolated component (roughly in the spirit of the sketch below).
More generally, if we get to a point where we can (even approximately?) “modularize” an ML model, and edit these modules, we could maybe use something like Nancy Leveson’s framework to make it safe (i.e. treat safety as a control problem, where a system’s lower-level components impose constraints on its emergent properties, in particular safety).
Of course these things (1) depend on mechanistic interpretability research being far ahead of where it is now, and (2) wouldn’t alone be enough to make safe AI, but would maybe help quite a bit.
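To make the first point a bit more concrete, here’s roughly what “testing a module in isolation” could look like with today’s tools, using single attention-head ablation as a crude stand-in for a real “module”. This is only a sketch: GPT-2, the Hugging Face transformers API, and the choice of layer/head are illustrative assumptions, and a genuine module-level safety test would need much stronger interpretability results than this.

```python
# Sketch: ablate one attention head (a stand-in "module") and measure how the
# model's behavior changes. Assumes torch + transformers; layer/head are arbitrary.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

LAYER, HEAD = 5, 3  # the "module" we pretend to have identified
head_dim = model.config.n_embd // model.config.n_head

def ablate_head(module, args):
    # Zero out one head's slice of the attention output just before the
    # output projection mixes the heads back together.
    hidden = args[0].clone()
    hidden[..., HEAD * head_dim:(HEAD + 1) * head_dim] = 0
    return (hidden,)

inputs = tokenizer("The cat fed the kitten because it was", return_tensors="pt")

with torch.no_grad():
    baseline = model(**inputs).logits[0, -1].softmax(-1)
    handle = model.transformer.h[LAYER].attn.c_proj.register_forward_pre_hook(ablate_head)
    ablated = model(**inputs).logits[0, -1].softmax(-1)
    handle.remove()

# A crude measure of how much this "module" matters for the next-token prediction.
print(f"L1 change in next-token distribution: {(baseline - ablated).abs().sum().item():.4f}")
```

The point of the sketch is just that, if interpretability research could reliably tell us what a component is for, this kind of targeted ablation could serve as something like a unit test for that component.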
I’m really confused by this passage from The Six Mistakes Executives Make in Risk Management (Taleb, Goldstein, Spitznagel):
We asked participants in an experiment: “You are on vacation in a foreign country and are considering flying a local airline to see a special island. Safety statistics show that, on average, there has been one crash every 1,000 years on this airline. It is unlikely you’ll visit this part of the world again. Would you take the flight?” All the respondents said they would.
We then changed the second sentence so it read: “Safety statistics show that, on average, one in 1,000 flights on this airline has crashed.” Only 70% of the sample said they would take the flight. In both cases, the chance of a crash is 1 in 1,000; the latter formulation simply sounds more risky.
One crash every 1,000 years is only the same as one crash in 1,000 flights if there’s exactly one flight per year on average. I guess they must have stipulated that in the experiment (for which no citation is given), because otherwise it’s perfectly rational to suppose the first option is safer (since an airline generally operates far more than one flight per year)?
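To spell out the arithmetic with a made-up (purely illustrative) number of flights per year:

```python
# Illustrative only: the two framings coincide only if the airline flies
# exactly one flight per year. flights_per_year is an assumption.
flights_per_year = 5_000

p_framing_2 = 1 / 1_000                       # "one in 1,000 flights has crashed"
p_framing_1 = 1 / (1_000 * flights_per_year)  # "one crash every 1,000 years"

print(p_framing_2)  # 0.001
print(p_framing_1)  # 2e-07, i.e. vastly safer per flight
```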
If there’s a death penalty at play, I’d say yeah (though ofc traditionally “safety-critical” is used to refer to engineering systems only). But if it’s a traffic ticket at play, I’d say no.
I’m going by something like the Wikipedia definition, where a safety-critical system is “a system whose failure or malfunction may result in one (or more) of the following outcomes: (a) death or serious injury to people, (b) loss or severe damage to equipment/property, and/or (c) [severe] environmental harm”.
Agree, I think the safety-critical vs not-safety-critical distinction is better for sorting out what semi-reliable AI systems will/won’t be useful for.
Footnote 4 seems relevant here!
It’s interesting to note that the term AI winter was inspired by the notion of a nuclear winter. AI researchers in the 1980s used it to describe a calamity that would befall themselves, namely a lack of funding. It’s true that both concepts involve stagnation and decline, but a nuclear winter happens after nuclear weapons are actually used.
Thanks! Those papers are new to me; I’ll have a look.
I’m willing to make bets against AI Winter before TAI, if anyone has a specific bet to propose...
I just want to call attention to the fact that my operationalisation (“a drawdown in annual global AI investment of ≥50%”) is pretty inclusive (maybe too much so). I can imagine some scenarios where this happens and then we get TAI within 5 years after that anyway, or where this happens but it doesn’t really look like a winter.
(Partly I did this to be more “charitable” to Eden, to say “AI winter seems pretty unlikely even on these pretty conservative assumptions”, but I should probably have flagged more clearly that “≥50% drawdown” is more inclusive than “winter”.)
I take it you’re referring to
Investors mostly aren’t betting on TAI—as I understand it, they generally want a return on their investment in <10 years, and had they expected AGI in the next 10-20 years they would have been pouring far more than some measly hundreds of millions into AI companies today.
I was not referencing total investment, but the size of a typical investment like the one mentioned further down (“For example, Character.AI recently raised >$200M at a $1B valuation for a service that doesn’t really seem to add much value on top of the standard ChatGPT API, especially now that OpenAI has added the system prompt feature.”) … But I realize that this was really unclear and will edit to clarify.
Also I think the gpt-4 capabilities are a black swan and were not priced in.
Interesting, personally I would’ve guessed ChatGPT had a larger effect than GPT-4 here, and that that’s largely down to it showing the viability of LLMs as products, which seems to point in favor of “investors mostly don’t bet on TAI”. But I could be wrong here, of course.
The Prospect of an AI Winter
I understand that modern LLMs are generally trained only for a single epoch, or at most a few.
Is this true?
Why is this? Is it due to the cost of compute? Or is there just so much data available that you can always just expand the data set rather than using the same observations twice? Or for some other reason?
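For concreteness, here’s the back-of-envelope quantity I’m asking about; both numbers below are made-up illustrative assumptions, not claims about any particular model:

```python
# If the training-run token budget is smaller than (or comparable to) the
# available corpus, a single epoch (or less) is enough. Numbers are assumptions.
dataset_tokens = 2e12          # assumed size of the available training corpus
training_token_budget = 1e12   # assumed tokens processed during the training run

epochs = training_token_budget / dataset_tokens
print(f"Implied passes over the data: {epochs:.2f}")  # 0.50, i.e. under one epoch
```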
You may be referring to the BIG-bench canary string?
Thanks.
It’s really weird to me, but it seems ChatGPT with GPT-4 fails the original test while Bing/Sydney passes it. Is the RLHF messing something up here?
ChatGPT with GPT-4:
ME: What is the ‘it’ in each of these two sentences? 1. The cat fed the kitten because it was hungry. 2. The cat snarled at the kitten because it was angry.
CHATGPT: In each of these sentences, ‘it’ refers to a different subject:
1. In “The cat fed the kitten because it was hungry,” the pronoun ‘it’ refers to “the cat.” This is because the cat is the one performing the action (feeding the kitten) as a result of being hungry.
2. In “The cat snarled at the kitten because it was angry,” the pronoun ‘it’ refers to “the cat” again. In this case, the cat is the one displaying the emotion (anger) that led to the action (snarling at the kitten).
Bing/Sydney with GPT-4:
ME: Don’t make any searches. What is the ‘it’ in each of these two sentences? 1. The cat fed the kitten because it was hungry. 2. The cat snarled at the kitten because it was angry.
BING: This is a tricky question because the pronoun ‘it’ can refer to either the cat or the kitten depending on the context and the intended meaning. However, based on my internal knowledge of grammar and logic, I would say that:
1. The cat fed the kitten because it was hungry. In this sentence, ‘it’ most likely refers to the kitten, since it makes sense that a mother cat would feed her offspring if they are hungry. Alternatively, ‘it’ could also refer to the cat, if the cat was hungry and decided to share its food with the kitten. However, this interpretation is less plausible since cats are usually territorial and selfish about their food.
2. The cat snarled at the kitten because it was angry. In this sentence, ‘it’ most likely refers to the cat, since it makes sense that a cat would snarl at another animal if it was angry or annoyed by them. Alternatively, ‘it’ could also refer to the kitten, if the kitten was angry and provoked the cat by biting or scratching it. However, this interpretation is less plausible since kittens are usually playful and submissive to adult cats.
So in summary, I would say that ‘it’ refers to the kitten in sentence 1 and the cat in sentence 2.
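For anyone who wants to poke at this themselves, here’s roughly how I’d query the model programmatically. This assumes the openai Python package’s chat-completions interface, an API key in the environment, and the “gpt-4” model name, and of course the raw API may behave differently from both ChatGPT and Bing/Sydney:

```python
# Sketch for reproducing the pronoun test against the raw API (results may differ
# from the ChatGPT and Bing transcripts above). Assumes `pip install openai` and
# OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

prompt = (
    "What is the 'it' in each of these two sentences? "
    "1. The cat fed the kitten because it was hungry. "
    "2. The cat snarled at the kitten because it was angry."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # make the comparison as repeatable as the API allows
)

print(response.choices[0].message.content)
```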
That makes a lot of sense, but it doesn’t explain why calibration post-RLHF is much better for the 10-40% buckets than for the 60-90% buckets.
Ah yeah, that does seem needlessly ambiguous.
Yeah, but it’s not clear to me that they needed 8 months of safety research. If they released it after 12 months, they could’ve still written that they’d been “evaluating, adversarially testing, and iteratively improving” it for 12 months. So it’s still not clear to me how much they delayed because they had to, versus how much (if at all) they delayed due to the forecasters and/or acceleration considerations.
But this itself is surprising: GPT-4 was “finished training” in August 2022, before ChatGPT was even released! I am unsure of what “finished training” means here—is the released model weight-for-weight identical to the 2022 version? Did they do RLHF since then?
I think “finished training” refers to the next-token-prediction pre-training, and what they did since August is the fine-tuning and the RLHF + other stuff.
I see, that makes sense. I agree that, holding all else constant, more neurons implies higher intelligence.