Engineer at CoinList.co. Donor to LW 2.0.
I’m not necessarily putting a lot of stock in my specific explanations but it would be a pretty big surprise to learn that it turns out they’re really the same.
Does it seem to you that the kinds of people who are good at science vs good at philosophy (or the kinds of reasoning processes they use) are especially different?
In your own case, it seems to me like you’re someone who’s good at philosophy, but you’re also good at more “mundane” technical tasks like programming and cryptography. Do you think this is a coincidence?
I would guess that there’s a common factor of intelligence + being a careful thinker. Would you guess that we can mechanize the intelligence part but not the careful thinking part?
Happiness has been shown to increase with income up to a certain threshold ($200K per year now, roughly speaking), beyond which the effect tends to plateau.
Do you have a citation for this? My understanding is that it’s a logarithmic relationship — there’s no threshold. (See the Income & Happiness section here.)
Why antisocial? I think it’s great!
I would imagine one of the major factors explaining Tesla’s absence is that people are most worried about LLMs at the moment, and Tesla is not a leader in LLMs.
(I agree that people often seem to overlook Tesla as a leader in AI in general.)
I don’t know anything about the ‘evaluation platform developed by Scale AI—at the AI Village at DEFCON 31’.
Looks like it’s this.
Here are some predictions—mostly just based on my intuitions, but informed by the framework above. I predict with >50% credence that by the end of 2025 neural nets will:
To clarify, I think you mean that you predict each of these individually with >50% credence, not that you predict all of them jointly with >50% credence. Is that correct?
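To show why the distinction matters, here is a minimal sketch. It assumes, purely for illustration, that there are five predictions, that they are independent, and that each is held at 60% credence; none of those numbers come from the original post.

```python
def joint_credence(individual_credences):
    """Probability that every prediction comes true, assuming independence."""
    joint = 1.0
    for p in individual_credences:
        joint *= p
    return joint


# Hypothetical inputs: five independent predictions at 60% credence each.
credences = [0.6] * 5
joint = joint_credence(credences)
print(f"individual credence: {credences[0]:.0%}")
print(f"joint credence:      {joint:.1%}")
```

Under those assumptions the joint credence falls well below 50% even though each individual prediction is above it. Correlated predictions would narrow the gap.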
I’d like to see open-sourced evaluation and safety tools. Seems like a good thing to push on.
My model here is something like “even small differences in the rate at which systems are compounding power and/or intelligence lead to gigantic differences in absolute power and/or intelligence, given that the world is moving so fast.”
Or maybe another way to say it: the speed at which a given system can compound its abilities is very fast, relative to the rate at which innovations diffuse through the economy, for other groups and other AIs to take advantage of.
I’m a bit skeptical of this. While I agree that small differences in growth rates can be very meaningful, I think it’s quite difficult to maintain a growth rate faster than the rest of the world for an extended period of time.
Growth and Trade
The reason is that growth is way easier if you engage in trade. And assuming that gains from trade are shared evenly, the rest of the world profits just as much (in absolute terms) as you do from any trade. So you can only grow significantly faster than the rest of the world while you’re small relative to the size of the whole world.
To give a couple of illustrative examples:
The “Asian Tigers” saw their economies grow faster than GWP during the second half of the 20th century because they were engaged in “catch-up” growth. Once their GDP per capita got into the same ballpark as other developed countries, they slowed down to a similar growth rate to those countries.
Tesla has grown revenue at an average of 50% per year for 10 years. That’s been possible because they started out as a super small fraction of all car sales, and there were many orders of magnitude of growth available. I expect them to continue growing at something close to that rate for another 5-10 years, but then they’ll slow down because the global car market is only so big.
Growth without Trade
Now imagine that you’re a developing nation, or a nascent car company, and you want to try to grow your economy, or the number of cars you make, but you’re not allowed to trade with anyone else.
For a nation it sounds possible, but you’re playing on super hard mode. For a car company it sounds impossible.
This suggests to me the following hypotheses:
Any entity that tries to grow without engaging in trade is going to be outcompeted by those that do trade, but
Entities that grow via trade will have their absolute growth capped at the size of the absolute growth of the rest of the world, and thus their growth rate will max out at the same rate as the rest of the world, once they’re an appreciable fraction of the global economy.
I don’t think these hypotheses are necessarily true in every case, but it seems like they would tend to be true. So to me that makes a scenario where explosive growth enables an entity to pull away from the rest of the world seem a bit less likely.
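Here is a toy simulation of the second hypothesis, as a sketch only. The starting sizes, the 50% and 3% growth rates, and the rule that the entity's absolute gain is capped at the rest of the world's absolute gain are all illustrative assumptions, not empirical estimates.

```python
def simulate(entity=1.0, world=1000.0, r_entity=0.5, r_world=0.03, years=60):
    """Return a list of (year, entity growth rate, entity share of total).

    `world` is the rest of the world, excluding the entity.
    All default values are hypothetical.
    """
    history = []
    for year in range(1, years + 1):
        world_gain = r_world * world
        # Assumed rule: with gains from trade shared evenly, the entity's
        # absolute growth cannot exceed the rest of the world's.
        entity_gain = min(r_entity * entity, world_gain)
        rate = entity_gain / entity
        entity += entity_gain
        world += world_gain
        history.append((year, rate, entity / (entity + world)))
    return history


history = simulate()
for year, rate, share in history[::10]:
    print(f"year {year:2d}: growth rate {rate:6.1%}, share of total {share:6.1%}")
```

In this toy model the entity grows at its full rate while it is tiny, then its growth rate falls toward the rest of the world's rate once the cap binds. It never pulls away, though the result is only as good as the capping assumption.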
Ah, good point!
I suppose a possible mistake in this analysis is that I’m treating Moore’s law as the limit on compute growth rates, and this may not hold once we have stronger AIs helping to design and fabricate chips.
Even so, I think there’s something to be said for trying to slowly close the compute overhang gap over time.
0.2 OOMs/year was the pre-AlexNet growth rate in ML systems.
I think you’d want to set the limit to something slightly faster than Moore’s law. Otherwise you have a constant large compute overhang.
Ultimately, we’re going to be limited by Moore’s law (or its successor) growth rates eventually anyway. We’re on a kind of z-curve right now, where we’re transitioning from ML compute being some small constant fraction of all compute to some much larger constant fraction of all compute. Before the transition it grows at the same speed as compute in general. After the transition it also grows at the same speed as compute in general. In the middle it grows faster as we rush to spend a much larger share of GWP on it.
From that perspective, Moore’s law growth is the minimum growth rate you might have (unless annual spend on ML shrinks). And the question is just whether you transition from the small constant fraction of all compute to the large constant fraction of all compute slowly or quickly.
Trying to not do the transition at all (i.e. trying to grow at exactly the same rate as compute in general) seems potentially risky, because the resulting constant compute overhang means it’s relatively easy for someone somewhere to rush ahead locally and build something much better than SOTA.
If on the other hand, you say full steam ahead and don’t try to slow the transition at all, then on the plus side the compute overhang goes away, but on the minus side, you might rush into dangerous and destabilizing capabilities.
Perhaps a middle path makes sense, where you slow the growth rate down from current levels, but also slowly close the compute overhang gap over time.
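A back-of-the-envelope sketch of that trade-off, under stated assumptions: Moore's law is taken as a doubling every two years (about 0.15 OOMs/year), the overhang is measured in OOMs and shrinks linearly at the difference between the cap's growth rate and the hardware growth rate, and the 2-OOM starting overhang and the 0.5 OOMs/year cap are made-up values for illustration. The 0.2 OOMs/year cap is the pre-AlexNet figure mentioned above.

```python
import math

# Assumption: Moore's law as a doubling every two years, in OOMs per year.
MOORE_OOMS_PER_YEAR = math.log10(2) / 2


def years_to_close_overhang(initial_overhang_ooms, cap_ooms_per_year,
                            hardware_ooms_per_year=MOORE_OOMS_PER_YEAR):
    """Years until a cap growing at `cap_ooms_per_year` catches up with hardware.

    Returns math.inf if the cap grows no faster than hardware, in which
    case the overhang never closes.
    """
    closing_rate = cap_ooms_per_year - hardware_ooms_per_year
    if closing_rate <= 0:
        return math.inf
    return initial_overhang_ooms / closing_rate


# Hypothetical starting overhang of 2 OOMs.
for cap in (MOORE_OOMS_PER_YEAR, 0.2, 0.5):
    years = years_to_close_overhang(2.0, cap)
    print(f"cap of {cap:.2f} OOMs/year -> overhang closes in {years:.1f} years")
```

A cap at exactly the hardware rate never closes the gap. A cap only slightly above it closes the gap slowly, which is the middle path described above.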
See my edit to my comment above. Sounds like GPT-3 was actually 250x more compute than GPT-2. And Claude / GPT-4 are about 50x more compute than that? (Though unclear to me how much insight the Anthropic folks had into GPT-4’s training before the announcement. So possible the 50x number is accurate for Claude and not for GPT-4.)
Doesn’t this part of the comment answer your question?
We can very easily “grab probability mass” in relatively optimistic worlds. From our perspective of assigning non-trivial probability mass to the optimistic worlds, there’s enormous opportunity to do work that, say, one might think moves us from a 20% chance of things going well to a 30% chance of things going well. This makes it the most efficient option on the present margin.
It sounds like they think it’s easier to make progress on research that will help in scenarios where alignment ends up being not that hard. And so they’re focusing there because it seems to be highest EV.
Seems reasonable to me. (Though noting that the full EV analysis would have to take into account how neglected different kinds of research are, and many other factors as well.)
Better meaning more capability per unit of compute? If so, how can we be confident that it’s better than Chinchilla?
I can see an argument that it should be at least as good — if they were throwing so much money at it, they would surely do what is currently known best practice. But is there evidence to suggest that they figured out how to do things more efficiently than had ever been done before?
Can we adopt a norm of calling this Safe.ai? When I see “CAIS”, I think of Drexler’s “Comprehensive AI Services”.
Still, this advance seems like a less revolutionary leap over GPT-3 than GPT-3 was over GPT-2, if Bing’s early performance is a decent indicator.
Seems like this is what we should expect, given that GPT-3 was 100x as big as GPT-2, whereas GPT-4 is probably more like ~10x as big as GPT-3. No?
EDIT: just found this from Anthropic:
We know that the capability jump from GPT-2 to GPT-3 resulted mostly from about a 250x increase in compute. We would guess that another 50x increase separates the original GPT-3 model and state-of-the-art models in 2023.
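For comparison, here is a quick conversion of these multipliers into orders of magnitude (OOMs). The 100x and 10x figures are my rough guesses from above, and the 250x and 50x figures are the ones quoted from Anthropic. This is only a sketch of the arithmetic.

```python
import math


def ooms(multiplier):
    """Convert a compute multiplier into orders of magnitude (base 10)."""
    return math.log10(multiplier)


jumps = {
    "GPT-2 -> GPT-3 (my guess)": 100,
    "GPT-3 -> GPT-4 (my guess)": 10,
    "GPT-2 -> GPT-3 (Anthropic)": 250,
    "GPT-3 -> SOTA 2023 (Anthropic)": 50,
}

for label, multiplier in jumps.items():
    print(f"{label}: {multiplier}x = {ooms(multiplier):.2f} OOMs")
```

Either way, the second jump is smaller than the first in OOM terms, which fits a less dramatic leap in capabilities.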
There already are general AIs. They just are not powerful enough yet to count as True AGIs.
Can you say what you have in mind as the defining characteristics of a True AGI?
It’s becoming a pet peeve of mine how often people these days use the term “AGI” w/o defining it. Given that, by the broadest definition, LLMs already are AGIs, whenever someone uses the term and means to exclude current LLMs, it seems to me that they’re smuggling in a bunch of unstated assumptions about what counts as an AGI or not.
Here are some of the questions I have for folks that distinguish between current systems and future “AGI”:
Is it about just being more generally competent (s.t. GPT-X will hit the bar, if it does a bit better on all our current benchmarks, w/o any major architectural changes)?
Is it about being always on, and having continual trains of thought, w/ something like long-term memory, rather than just responding to each prompt in isolation?
Is it about being formulated more like an agent, w/ clearly defined goals, rather than like a next-token predictor?
If so, what if the easiest way to get agent-y behavior is via a next-token (or other sensory modality) predictor that simulates an agent — do the simulations need to pass a certain fidelity threshold before we call it AGI?
What if we have systems with a hodge-podge of competing drives (like a The Sims character) and learned behaviors, that in any given context may be more-or-less goal-directed, but w/o a well-specified over-arching utility function (just like any human or animal) — is that an AGI?
Is it about being superhuman at all tasks, rather than being superhuman at some and subhuman at others (even though there’s likely plenty of risk from advanced systems well before they’re superhuman at absolutely everything)?
Given all these ambiguities, I’m tempted to suggest we should in general taboo “AGI”, and use more specific phrases in its place. (Or at least, make a note of exactly which definition we’re using if we do refer to “AGI”.)
I don’t think it’s ready for release, in the sense of “is releasing it a good idea from Microsoft’s perspective?”.
You sure about that?
EDIT: to clarify, I don’t claim that this price action is decisive. Hard to attribute price movements to specific events, and the market can be wrong, especially in the short term. But it seems suggestive that the market likes Microsoft’s choice.