We do not assume that humans are superior to AI in any way, or that neurons are superior to transistors. Similarly, we do not claim that an AI CEO would be inferior to a human one. Rather, we only claim that it would not dominate a human CEO the way an AI chess player dominates a human chess player. Note that currently, CEOs are usually not the smartest employees in their company, but that does not mean that they are mere peripherals of their smartest engineers.
Yes, we usually select our leaders (e.g., presidents) not for their cognitive abilities but literally for how “aligned” we believe they are with our interests. Even if we completely solve the alignment problem, AI would likely face an uphill battle in overcoming prejudice and convincing people that it is as aligned as an alternative human. As the saying goes for many discriminated-against groups, they would have to be twice as good to get to the same place.
Even if you assume that intelligence is distributed normally, why aren’t we selecting CEOs from the right tail of that distribution today?
Re being myopic: I think that, possibly, a difference between my view and at least some people’s is that rather than seeing myopia as a property that would have to be ensured by regulation or the goodness of the AI creator’s heart, I view it as the default. I think the biggest bang for the buck in AI would be to build systems with myopic training objectives and use them to achieve myopic tasks, where they produce some discrete output/product that can be evaluated on its own merits. I see AI as doing tasks more like “find security flaws in software X and provide me exploit code as verification” than “chart a strategy for the company that would maximize its revenues over the next decade”.
The table we quote suggests that CEOs are only something like one standard deviation above the mean. This is not surprising: at least my common sense suggests that scientists and mathematicians should on average have greater skills of the type measured by IQ than CEOs, despite the latter’s decisions being more far-reaching and their salaries being higher.
Of course we cannot rule out that there is some “phase transition”: while IQ 140 is not much better than IQ 120 for being a CEO, perhaps something happens at IQ 1000 (or whatever the equivalent is).
We argue why we do not expect such a phase transition. (In the sense that, at least in computation, there is only one phase transition, to universality, and after passing it the system is not bottlenecked by the complexity of any one unit.)
However, I agree that we cannot rule it out. We’re just pointing out that there isn’t evidence for it, in contrast to the ample evidence for the usefulness of information processing for medium-term tasks.
Thanks for your comments! Some quick responses:
I agree that extracting short-term modules from long-term modules is very much an open question. However, it may well be that our main problem will be the opposite: the systems will already be trained with short-term goals, and we just want to make sure that they don’t accidentally develop a long-term goal in the process. (This may be related to your mechanisms posts, which I will respond to separately.)
I do think that there is a sense in which, in a chaotic world, some “greedy” or simple heuristics end up being better than ultra-complex ones. In chess you can sacrifice a queen in order to gain some advantage much later on, but in business, while you might sacrifice one metric (e.g., profit) to maximize another (e.g., growth), you need to make some measurable progress. If we think of cognitive ability as the ability to use large quantities of data and perform very long chains of reasoning on them, then I do believe these are more needed for scientists or engineers than for CEOs. (In an earlier draft we also had another example of the long-term benefits of simple strategies: the fact that the longest-surviving species are simple ones such as cockroaches, crocodiles, etc., but Ben didn’t like it :) )
I agree deterrence is very problematic, but prevention might be feasible. For example, while AI would greatly increase the capabilities for hacking, it would also increase the capabilities to harden our systems. In general, I find research on prevention to be more attractive than alignment since it also applies to the scenario (more likely in my view) of malicious humans using AI to cause massive harm. It also doesn’t require us to speculate about objects (long-term planning AIs) that don’t yet exist.
How many standard deviations? My (admittedly only partially justified) guess is that there are diminishing returns to being (say) three standard deviations above the mean compared to two in a CEO position as opposed to (say) a mathematician. (Not that IQ is perfectly correlated with math success either.)
There are only (by definition) 100 CEOs of Fortune 100 companies, so a priori they could have the IQ scores of the top 100 humans, which (assuming a normal distribution) would be at least 4 standard deviations above the mean (see here).
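(As a quick back-of-envelope check of that “4 standard deviations” figure, here is a small calculation; the candidate-pool sizes are hypothetical choices of mine, not numbers from the essay.)

```python
# Back-of-envelope check: how many standard deviations above the mean is the
# top-100 cutoff of a normally distributed trait, for a few hypothetical pool sizes?
from scipy.stats import norm

for pool_size in (10**7, 10**8, 10**9):    # hypothetical pools of potential CEOs
    tail_prob = 100 / pool_size            # fraction of the pool in the top 100
    z = norm.ppf(1 - tail_prob)            # cutoff, in standard deviations above the mean
    print(f"pool of {pool_size:.0e}: top-100 cutoff is about {z:.1f} sigma")
```

For any plausible pool size the cutoff comes out well above 4 sigma, consistent with the a priori bound.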
It is indeed the case that sometimes we see phase transitions / discontinuous improvements, and this is an area in which I am very interested. Note, however, that (while not in our paper) in graphs such as those in BIG-Bench, the x-axis is typically something like the log of the number of parameters. So it does seem you pay quite a price to achieve the improvement.
The claim there is not so much about the shape of the laws but rather about potential (though as you say, not certain at all) limitations as to what improvements you can achieve through pure software alone, without investing more compute and/or data. Some other (very rough) calculations of costs are attempted in my previous blog post.
I’m trying to understand this example. The way I would think of a software-writing AI is the following: after some pretraining, we fine-tune the AI on prompts explaining the business task, with the output being the software, and with an objective related to various outcome measures.
Then we deploy it. It is not clear that we want to keep fine-tuning after deployment. Doing so clearly raises issues of overfitting and could lead to problems such as the “blah blah blah…” example mentioned in the post. (E.g., if you’re writing the testing code for your own future code, you might want to “take the hit” and write bad tests that will be easy to pass.) Also, as we mention, the more compute and data are invested during training, the less we expect there to be much “on the job training”. The AI would be like a consultant with thousands of years of software-writing experience who comes in to do a particular project.
Let me try to make things more concrete. Suppose we are a company deploying a service, and our ultimate goal is to maximize our profit a decade from now (or maybe more accurately, to maximize people’s perception of our future profit, which corresponds to our current stock price...).
My take is that while the leaders of the company might chart a strategy towards this far-off goal, they would set concrete goals for the software developers which correspond to very clear metrics. That is, the process of implementing a new feature for the service would involve the following steps:
1. Proposing the feature, and claiming which metric it would improve (e.g., latency on the website, click-through rate for ads, satisfaction with the service, number of users, etc.). Crucially, these metrics are simple and human-interpretable, since the assumption is that in a chaotic world we cannot have “3D chess” types of strategies; rather, each feature should make clear progress on some measure.
2. Writing code for the feature.
3. Reviewing and testing the code.
4. Deploying it (possibly with A/B testing).
5. Evaluating the deployment.
AIs might be involved in all of these steps, but it would not be one coherent AI that does everything and whose goal is to eventually make the managers happy. Just as today we have different people doing these roles, so would different AIs do each of these roles, and importantly, each one of them would have its own objective function that it is trying to maximize.
So each one of these components would be trained separately, and in some sense adversarially (e.g., the testing AI would be trained to maximize bugs found, while the code-writing AI would be trained to minimize them). Moreover, each one of them would be trained on its own giant corpus of data. If they are jointly trained (as in GANs), then indeed care must be taken that they do not collapse into an undesirable equilibrium, but this is something that is well understood.
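(As a toy illustration of the kind of care joint adversarial training requires, and not a model of the actual code-writing/testing systems above, here is the standard bilinear min-max example: with naive simultaneous gradient descent/ascent, the two “players” spiral away from the equilibrium instead of converging to it.)

```python
# Toy illustration of adversarial-training instability (a standard textbook example,
# not a model of the code-writing / testing AIs discussed above).
# Player x tries to minimize f(x, y) = x * y while player y tries to maximize it.
# The unique equilibrium is (0, 0), but naive simultaneous gradient steps diverge.
import math

x, y, lr = 1.0, 1.0, 0.1
for step in range(1, 201):
    grad_x, grad_y = y, x                     # df/dx and df/dy
    x, y = x - lr * grad_x, y + lr * grad_y   # simultaneous descent / ascent
    if step % 50 == 0:
        print(f"step {step}: distance from equilibrium = {math.hypot(x, y):.2f}")
```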
Would you agree that the current paradigm is almost in direct contradiction to long-term goals? At the moment, to a first approximation, the power of our systems is proportional to the logarithm of their number of parameters, and, again to a first approximation, we need to take a gradient step per parameter in training. What this means is that if we have 100 billion parameters, we need to make 100 billion iterations in which we evaluate some objective/loss/reward value and adapt the system accordingly. This means that we had better find some loss function that we can evaluate on a relatively time-limited and bounded (input, output) pair rather than on a very long interaction.
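(To make the arithmetic vivid: with on the order of 10^11 loss evaluations, the wall-clock cost of training is dominated by how long a single evaluation takes. The episode lengths and degree of parallelism below are hypothetical numbers of mine, for illustration only.)

```python
# Rough arithmetic for the point above: ~10^11 gradient steps means ~10^11 loss
# evaluations, so the feasibility of training hinges on how long one evaluation takes.
# The episode lengths and parallelism are hypothetical, for illustration only.
SECONDS_PER_YEAR = 3600 * 24 * 365

updates = 1e11          # ~ one gradient step per parameter, as above
parallel_envs = 1e6     # optimistic number of evaluations run in parallel

for label, seconds_per_eval in [("1-second feedback ", 1.0),
                                ("1-hour interaction", 3600.0),
                                ("1-week interaction", 3600.0 * 24 * 7)]:
    years = updates * seconds_per_eval / parallel_envs / SECONDS_PER_YEAR
    print(f"{label}: ~{years:,.3f} years of wall-clock training time")
```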
Hi Vanessa,
Let me try to respond (note the claim numbers below are not the same as in the essay, but rather as in Vanessa’s comment):
Claim 1: Our claim is that one can separate out components: there is the predictable component, which is non-stationary but is best approximated with a relatively simple baseline, and the chaotic component, which over the long run is just noise. In general, highly complex rules are more sensitive to noise (in fact, there are theorems along these lines in the field of analysis of Boolean functions), and so in the long run the simpler component will dominate the accuracy.
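(A concrete instance of this phenomenon, as a quick Monte Carlo sketch of my own rather than anything from the essay: under ρ-correlated inputs, the simple “dictator” rule that outputs the first bit agrees with itself with probability about (1+ρ)/2, while the parity of n bits agrees only with probability (1+ρ^n)/2, which tends to a coin flip as n grows.)

```python
# Monte Carlo sketch of a standard fact from the analysis of Boolean functions:
# under rho-correlated inputs, a simple rule (output the first bit) stays stable,
# while a complex rule (parity of n bits) degrades to coin-flipping as n grows.
import numpy as np

rng = np.random.default_rng(0)
rho, n_samples = 0.9, 200_000

x = rng.integers(0, 2, size=(n_samples, 30))     # uniform random bits
flips = rng.random(x.shape) < (1 - rho) / 2      # flip each bit w.p. (1 - rho) / 2
y = np.where(flips, 1 - x, x)                    # rho-correlated copy of x

def agreement(f):
    """Fraction of samples on which f gives the same answer for x and its noisy copy y."""
    return np.mean(f(x) == f(y))

print("dictator (first bit):   ", agreement(lambda z: z[:, 0]))   # ~ (1 + rho) / 2
for n in (2, 5, 15, 30):
    print(f"parity of first {n:2d} bits:", agreement(lambda z, n=n: z[:, :n].sum(axis=1) % 2))
```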
Claim 2: Hacking is actually a fairly well-specified endeavor. People catalog, score, and classify security vulnerabilities. To hack is to come up with a security vulnerability and exploit code, which can be verified. Also, you seem to be envisioning a long-term AI that is then fine-tuned on a short-term task, but how did it evolve these long-term goals in the first place?
Claim 3: I would not say that there is no such thing as talent in being a CEO or president. I do, however, believe that the best leaders succeeded through some combination of their particular characteristics and talents and the situation they were in. Steve Jobs led Apple to become the largest company in the world, but it is not clear that he was a “universal CEO” who would have done as well in any company (indeed, he failed with NeXT). Similarly, Abraham Lincoln is typically ranked as the best U.S. president by historians, but again I think most would agree that he fit well the challenge he had to face, rather than being someone who would have handled the Cold War or the 1970s energy crisis just as well. Also, as Yafah points out elsewhere here, for people to actually trust an AI to be the leader of a company or a country, it would need to be not just as good as humans or a little better, but better by a huge margin. In fact, most people’s initial suspicion is that AIs (or even humans that don’t look like them) are not “aligned” with their interests, and if you don’t convince them otherwise, their default will be to keep them from positions of power.
Claim 4: The main point is that we need to measure the powers of the system as a whole, not compare the powers of an individual human with those of an individual AI. Clearly, if you took a human, made their memory capacity 10 times bigger, and made their speed 10 times faster, then they could do more things. But we are comparing with the case in which humans are assisted by short-term AIs that help them in all of the tasks that are memory- and speed-intensive.
As you probably imagine given my biography :), I am never against any research, and definitely not for reasons of practical utility. So I am definitely very supportive of research on alignment, and am not claiming that it shouldn’t be done. In my view, one of the interesting technical questions is to what extent long-term goals can emerge in systems trained with short-term objectives, and (if this happens) whether we can prevent it while keeping short-term performance just as good. One reason I like the focus on the horizon rather than on alignment with human values is that the former might be easier to define and argue about. But this doesn’t mean that we should not care about the latter.
I do not claim that AI cannot set long-term strategies. My claim is that this is not where AI’s competitive advantage over humans will be. I could certainly imagine that a future AI would be 10 times better than me at proving mathematical theorems. I am not at all sure it would be 10 times better than Joe Biden at being a U.S. president, and that is mostly because I don’t think information-processing capabilities are really the bottleneck for that job. (Though certainly, the U.S. as a whole, including the president, would benefit greatly from future AI tools, and it is quite possible that some of Biden’s advisors would be replaced by AIs.)
Thanks for so many comments! I do plan to read them carefully and respond, but it might take me a while. In the meantime, Scott Aaronson also has a relevant blog post: https://scottaaronson.blog/?p=6821
Happy Thanksgiving to all who celebrate it!
Thanks! Some quick comments (though I think at some point we’re getting too deep into threads for it to be easy to keep track...)
When saying that GAN training issues are “well understood” I meant that it is well understood that it is a problem, not that it’s well understood how to solve that problem…
One basic issue is that I don’t like to assign probabilities to such future events, and am not sure there is a meaningful way to distinguish between 75% and 90%. See my blog post on longtermism.
The general thesis is that when making long-term strategies, we will care about improving concrete metrics rather than about very complex strategies that don’t make any measurable gains in the short term. So an Amazon engineer would need to say something like “if we implement my code X then it would reduce latency by Y”, which is a fairly concrete and measurable goal and something that humans could understand even if they couldn’t understand the code X itself or how the AI came up with it. This differs from saying something like “if we implement my code X, then our competitors would respond with X′, then we could respond with X″, and so on and so forth until we dominate the market”.
When thinking of AI systems and their incentives, we should separate training, fine-tuning, and deployment. Human engineers might get bonuses for their performance on the job, which corresponds to mixing “fine-tuning” and “deployment”. I am not at all sure that would be a good idea for AI systems. It could lead to all kinds of over-optimization issues that would be clear to people, even without leading to doom. So we might want to separate the two, and in some sense keep the AI disinterested in the code that it actually uses in deployment.
Quick comment (not sure it’s related to any broader points): the total compute for N models with 2M parameters each is roughly 4NM^2 (since, per Chinchilla, the optimal number of training tokens scales linearly with model size, and the number of floating-point operations per token also scales linearly with model size; see also my calculations here). So an equal total compute cost would correspond to k=4.
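(For concreteness, here is the back-of-envelope version of that calculation, using the usual rough Chinchilla-style approximations: training compute ≈ 6 × parameters × tokens, with the compute-optimal token count scaling linearly in the parameter count. The constants are rough; only the scaling matters for the k=4 conclusion.)

```python
# Back-of-envelope version of the calculation above, with rough Chinchilla-style constants:
# training compute ~ 6 * params * tokens, and compute-optimal tokens ~ 20 * params,
# so total training compute scales like params**2.
TOKENS_PER_PARAM = 20       # rough Chinchilla-optimal ratio

def train_compute(params):
    tokens = TOKENS_PER_PARAM * params
    return 6 * params * tokens

M = 1e10                    # arbitrary baseline parameter count
ratio = train_compute(2 * M) / train_compute(M)
print(f"one model with 2M parameters costs {ratio:.0f}x one model with M parameters,")
print(f"so equal total compute buys k = {ratio:.0f} models of size M")
```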
What I was thinking when I said “power” is that in most BIG-Bench scaling plots, if you put some measure of performance (e.g., accuracy) on the y-axis, then it seems to scale linearly or polynomially in the log of the number of parameters, and indeed I believe the graphs in that paper usually have log parameters on the x-axis. It does seem that when we start to saturate performance (error tends to zero), the power laws kick in, and it’s more like an inverse polynomial in the total number of parameters than in their log.
Thanks! I guess one way to motivate our argument is that if the information-processing capabilities of humans were below the point of diminishing returns, then we would expect individual humans with much greater than average information-processing capabilities to have a distinct advantage in jobs such as CEOs and leaders. This doesn’t seem to be the case.
I guess that if the AI is deceptive and power-seeking but is not better at long-term planning than humans, then it basically becomes one more deceptive and power-seeking actor in a world that already has them, rather than completely dominating all other human agents.
I’ve written about the Meta AI paper on Twitter. Actually, its long-term component is a game engine, which is not longer-term than AlphaZero. The main innovation is combining such an engine with a language model.