I indeed believe that regulation should focus on deployment rather than on training.
See also my post https://www.lesswrong.com/posts/gHB4fNsRY8kAMA9d7/reflections-on-making-the-atomic-bomb
The Manhattan Project was all about taking something that was known to work in theory and solving all the Z_n's.
There is a general phenomenon in tech, expressed many times, of people over-estimating the short-term consequences and under-estimating the longer-term ones (e.g., “Amara’s law”).
I think that often it is possible to see that current technology is on track to achieve X, where X is widely perceived as the main obstacle to the real-world application Y. But once you solve X, you discover a myriad of other “smaller” problems Z_1, Z_2, Z_3, … that you need to resolve before you can actually deploy it for Y.
And of course, there is always a huge gap between demonstrating you solved X on some clean academic benchmark and needing to do so “in the wild”. This is particularly an issue in self-driving, where errors can be literally deadly, but it arises in many other applications as well.
I do think that one lesson we can draw from self-driving is that there is a huge gap between full autonomy and “assistance” with human supervision. So, I would expect to see AI deployed as (increasingly sophisticated) “assistants” well before AI systems are actually able to function as “drop-in” replacements for current human jobs. This is part of the point I was making here.
Some things like that have already happened: bigger models are better at utilizing tools such as in-context learning and chain-of-thought reasoning. But again, whenever people plot any graph of such reasoning capabilities as a function of model compute or size (e.g., the BIG-Bench paper), the X axis is always logarithmic. For specific tasks, the dependence on log compute is often sigmoid-like (flat for a long time, then rising more sharply as a function of log compute), but as mentioned above, when you average over many tasks you get this type of linear dependence.
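As a minimal numerical sketch of that last point (my own toy model, not from any particular paper): if each task's success rate is a sigmoid in log compute with its own threshold, then averaging over many tasks whose thresholds are spread out gives a roughly linear trend in log compute.

```python
import numpy as np

# Hypothetical illustration: per-task performance is a sigmoid in log10(compute),
# with each task having its own "emergence" threshold and slope.
rng = np.random.default_rng(0)
log_compute = np.linspace(18, 26, 100)          # log10 FLOPs, e.g. 1e18 .. 1e26
thresholds = rng.uniform(19, 25, size=500)      # tasks "turn on" at different scales
slopes = rng.uniform(1.0, 3.0, size=500)

# success[task, compute] = sigmoid(slope * (log_compute - threshold))
success = 1 / (1 + np.exp(-slopes[:, None] * (log_compute[None, :] - thresholds[:, None])))

single_task = success[0]          # sharp, sigmoid-like in log compute
average = success.mean(axis=0)    # roughly linear in log compute over a wide range

# The single-task curve is flat, then jumps; the averaged curve climbs
# by a near-constant amount per decade of compute.
print("single task:", np.round(single_task[::12], 2))
print("average    :", np.round(average[::12], 2))
```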
Ok drew it on the back now :)
One can make all sorts of guesses, but based on the evidence so far, AIs have a different skill profile than humans. This means that if we think of any job that requires a large set of skills, then for a long period of time, even if AIs beat the human average in some of those skills, they will perform worse than humans in others.
I always thought the front was the other side, but looking at Google Images you are right… I don’t have time now to redraw this, but you’ll just have to take it on faith that I could have drawn it on the other side 😀
>On the other hand, if one starts creating LLM-based “artificial AI researchers”, one would probably create diverse teams of collaborating “artificial AI researchers” in the spirit of multi-agent LLM-based architectures... So, one would try to reproduce the whole teams of engineers and researchers, with diverse participants.
I think this can be an approach to create a diversity of styles, but not necessarily of capabilities. A bit of prompt engineering telling the model to pretend to be some expert X can help on some benchmarks, but the returns diminish very quickly. So you can have a model pretending to be this type of person or that, but they will still suck at Tic-Tac-Toe. (For example, GPT-4 doesn’t know how to recognize a winning move even when I tell it to play like Terence Tao.)
Regarding the existence of compact ML programs, I agree that it is not known. I would say, however, that the main benefit of architectures like transformers hasn’t been so much to save on the total number of FLOPs as to organize these FLOPs so they are best suited for modern GPUs, that is, to ensure that the majority of the FLOPs are spent multiplying dense matrices.
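To make the “organize the FLOPs” point concrete, here is a rough back-of-envelope count (my own illustration, with hypothetical GPT-3-scale dimensions) of where the FLOPs in a single transformer layer go; essentially all of them sit inside dense matrix multiplications, which is exactly what GPUs are built to do fast.

```python
# Back-of-envelope FLOP count for one transformer layer (hypothetical GPT-3-scale dims).
d_model, d_ff, seq_len = 12288, 4 * 12288, 2048

# Dense matmuls: Q/K/V + output projections, and the two MLP layers (2*m*n*k FLOPs each).
qkv_and_out = 4 * (2 * seq_len * d_model * d_model)
mlp = 2 * (2 * seq_len * d_model * d_ff)

# Attention itself: QK^T scores and attention-weighted values (also matmuls, smaller here).
attn = 2 * (2 * seq_len * seq_len * d_model)

# "Everything else": softmax, layer norms, residual adds -- a few FLOPs per element.
elementwise = 10 * seq_len * d_model

total = qkv_and_out + mlp + attn + elementwise
for name, f in [("QKV/out proj", qkv_and_out), ("MLP", mlp),
                ("attention", attn), ("elementwise", elementwise)]:
    print(f"{name:12s} {100 * f / total:5.1f}%")
```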
I agree that self-improvement is an assumption that probably deserves its own blog post. If you believe exponential self-improvement will kick in at some point, then you can consider this discussion as pertaining to the period before that happens.
My own sense is that:
While we might not be super close to them, there are probably fundamental limits to how much intelligence you can pack per FLOP. I don’t believe there is a small C program that is human-level intelligent. In fact, since AI and evolution seem to have arrived at roughly similar orders of magnitude, maybe we are not that far off from those limits? If there are such limits, then no matter how smart the “AI AI-researchers” are, they still won’t be able to get more intelligence per FLOP than these limits allow.
I do think that AI AI-researchers will be incomparable to human AI-researchers, in a similar manner to other professions. The simplistic view of AI research (or any form of research) as one-dimensional, where people can be sorted on an Elo-like scale, is dead wrong based on my 25 years of experience. Yes, some aspects of AI research might be easier to automate, and we will certainly use AI to automate them and make AI researchers more productive. But, as with the vast majority of human professions (with all due respect to elevator operators :) ), I don’t think human AI researchers will be obsolete any time soon.
p.s. I also noticed this “2 comments”—not sure what’s going on. Maybe my footnotes count as comments?
I agree that there is much to do to improve AI reliability, and there are a lot of good reasons (in particular, to make AI more useful to us) to do so. So I agree reliability will improve. In fact, I very much hope this happens! I believe faster progress on reliability would go a long way toward enabling positive applications of AI.
I also agree that a likely path to do so is by adjusting the effort based on estimates of reliability and the stakes involved. At the moment, systems such as ChatGPT spend the same computational effort whether someone asks them to tell a joke or asks them for medical advice. I suspect this will change, and variable inference-time computation will become more standard. (Things like “chain of thought” already spend more inference compute to get better performance, but they don’t really have a “knob” we can turn to control the computation/reliability tradeoff.)
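As a sketch of what such a knob could look like (purely my own illustration; `sample_answer` is a hypothetical stand-in for one chain-of-thought query to a model): allocate more samples, and hence more inference compute, when the stakes are higher, and aggregate by majority vote.

```python
from collections import Counter

def answer_with_budget(question, stakes, sample_answer, base_samples=1, max_samples=32):
    """Hypothetical compute/reliability knob: spend more inference-time compute
    (more chain-of-thought samples + majority vote) when the stakes are higher.

    `stakes` is in [0, 1]; `sample_answer(question)` draws one independent answer.
    """
    n = min(max_samples, max(base_samples, int(round(stakes * max_samples))))
    answers = [sample_answer(question) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    confidence = count / n
    return best, confidence, n

# Usage sketch: a joke gets one sample, medical advice gets the full budget.
# best, conf, used = answer_with_budget("Is this mole concerning?", stakes=0.95,
#                                       sample_answer=my_model)   # my_model is hypothetical
```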
Regarding the deception issue, it might still be the case that such extra effort is observable, and it could also come at the cost of solving the original task. (So your performance per unit of compute is worse if you are not merely trying to solve the task but also to deceive in a highly reliable way.)
In particular, even if we apply inference-time compute, unfortunately I don’t think we know of a path to get a polylog(1/δ) overhead in inference time to achieve a failure probability of δ. It seems that we are still stuck in the poly(1/δ) regime. So if you wanted to get a 99.9% probability of not getting caught, then you would incur a very noticeable effort.
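To put rough numbers on that gap (my own arithmetic, under the scalings above): for a failure probability of δ = 10⁻³,

$$\frac{1}{\delta} = \frac{1}{10^{-3}} = 1000 \qquad \text{vs.} \qquad \ln\frac{1}{\delta} = \ln 1000 \approx 6.9,$$

so brute-force repetition to reach 99.9% reliability costs roughly three orders of magnitude more inference compute, rather than a small constant factor.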
Not all capabilities/tasks correspond to trying to maximize a subjective human response. If you are talking about finding software vulnerabilities or designing some system, there may well be objective measures of success. In such a case, you can fine-tune a system to maximize these measures and so extract capabilities without the issue of deception/manipulation.
Regarding “escapes”, the traditional fear was that because AI is essentially code, it can spread and escape more easily. But I think that in some sense modern AI has a physical footprint that is more significant than a human’s. Think of trying to get superhuman scientific capabilities by doing something like simulating a collection of 1,000 scientists using a 100T-or-so-parameter model. Even if you already have the pre-trained weights, just running the model requires highly non-trivial computing infrastructure. (Which may be possible to track and detect.) So, it might be easier for a human to escape a prison and live undetected than for a superhuman AI to “escape”.
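A back-of-envelope version of the footprint argument (my own rough numbers, for illustration only): merely holding the weights of a 100T-parameter model in accelerator memory already requires a small data center’s worth of hardware.

```python
# Rough footprint estimate for serving a hypothetical 100T-parameter model.
params = 100e12                 # 100 trillion parameters
bytes_per_param = 2             # fp16/bf16 weights, ignoring KV cache and activations
gpu_memory_bytes = 80e9         # one 80 GB accelerator

weight_bytes = params * bytes_per_param               # 200 TB of weights
gpus_just_for_weights = weight_bytes / gpu_memory_bytes

print(f"weights: {weight_bytes / 1e12:.0f} TB")
print(f"accelerators needed just to hold them: {gpus_just_for_weights:.0f}")
# ~2500 GPUs before you even run a forward pass -- hard to hide, easier to track.
```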
We can of course define “intelligence” in a way that presumes agency and coherence. But I don’t want to quibble about definitions.
Generally, when you have uncertainty, this corresponds to a potential “distribution shift” between your beliefs/knowledge and reality. When you have such a shift, you want to regularize, which means not optimizing to the maximum.
This is not about the definition of intelligence. It’s more about usefulness. Like a gun without a safety, an optimizer without constraints or regularization is not very useful.
Maybe it will be possible to build it, just as today it’s possible to hook up our nukes to an automatic launching device. But it’s not inevitable that people will do something so stupid.
The notion of a piece of code that maximizes a utility without any constraints doesn’t strike me as very “intelligent”.
If people really wanted to, they might be able to build such programs, but my guess is that they would not be very useful even before they become dangerous, as overfitting optimizers usually are.
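To illustrate the earlier “regularize under uncertainty” point with a toy example (entirely my own, using ordinary ridge regression rather than anything AI-specific): fit the same noisy data with and without regularization, then evaluate under a shifted distribution. The unregularized fit typically looks better on the training data and worse once the distribution moves.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 35, 30                                   # few samples, many features: lots of uncertainty
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=2.0, size=n)

def fit(lam):
    # Ridge regression: lam = 0 is the "optimize to the maximum" solution.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Test distribution shifted away from the training one.
X_shift = rng.normal(loc=0.5, size=(2000, d))
y_shift = X_shift @ w_true + rng.normal(scale=2.0, size=2000)

for lam in [0.0, 10.0]:
    w = fit(lam)
    train_err = np.mean((X @ w - y) ** 2)
    shift_err = np.mean((X_shift @ w - y_shift) ** 2)
    print(f"lambda={lam:4.1f}  train MSE={train_err:7.2f}  shifted MSE={shift_err:7.2f}")
```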
>at least some humans (e.g., most transhumanists) are “fanatical maximizers”: we want to fill the lightcone with flourishing sentience, without wasting a single solar system to burn in waste.
I agree that humans have a variety of objectives, which I think is actually more evidence for the hot mess theory?
>the goals of an AI don’t have to be simple to not be best fulfilled by keeping humans around.
The point is not about having simple goals, but rather about optimizing goals to the extreme.
I think there is another point of disagreement. As I’ve written before, I believe the future is inherently chaotic. So even a super-intelligent entity would still be limited in predicting it. (Indeed, you seem to concede this, by acknowledging that even super-intelligent entities don’t have exponential time computation and hence need to use “sophisticated heuristics” to do tree search.)
What it means is that there is inherent uncertainty in the world, and whenever there is uncertainty, you want to “regularize” and not go all out in exhausting a resource that you might need later on.
Just to be clear, I think a “hot mess super-intelligent AI” could still pose an existential risk to humans. But that would probably be the case if humans were an actual threat to it, and there was more of a conflict. (E.g., I don’t see it as a good use of energy for us to hunt down every ant and kill it, even if they are nutritious.)
I was thinking of this as a histogram: the probability that the model solves the task at that level of quality.