Thank you for writing this!
One argument for the “playbook” rather than the “plan” view is that there is a big difference between DISASTER (something very bad happening) and DOOM (an irrecoverable, extinction-level catastrophe). Consider the case of nuclear weapons. Arguably, the disaster of the Hiroshima and Nagasaki bombs led us to better arms control, which has so far helped prevent the catastrophe (even if not quite an existential one) of an all-out nuclear war. In all but extremely fast take-off scenarios, we should see disasters as warning signs before doom.
The good thing is that avoiding disasters is good business. In fact, I don’t expect AI labs to require any “altruism” to focus their attention on alignment and safety. This survey by Timothy Lee on self-driving cars notes that after a single tragic incident in which an Uber self-driving car killed a pedestrian, “Uber’s self-driving division never really recovered from the crash, and Uber sold it off in 2020. The rest of the industry vowed not to repeat Uber’s mistake.” Given that a single disaster can be extremely hard to recover from, smart leaders of AI labs should focus on safety, even if it means being a little slower to market.
While the initial push is to get AI to match human capabilities, as these tools become more than impressive demos and need to be deployed in the field, customers will care much more about reliability and safety than about raw capabilities. If I am a software company using an AI system as a programmer, it’s more useful to me if it can reliably deliver bug-free 100-line subroutines than if it writes 10K-line programs that might contain subtle bugs. There is a reason why much of the programming infrastructure for real-world projects, including pull requests, code reviews, and unit tests, is aimed not at getting something that kind of works out the door as quickly as possible, but at making sure that the codebase grows in a reliable and maintainable fashion.
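To make that last point concrete, here is a minimal sketch of the kind of guardrail I have in mind: a unit test that encodes the contract of a hypothetical AI-written subroutine, so that a subtle regression is caught in review rather than in production. The function and test below are made up for illustration and not taken from any particular codebase.

```python
# Minimal sketch: a unit test gating a hypothetical AI-written subroutine.
# The value is not in the function itself but in the test, which pins down
# the behavior the rest of the codebase is allowed to rely on.

def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals (imagine this was AI-generated)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
        else:
            merged.append([start, end])
    return merged


def test_merge_intervals():
    # A handful of cases that make the intended contract explicit.
    assert merge_intervals([]) == []
    assert merge_intervals([[1, 3], [2, 6], [8, 10]]) == [[1, 6], [8, 10]]
    assert merge_intervals([[1, 4], [4, 5]]) == [[1, 5]]  # touching intervals merge
```

A 100-line subroutine can be covered by tests like this and reviewed in a single pull request; a 10K-line program dropped in all at once cannot, which is exactly why I expect reliability, not raw capability, to be the binding constraint on deployment.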
This doesn’t mean that the free market will take care of everything, or that regulation isn’t needed to ensure that some companies don’t make a quick profit by deploying unsafe products and pushing the externalities onto their users and the broader environment. (Indeed, some would say that this is what happened in the self-driving domain...) But I do think there is a big commercial incentive for AI labs to invest in research on how to ensure that the systems they push out behave in a predictable manner, and don’t start maximizing paperclips.
p.s. The nuclear setting also offers another lesson (TW: grim calculations follow). It is much more than a factor of two harder to extinguish 100% of the population than to kill the roughly 50% that live in large metropolitan areas. Generally, the ratio between the effort needed to kill a 1-p fraction of the population and the effort needed to kill 50% should scale at least as 1/p.
There is a general phenomenon in tech, noted many times, of people over-estimating the short-term consequences of a technology and under-estimating the longer-term ones (e.g., “Amara’s law”).
I think it is often possible to see that current technology is on track to achieve X, where X is widely perceived as the main obstacle to the real-world application Y. But once you solve X, you discover a myriad of other “smaller” problems Z_1, Z_2, Z_3, ... that you need to resolve before you can actually deploy it for Y.
And of course, there is always a huge gap between demonstrating you have solved X on some clean academic benchmark and needing to do so “in the wild”. This is particularly an issue in self-driving, where errors can literally be deadly, but it arises in many other applications as well.
I do think that one lesson we can draw from self-driving is that there is a huge gap between full autonomy and “assistance” with human supervision. So I would expect to see AI deployed as (increasingly sophisticated) “assistants” well before AI systems are actually able to function as “drop-in” replacements for current human jobs. This is part of the point I was making here.