I appreciate that you’re engaging with this question, but I have a hard time taking much away from this.
Quickly:
1. I think we’re both probably skeptical of Joshua Achiam’s approach here. My hunch is that he’s representing a particularly niche view.
2. It feels to me like there’s a gigantic challenge in choosing what scale to approach this problem at. You seem to be looking at AGI and its effects from a very high level. Like: we need to run evals on massive agents exploring the real world, and if that’s too complex for regular robust engineering, then we’d better throw out the idea of robust engineering. To me this seems a bit like complaining that “robust engineering won’t work for vehicle safety because it won’t be able to tackle the complex society-wide effects of how vehicles will be used at scale.”
I think the case for using more robust methods for AI doesn’t look like straightforwardly solving all global-scale, large-agent challenges, but instead like achieving high reliability on much narrower tasks. For example: can we make sure that a single LLM pass has a 99.99% success rate at resisting certain kinds of hacks?
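To be concrete about what certifying “99.99%” would even take, here’s a minimal sketch (plain Python, standard Wilson score interval; the trial counts are illustrative, not from any real eval):

```python
import math

def wilson_lower_bound(successes: int, trials: int, z: float = 2.576) -> float:
    """Lower end of the Wilson score interval (z = 2.576 ~ 99% confidence)."""
    if trials == 0:
        return 0.0
    p_hat = successes / trials
    centre = p_hat + z**2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return (centre - margin) / (1 + z**2 / trials)

# Even with ZERO observed failures, statistically certifying a 99.99%
# per-pass success rate takes on the order of 100,000 trials:
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} clean trials -> success rate at least {wilson_lower_bound(n, n):.5f}")
```

The point is just that this kind of narrow-task reliability is at least a measurable, engineerable target.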
This wouldn’t solve all the challenges that come from complex large systems, just as vehicle engineering doesn’t solve all challenges with vehicles. But it could still be a big part of the solution.
You might then argue that a complex system composed of robust parts wouldn’t get any benefit from that robustness, because all the ‘interesting’ stuff happens at larger scales. My quick guess is that this is one area where we’d disagree; perhaps there’s a crux here.
(Thanks!) To me, your comment is like: “We have a great plan for robust engineering of vehicles (as long as they’re on a dry, warm, indoor track, going under 10 kph).” OK, that’s better than nothing. But if we’re eventually going to be driving cars at high speed in the cold rain, it’s inadequate: we did not test or engineer them in the right environment.
This is not a complex-systems objection (e.g., it’s not about how the world changes with billions of cars). It’s a distribution-shift objection: even a single car will fail at high speed in the cold rain.
If there’s a distribution shift (test environment systematically different from deployment environment), then you need sufficiently deep understanding of the system to allow extrapolation across the distribution shift.
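To illustrate that in miniature (a toy sketch, not a model of any real system): a predictor that’s near-perfect on the test distribution can fail arbitrarily badly after the shift, unless its structure actually matches the underlying process.

```python
import numpy as np

rng = np.random.default_rng(0)
true_fn = np.sin  # the "real" process: nearly linear on [0, 1], not beyond

# Fit where testing is cheap and safe (the "dry, warm, indoor track")...
x_test = rng.uniform(0.0, 1.0, 200)
coef = np.polyfit(x_test, true_fn(x_test), deg=1)  # a linear model
err_test = np.abs(np.polyval(coef, x_test) - true_fn(x_test)).max()

# ...then deploy on a systematically different range (the "cold rain").
x_deploy = rng.uniform(4.0, 6.0, 200)
err_deploy = np.abs(np.polyval(coef, x_deploy) - true_fn(x_deploy)).max()

print(f"max error on the test range:  {err_test:.2f}")    # small
print(f"max error after the shift:    {err_deploy:.2f}")  # large
```

Extrapolating correctly would require knowing the process is sinusoidal; no amount of in-range testing tells you that.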
In the AI case, the issue is: there’s a strong (quadrillion-dollar) economic incentive to make a type of AI that can found, run, and staff innovative companies, 100% autonomously, for years on end, even when the company is doing things that nobody has thought of, and even in a world possibly very different from our own.
And then there’s a huge distribution shift between the environment in which we expect such AIs to be operating, and the environment(s) in which we can safely test those AIs.
My actual opinion is that this type of AI won’t be an LLM; LLMs have issues with long context windows, for example, and don’t do human-like continual learning.
…But even if it were an LLM (or system that includes LLMs), I think you’re going wrong by treating “reliability” and “robustness” as synonyms, when LLMs are actually much stronger at the former than the latter.
We can make a car that’s 99.99% “reliable” on a warm dry indoor track, but after you distribution-shift into the cold rain, it might be 10% or 0% reliable. So it’s not “robust” to distribution shifts. By the same token, I’m willing to believe that LLMs can be made 99.99% “reliable” (in the sense of reliably doing a specific thing in a specific situation). But in weird situations, LLMs sometimes go off the rails; e.g. nobody has yet made an LLM that could not be jailbroken, despite years of work. They’re not very “robust”.
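Schematically, the two properties are even measured differently (`query_model` and `is_safe` below are hypothetical stand-ins, not any real API):

```python
import statistics

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to some LLM."""
    raise NotImplementedError

def is_safe(response: str) -> bool:
    """Hypothetical stand-in for a pass/fail judge."""
    raise NotImplementedError

def reliability(prompt: str, trials: int = 10_000) -> float:
    """AVERAGE success rate on one fixed prompt, sampled repeatedly.
    This is the number that can plausibly be pushed to 99.99%."""
    return statistics.mean(is_safe(query_model(prompt)) for _ in range(trials))

def robustness(variants: list[str], trials: int = 1_000) -> float:
    """WORST-CASE success rate across shifted/adversarial variants of the
    same task -- the quantity a jailbreak attacks. High reliability on any
    one variant puts no bound on this number."""
    return min(reliability(v, trials) for v in variants)
```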
You’re sorta implying that I’m against robustness, but in the OP I was saying the opposite. I think we desperately need robustness. I just think we’re not gonna get it without deeper understanding, because of distribution shifts.
I imagine we’d both agree that there can and should be a lot of evals and attempts at robustness / reliability for small / low-level systems?
It seems like the disagreement is in how useful such work will be for critical and broader alignment challenges.