Assume that “deploy powerful AI with no takeover” is exactly as hard as “build a rocket that flies correctly the first time even though it has 2x more thrust than anything anyone has tested before.”
I think you are way underestimating the difficulty. A more reasonable guess is that expected odds of the first Starship launch failure go down logarithmically with budget and time. Even if you grant a linear relationship, reducing the odds of failure from 10% to 1% means 10x the budget and time. If you want to never fail, you need an infinite budget and infinite time. And if a failure results in an extinction event, then you are SOL.
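For concreteness, here is a back-of-envelope sketch of the linear version of this claim. The functional form p(budget) = c / budget is an illustrative assumption, not anything established in the thread; it is just the model under which each 10x cut in failure odds costs 10x the resources:

```python
# Illustrative assumption: first-launch failure odds scale as p = c / budget,
# so every 10x reduction in failure odds costs 10x the budget.

def budget_multiplier(p_start: float, p_target: float) -> float:
    """Budget multiple needed to push failure odds from p_start down to
    p_target, under the assumed p ~ 1/budget relationship."""
    return p_start / p_target

print(budget_multiplier(0.10, 0.01))   # 10.0  -- 10% -> 1% costs 10x
print(budget_multiplier(0.10, 0.001))  # 100.0 -- each extra nine, another 10x
print(budget_multiplier(0.10, 1e-9))   # 1e8   -- and as p -> 0 this diverges:
                                       #          "never fail" needs infinite budget
```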
A more reasonable guess is that expected odds of the first Starship launch failure go down logarithmically with budget and time.
That’s like saying it takes 10 people to get to 90% reliability, 100 people to get to 99% reliability, and a hundred million people to get to 99.99% reliability. I don’t think that’s a reasonable model, though I’m certainly interested in examples of problems that have worked out that way.
Linear is a more reasonable best guess. I have quibbles, but I don’t think they’re super relevant to this discussion. I expect the Starship first-launch failure probability was >>90%, and we’re talking about the difficulty of getting out of that regime.
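To make the disagreement concrete, here is a minimal sketch comparing the headcounts the two models imply. Reading “logarithmic” as “each additional nine of reliability squares the team size” is my assumption, chosen because it reproduces the 10 / 100 / hundred-million caricature above; the linear rule is 10x people per nine:

```python
# Two toy scaling rules for "people needed for n nines of reliability".
# Both rules are assumptions for illustration, not measured relationships.

def people_linear(nines: int) -> int:
    # linear (inverse) model: each extra nine costs 10x the people
    return 10 * 10 ** (nines - 1)

def people_log(nines: int) -> int:
    # "logarithmic" model: each extra nine squares the team size
    people = 10
    for _ in range(nines - 1):
        people *= people
    return people

for n in range(1, 5):
    reliability = 1 - 10.0 ** -n
    print(f"{reliability:.2%} reliable: linear {people_linear(n):,} vs log {people_log(n):,}")
# 90.00% reliable: linear 10 vs log 10
# 99.00% reliable: linear 100 vs log 100
# 99.90% reliable: linear 1,000 vs log 10,000
# 99.99% reliable: linear 10,000 vs log 100,000,000
```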
That’s like saying it takes 10 people to get to 90% reliability, 100 people to get to 99% reliability, and a hundred million people to get to 99.99% reliability. I don’t think that’s a reasonable model, though I’m certainly interested in examples of problems that have worked out that way.
Conditional on it being a novel and complicated design. I routinely churn out six-sigma code when I know what I am doing, and so do most engineers. But almost never on the first try! The feedback loop is vital, even if it is slow and inefficient. For anything new, you are fighting not so much the design as human fallibility. Eliezer’s point is that if you have only one try to succeed, you are hooped. I do not subscribe to the first part; I think we have plenty of opportunities to iterate as LLM capabilities ramp up. But conditional on “perfect first try or extinction”, our odds of survival are negligible. There might be alignment by default, or some other way out, but conditional on that one assumption, we have no chance in hell.
It seems to me that you disagree with that point somehow: that by pouring more resources upfront into something novel, we would have good odds of succeeding on the first try, open loop. That is not a tenable assumption, so I assume I misunderstood something.
I agree you need feedback from the world; you need to do experiments. If you wanted a 50% chance of launching a rocket successfully on the first try (at any reasonable cost), you would need to do experiments.
The equivocation between “no opportunity to experiment” and “can’t retry if you fail” is doing all the work in this argument.