I’m going to ask my implicit question again, explicitly this time:
At four points spread through your post you write things functionally equivalent to:
“we will run out of fuel before resolving the main problems”
I have read your post twice, carefully, and I still have no idea what “fuel” is intended to be a metaphor for here, or why you assert that “we’re going to run out of it”. I genuinely don’t know what argument you’re making. I’d like to understand your argument, so that I can consider and engage with it.
I do have a number of hypotheses (which may all be wrong — if so, please tell me):
1. You’re gesturing at (without mentioning) the old problem of “fully-updated deference”: that if we assume that our AI is an excellent Bayesian, and we have no other way to make it corrigible apart from its Bayesian uncertainty about the human values we’re aligning it to, then we can’t keep handing it contradictory corrections indefinitely past the point where this degree of changing our minds about what we want becomes implausible to a good Bayesian, and it decides to stop paying very much attention to us because we’re clearly a random number generator rather than a source of information about our own values — that the AI (by assumption, an excellent Bayesian) will run out of Bayesian uncertainty before the problem is solved (which by definition can only happen if we waste an implausible amount of it). (There’s a toy numerical sketch of this dynamic right after the list.)
2. You’re talking about the fact (true in absolutely every field of science and engineering, especially safety engineering fields) that extrapolating out of your current distribution is hard work and requires you to proceed slowly, carefully and painstakingly, expanding the distribution a little bit at a time (such as by, at some point once you are sufficiently confident from lake-sailing the ship, taking it out to sea for half an hour on an extremely calm and waveless day, and then returning to shore to analyze all your sensor readings for a long time) — that we’ll run out of tests we can do reasonably safely, and their outcomes will not teach us enough to enable us to come up with any more tests that we are now able to do reasonably safely.
3. a. You’re concerned that if we do enough tests that we believe to be reasonably safe, one of them that isn’t done on the lake but actually at sea will fail so badly that we all die — that we’ll run out of luck.
Or perhaps:
b. You’re thus absolutely unwilling to do any such tests, even during the ship’s official maiden voyage and next 10 years at sea — which seems like a wasted opportunity, if the ship’s officers are taking it to sea anyway with us on board.
4. You feel that maritime safety engineering, and presumably by analogy AI safety engineering, sounds dull and detail-oriented and painstaking, and you think people will get bored and give up rather than solving all the problems involved carefully and iteratively (even when existential stakes are involved) — that we’ll run out of enthusiasm for the work.
5. Something else that I haven’t thought of.
6. Some or all of the above.
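(On hypothesis 1, here is the kind of toy sketch I have in mind. It’s entirely my own illustration, not anything from your post, and every number in it is invented: an idealized agent weighs “the human’s corrections carry real information about their values” against “the corrections are effectively noise”, and updates on repeated contradictory corrections.)

```python
# Toy model (my own, not from the post): an idealized Bayesian agent weighing
#   H  = "the human's corrections carry real information about their values"
# against
#   ~H = "the corrections are effectively noise (a random number generator)".
# All probabilities below are invented purely for illustration.

p_informative = 0.95                   # agent's prior trust in the human
p_reversal_given_informative = 0.05    # an informative human rarely reverses a correction
p_reversal_given_noise = 0.5           # a noise source reverses about half the time

for n in range(1, 11):
    # Bayes update on observing one more contradictory (reversed) correction
    numerator = p_reversal_given_informative * p_informative
    denominator = numerator + p_reversal_given_noise * (1 - p_informative)
    p_informative = numerator / denominator
    print(f"after {n:2d} contradictory corrections: P(informative) = {p_informative:.4f}")
```

Under these made-up numbers each reversal multiplies the odds on “informative” by 0.1, so within a handful of contradictory corrections the agent has essentially stopped treating us as a source of information about our own values: it has “run out” of the uncertainty that was doing the corrigibility work.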
(You might also wish to consider re-editing your post, or even writing a new one, in case I’m not the only person confused about what your argument is.)
2 is close enough. Extrapolating the results of safe tests to unsafe settings requires a level of theoretical competence we don’t currently have. Steve Byrnes just made a great post that is somewhat related; I endorse everything in that post.
Thanks. I’ll reread your post in light of that. I guess I was always assuming that doing that was going to become necessary, as it is in every field, and that it is particularly challenging to do safely in safety engineering fields. Also that we were currently only, say, O(10%) of the way through the process. So I’m unsurprised we can’t see much of the route yet — but we can see more of it than we could, say, 5 years ago, and usually we do figure these things out eventually. What concerns me isn’t that this is impossible, but that I don’t think we’re on track to be done in another, say, 5 years, that rushing is a great way to cause 3.a., and that it’s unclear how long we have left.
I’ll also take a look at the Steve Byrnes post you linked.