Surely (maybe this is a literacy issue on my part) Evan is using “steam-engine world” to refer to worlds where we don’t have to get it right on the first try? We can’t perfectly analogize between building ASI and building the steam engine; the former is clearly a more continuous process (in the sense that, if current approaches work, the architecture for ASI will look like an architecture for not-quite-ASI but bigger, and we’ll be training lots of the smaller examples before we train the bigger one).
I’m also not sure how you’re getting to “likely” here. How do we get from “it’s possible for new catastrophic issues to appear anywhere in a continuous process, even if they haven’t shown up in the last little while” to “it’s likely that catastrophic new issues exist whenever you step forward, even if they haven’t existed for the last couple of steps, and it’s impossible to ever take a confident step”? It seems like you need something like the latter view to think it’s likely that alignment which works on not-quite-ASI will fail for ASI, and the latter view is clearly false. I can imagine there’s some argument that would convince me doom is likely even if we can align weak AIs (I haven’t thought enough about that yet), but I don’t think anything along these lines can work.