My claim here is that there is no decisive blocker for plans to get a safe, highly capable AI that is used for automated AI safety research, in the way that thermodynamics blocks you from building a perpetual motion machine (under the well-tested assumption that the universe is time symmetric, i.e. that physics stays the same no matter when an experiment happens). The proposed blockers have nowhere close to the amount of evidence thermodynamics does, so they cannot justify discarding every plan that fails to meet some prerequisite.
Cool. What’s the actual plan and why should I expect it not to create machine Carissa Sevar? I agree that the Textbook From The Future Containing All The Simple Tricks That Actually Work Robustly enables the construction of such an AI, but also at that point you don’t need it.
Noosphere, why are you responding for a second time to a false interpretation of what Eliezer was saying, directly after he clarified this isn’t what he meant?
Okay, maybe he clarified that there was no thermodynamics-like blocker to getting an in-principle plan to align AI, but I didn't read Eliezer's clarification as ruling that interpretation out immediately, so I wanted to rule it out explicitly.
I didn’t see the interpretation as false when I wrote it, because I believed he only ruled out a decisive blocker to getting behaviors you don’t know how to verify.
I think the misunderstanding came from Eliezer's reference to a perpetual motion machine. The point was that people suggesting how to build one often have complicated schemes that tend not to adequately address the central difficulty of creating one. That's where the analogy ends: from thermodynamics, we have strong reasons to believe such a machine is not just difficult but impossible, whereas we have no corresponding theory that rules out verifiably safe AI.
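To spell out why time symmetry is what does the blocking work here: by Noether's theorem, time-translation symmetry implies energy conservation, and conservation of energy is what forbids a machine that outputs net work forever. A minimal sketch in standard Lagrangian mechanics (this derivation is my addition for context, not something anyone in the thread wrote):

```latex
% Noether's theorem for time translation: if the Lagrangian $L(q,\dot q)$
% has no explicit time dependence, the energy function
\[
  H \;=\; \sum_i \dot q_i\,\frac{\partial L}{\partial \dot q_i} \;-\; L
\]
% is conserved. Differentiating and applying the Euler--Lagrange equations
% $\tfrac{d}{dt}\tfrac{\partial L}{\partial \dot q_i} = \tfrac{\partial L}{\partial q_i}$:
\[
  \frac{dH}{dt}
  \;=\; \sum_i \ddot q_i \frac{\partial L}{\partial \dot q_i}
      \;+\; \sum_i \dot q_i \frac{d}{dt}\frac{\partial L}{\partial \dot q_i}
      \;-\; \frac{dL}{dt}
  \;=\; -\,\frac{\partial L}{\partial t}
  \;=\; 0 .
\]
% A conserved $H$ means no closed cycle can output net energy: no perpetual motion.
```

Nothing with remotely this evidential standing exists for "verifiably safe AI is impossible", which is the disanalogy being pointed at.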
Habryka’s analogy to nuclear reactor plans is similar, except that we know building one of those is difficult but actually possible.