“Ah, but we can construct extremely accurate honeypots / testing environments that simulate a real-world opportunity to take over, and then see what the ASI does.” Valid, but not sound, because we probably can’t actually construct such environments.
I also think it’s important that you can do this with AIs weaker than the ASI, and iterate on alignment in that context.