This only works if alignment is basically intractable, right? If the problem is effectively impossible for normal intelligences, then we should expect that normal intelligences do not generally want to build superintelligences. But if the problem is just out of reach for us, then a machine only slightly smarter than us might crack it. The same holds for capabilities.
Sure, and if a machine just slightly smarter than us, deployed by an AI company, solves alignment instead of doing what it has been told to do, namely capabilities research, then the argument will evidently have succeeded.
I don’t think I understand what you’re saying here. Can you rephrase in more words?