It could actually be enough to align a weak system. This is the case where the system is “weak” in the sense that it can’t perform a pivotal act on its own, but it’s powerful enough that it can significantly accelerate development toward a stronger aligned AI/AGI with pivotal act potential.
What specific capabilities will this weak AI have that lets you cross the distributional shift?
I think this sort of thing is not impossible, but I think it needs to have a framing like “I will write a software program that will make it slightly easier for me to think, and then I will solve the problem” and not like “I will write an AI that will do some complicated thought which can’t be so complicated that it’s dangerous, and there’s a solution in that space.” By the premise, the only safe thoughts are simple ones, and so if you have a specific strategy that could lead to alignment breakthrus but just needs to run lots of simple for loops or w/e, the existence of that strategy is the exciting fact, not the meta-strategy of “humans with laptops can think better than humans with paper.”
What specific capabilities will this weak AI have that lets you cross the distributional shift?
I think this sort of thing is not impossible, but I think it needs to have a framing like “I will write a software program that will make it slightly easier for me to think, and then I will solve the problem” and not like “I will write an AI that will do some complicated thought which can’t be so complicated that it’s dangerous, and there’s a solution in that space.” By the premise, the only safe thoughts are simple ones, and so if you have a specific strategy that could lead to alignment breakthrus but just needs to run lots of simple for loops or w/e, the existence of that strategy is the exciting fact, not the meta-strategy of “humans with laptops can think better than humans with paper.”