I like the main idea of the post. It's important to note, though, that the setup assumes we have a bunch of alignment ideas that each have an independent 10% chance of working. In reality I expect a lot of correlation: there is a decent chance that alignment is easy and many of our ideas work, and a decent chance that it's hard and basically nothing works.
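To make the contrast concrete, here's a minimal Monte Carlo sketch. The two-world mixture and its numbers (50/50 odds of an easy vs. hard world, with per-idea success of 19% vs. 1%) are illustrative assumptions on my part, chosen so the marginal per-idea chance stays at 10% in both models:

```python
import numpy as np

rng = np.random.default_rng(0)
n_ideas, n_trials = 10, 100_000

# Independent model: each idea works with probability 0.1 on its own.
indep = rng.random((n_trials, n_ideas)) < 0.1

# Correlated (mixture) model: with prob 0.5 alignment is "easy" and each
# idea works with prob 0.19; otherwise it's "hard" and each works with
# prob 0.01. The marginal per-idea success rate is still 10%.
easy = rng.random(n_trials) < 0.5
p = np.where(easy, 0.19, 0.01)
corr = rng.random((n_trials, n_ideas)) < p[:, None]

for name, sims in [("independent", indep), ("correlated", corr)]:
    print(f"{name}: P(at least one idea works) = {sims.any(axis=1).mean():.3f}")
```

Under independence, P(at least one of 10 works) = 1 − 0.9¹⁰ ≈ 0.65; under the mixture it drops to roughly 0.49, because in the hard world all the ideas fail together. Same per-idea odds, much less benefit from having many ideas.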
Agreed, this only matters in the regime where some but not all of your ideas will work. But even in alignment-is-easy worlds, I doubt literally everything will work, so testing would still be helpful.