I think the biggest problem currently is: how do we get a group of people (e.g. a leading lab) to build powerful AGI in a safely contained simulation and study it without releasing it? I think this scenario gives us ‘multiple tries’, and I think we need that to have a decent chance of succeeding at alignment.
If we do get there, we can afford to be wrong about a lot of our initial ideas, and then iterate. That’s inherently a much more favorable scenario.