There are some reasons for thinking automation of labor is particularly compelling in the alignment case relative to the capabilities case:
It always seems to me that the free variable here is why the lab would value spending X% of its compute on alignment. For example, you could have the model that “labs will only allocate compute for alignment insofar as misalignment is hampering capabilities progress”. While this would be a nonzero amount, the failure mode in this regime is that alignment research never gets allocated some fixed compute to use for making open-ended progress; instead, progress is essentially bottlenecked on “how much misalignment is legibly impeding capabilities work”.