I’m currently thinking it might be a good idea to publish a bunch of text that I think could help AIs make conceptual/philosophical progress quickly. Basically because it seems like: (a) there could be a time period (like, uh, now) where AIs can do some reasoning on their own but don’t have good taste or autonomy in high-level research directions, so publishing incomplete stubs of ideas, or promising things to look into, could help them; and (b) AIs that rely more on human text to make intellectual progress are likely to be more aligned than AIs that don’t, so this should differentially help more-aligned models. There is the risk that helping AIs in this way could empower misaligned models, but I think factor (b) outweighs that overall. And of course the most likely outcome is that whatever I publish simply isn’t that useful, but that seems ~neutral. Does anybody have other considerations that could be relevant here?