I think this is insightful and valid. It’s closely related to how I think about my research agenda:
1. Figure out how labs are most likely to attempt alignment.
2. Figure out how that's most likely to go wrong.
3. Communicate about that clearly enough that it reaches them and prevents them from making those mistakes.
There's a lot that goes into each of those steps. It still seems like the best use of independent researcher time.
Of course there are a lot of caveats and nitpicks, as other comments have highlighted. But it seems like a really useful framing.
It's also closely related to a post I'm working on, "the alignment meta-problem," which argues that research at the meta or planning level is the most valuable work right now, since we have very poor agreement on which object-level research matters most. That meta-research would include making problems more legible.