This may be somewhat beside the point of the post, but you say the following:
If you are the sort of person who is going to do AGI capabilities research—and I recommend against it—then I’d recommend doing it at places that are more likely to be able to keep their research private...
A model that only contains the two buckets “capabilities research” and “alignment research” seems too simplistic to me. What if somebody works on developing more interpretable methods as a route to greater capability? In one sense this is pure capabilities research, but it would probably help alignment a lot by producing systems that are easier to analyze. I particularly have in mind people who would do this kind of research because of the alignment benefits, not as an excuse to do capabilities research or as a post hoc justification for it.
This seems worth pointing out, as I have met multiple people who would immediately dismiss this kind of research with “this is capabilities research (and therefore bad)”, and I think that reflexive reaction is counterproductive.