I have a different intuition here; I would much prefer the alignment team at e.g. DeepMind to be working at DeepMind rather than at some “alignment-only” outfit. My guess is that an alignment team can have a non-negligible influence on a capabilities org, in the form of:
- The alignment team interacting with other staff, either casually in the office or by e.g. running internal workshops open to all staff (as DeepMind apparently does)
- The org consulting the alignment team (e.g. before releasing models or starting dangerous projects)
- Staff working on raw capabilities having somewhere easy to go if they want to shift to alignment work
I think these benefits likely outweigh the influence in the other direction (such as the value drift from having economic or social incentives tied to capabilities work).
My sense is that this “they’ll encourage higher-ups to think what they’re doing is safe” thing is a meme. Misaligned AI, for people like Yann LeCun, is not even a consideration; they think it’s stupid, uninformed fearmongering. We’re not even near the point Philip Morris is at, where tobacco execs have to plaster their webpage with “beyond tobacco” slogans to feel good about themselves. Demis Hassabis literally does not care, even a little bit, and adding alignment staff will not affect his decision-making whatsoever.
But shouldn’t we just ask Rohin Shah?
Even a little bit? Are you sure? https://www.lesswrong.com/posts/ido3qfidfDJbigTEQ/have-you-tried-hiring-people?commentId=wpcLnotG4cG9uynjC