A lot depends on how broadly you construe the field. There’s plenty of work in academia and at large labs on how to resist jailbreaks, improve RL on human feedback, etc. This is at least adjacent to AI safety work in your first category.
If you put a gun to my head and told me to make some guesses: there are maybe 600 people doing that sort of work, about 80 people who are more aware of the alignment problems that get harder as AI gets smarter and so are doing work more centrally in your first category, about 40 people doing work that looks more like your second category (maybe with another 40 doing off-brand versions of it in academia), and about 400 people doing AI safety work that doesn’t fit neatly into either group.
When this question was posted, I asked myself what a “cynical” answer would look like. By that I mean: given what I see and know, what would be a realistically awful state of affairs? So, not catastrophizing, but also having low expectations.
What my intuition came up with was, less than 10% working on user-centered alignment, and less than 1% on user-independent alignment. But I didn’t have the data to check those estimates against (and I also knew there would be issues of definition).
So let me try to understand your guesses. In my terminology, you seem to be saying:
1000 (600+400) doing AI safety work
600 doing work that relates to alignment
80 doing work on scalable user-centered alignment
80 (40+40) doing work on user-independent alignment
Sure, that’s one interpretation. If people are working on dual-use technology that’s mostly being used for profit but might sometimes contribute to alignment, I tend to not count them as “doing AI safety work,” but it’s really semantics.