But I would be curious to know which of these dynamics actually take place, how much each of them matters, and what the overall effect is.
I would be interested to know more about those too! However, I don’t have any direct experience with the insides of AI companies, and I don’t have any friends who do either, so I’m hoping that other readers of this post might have insights that they are willing to share.
For those who have worked for AI companies or have reliable info from others who have worked for an AI company, these are a few things I am especially curious about, categorised by the mechanisms mentioned in the post:
Selective hiring
What kinds of personality traits are application reviewers and interviewers looking for? To what extent are the following traits advantages or disadvantages for candidates getting hired?
bias to action?
agreeableness?
outspokenness?
conscientiousness?
Ignoring any explicit criteria that are supposed to factor into the decision to hire someone, what was the typical distribution of personality traits in people who were actually hired?
What about the distribution of traits in people who continued to work there for longer than two years?
To what extent are potential hires expected to have thought about the wider impacts of the work they will be doing?
Does an applicant’s level of concern about catastrophic risk typically come up during the hiring process? Does it factor into the decision to hire them? (Setting aside, for now, the question of whether it should.)
Firing dissenters as a coordination problem
How often have employees had consequential disagreements with their team’s or the company’s direction and voiced them?
How often have employees had such disagreements and not voiced them? Why not?
How often do employees in the company see their colleagues express consequential disagreements with the work they are doing?
How often have you seen employees express consequential disagreements and then have them addressed to their satisfaction?
Among the employees who have expressed consequential disagreements at some point, how many of them work on the same team two years later?
How many of them still work at the company at all two years later?
Compartmentalising undesirable information (and other types of information control)
How long has the average Risk Team member worked for that team?
How often do people on the teams addressing risks talk to people outside their teams about the implications of their work?
How often do people working on the teams addressing risks find themselves softening or otherwise watering down their risk assessments/recommendations when communicating with people outside their teams?
Yep, these all seem relevant, and I would be interested in answers. (And my thanks to leogao for their takes.)
I would additionally highlight[1] the complication that even if a tendency is present (say, selective hiring for a particular belief), it might not work as explicitly as people knowingly paying attention to it. It could be implicitly present in something else (“culture fit”), or purely correlational, etc. I am not sure how best to deal with this.

FWIW, I think this concern is important, but we need to be very cautious about it, since one could always go “yeah, your results say nothing like this is going on, but the real mechanism is even more indirect”. I imagine that fields like fairness & bias have to encounter this a lot, so there might be some insights there.

[1] To be clear, your comment seems aware of this issue. I just wanted to emphasise it.
it might not work as explicitly as people knowingly paying attention to it. It could be implicitly present in something else (“culture fit”)
Yes, that sounds plausible to me as well. I did not mention those because I found it much harder to think of ways to tell when those dynamics are actually in play.
If I understand this correctly:
FWIW, I think this concern is important, but we need to be very cautious about it, since one could always go “yeah, your results say nothing like this is going on, but the real mechanism is even more indirect”.
I think you are gesturing at the same issue?
I imagine that fields like fairness & bias have to encounter this a lot, so there might be some insights there.
It makes sense to me that the implicit pathways for these dynamics would be an area of interest to the fields of fairness and bias. But I would not expect those fields to have any better tools for identifying causes and mechanisms than anyone else[1]. What kinds of insights would you expect them to offer?

[1] To be clear, I have only a superficial awareness of the fields of fairness and bias.