But I would be curious to know which of these dynamics actually take place, how much each of them matters, and what the overall effect is.
I would be interested to know more about those too! However, I don’t have any direct experience with the insides of AI companies, and I don’t have any friends who do either, so I’m hoping that other readers of this post might have insights that they are willing to share.
For those who have worked for AI companies or have reliable info from others who have worked for an AI company, these are a few things I am especially curious about, categorised by the mechanisms mentioned in the post:
Selective hiring
What kinds of personality traits are application reviewers and interviewers looking for? To what extent are the following traits advantages or disadvantages for candidates getting hired?
bias to action?
agreeableness?
outspokenness?
conscientiousness?
Ignoring any explicit criteria that are supposed to factor into the decision to hire someone, what was the typical distribution of personality traits in people who were actually hired?
What about the distribution of traits in people who continued to work there for longer than two years?
To what extent are potential hires expected to have thought about the wider impacts of the work they will be doing?
Does an applicant’s level of concern about catastrophic risk typically come up during the hiring process? Does it factor into the decision to hire them? (Setting aside, for now, the question of whether it should.)
Firing dissenters as a coordination problem
How often have employees had consequential disagreements with their team’s or the company’s direction and voiced them?
How often have employees had such disagreements and not voiced them? Why not?
How often do employees in the company see their colleagues express consequential disagreements with the work they are doing?
How often have you seen employees express consequential disagreements and then have them addressed to their satisfaction?
Among the employees who have expressed consequential disagreements at some point, how many of them work on the same team two years later?
How many of them still work at the company at all two years later?
Compartmentalising undesirable information (and other types of information control)
How long has the average Risk Team member worked for that team?
How often do people on the teams addressing risks talk to people outside their teams about the implications of their work?
How often do people working on the teams addressing risks find themselves softening or otherwise watering down their risk assessments/recommendations when communicating with people outside their teams?
Yep, these all seem relevant, and I would be interested in answers. (And my thanks to leogao for their takes.)
I would additionally highlight[1] the complication that even if a tendency is present (say, selective hiring for a particular belief), it might not work as explicitly as people knowingly paying attention to it. It could be implicitly present in something else (“culture fit”), or purely correlational, etc. I am not sure how best to deal with this.

FWIW, I think this concern is important, but we need to be very cautious about it, since one could always go “yeah, your results say nothing like this is going on, but the real mechanism is even more indirect”. I imagine that fields like fairness & bias have to encounter this a lot, so there might be some insights there.

[1] To be clear, your comment seems aware of this issue. I just wanted to emphasise it.
it might not work as explicitly as people knowingly paying attention to it. It could be implicitly present in something else (“culture fit”)
Yes, that sounds plausible to me as well. I did not mention those because I found it much harder to think of ways to tell when those dynamics are actually in play.
If I understand this correctly:
FWIW, I think this concern is important, but we need to be very cautious about it, since one could always go “yeah, your results say nothing like this is going on, but the real mechanism is even more indirect”.
I think you are gesturing at the same issue?
I imagine that fields like fairness & bias have to encounter this a lot, so there might be some insights there.
It makes sense to me that the implicit pathways for these dynamics would be an area of interest to the fields of fairness and bias. But I would not expect those fields to have any better tools for identifying causes and mechanisms than anyone else[1]. What kinds of insights would you expect them to offer?

[1] To be clear, I have only a superficial awareness of the fields of fairness and bias.