The first three aren’t addressable by any technical research or solution. Corporate leaders might be greedy, hubristic, and/or reckless. Or human organizations might not be nimble enough to effect development and deployment of the maximum safety we are technically capable of. No safety research portfolio addresses those risks. The other four are potential failures by us as a technical community that apply broadly. If too high a percentage of the people in our space are bad statisticians, can’t think distributionally, are lazy or prideful, or don’t understand causal reasoning well enough, that will doom all potential directions of AI safety research, not just AI control.
Technical research can have a huge impact on these things! When a domain is well-understood in general (think e.g. electrical engineering), it becomes far easier and cheaper for human organizations to successfully coordinate around the technical knowledge, for corporate leaders to use it, for bureaucracies to build regulations based on its models, for mid researchers to work in the domain without deceiving or confusing themselves, etc. But that all requires correct and deep technical understanding first.
Now, you are correct that a number of other AI safety subfields suffer from the same problems to some extent. But that’s a different discussion for a different time.
I don’t think your first paragraph applies to the first three bullets you listed.
Leaders don’t even bother to ask researchers to leverage the company’s current frontier model to help in what is hopefully the company-wide effort to reduce risk from the ASI model that’s coming? That’s a leadership problem, not a problem of insufficient technical understanding. I suppose if you imagine that a company could reach a fine-grained mechanical understanding of everything its early-AGI model does, then leaders would be more likely to ask, because they’d expect the effort to be easier and faster? But we all know we’re almost certainly not going to have that understanding. Not asking would still just be a leadership problem.
Leaders ask alignment team to safety-wash? Also a leadership problem.
The org can’t implement good alignment solutions their researchers devise? Again, given that we all already know we’re almost certainly not going to have comprehensive mechanical understanding of the early-AGI models, I don’t understand how shifts in the investment portfolio of technical AI safety research affect this. It still just seems like a leadership problem, unrelated to the percentages next to each sub-field in the research investment portfolio.
Which leads me to your last paragraph. Why write a whole post against AI control in this context? Is your claim that there are sub-fields of technical AI safety research that are significantly less threatened by your 7 bullets and that offer a plausible path to minimizing catastrophic AI risk? That we shouldn’t bother with technical AI safety research at all? Something else?
“Corporate leaders might be greedy, hubristic, and/or reckless”
They are or will be, with a probability greater than 99%. These characteristics are generally rewarded in the upper levels of the corporate world—even a reckless bad bet is rarely career-ending above a certain level of personal wealth.
The minimum buy-in cost to have a reasonable chance of creating some form of AGI is at least $10-100 billion in today’s money. This is a pretty stupendous amount of money; only someone who is exceedingly greedy, hubristic, and/or reckless is likely to amass control of such a large amount of resources and be able to direct it towards any particular goal.
If you look at the world today, there are perhaps 200 people with control over that level of resources, and perhaps only ten for whom doing so would not be somewhat reckless. You can no doubt name them, and their greed and/or hubristic nature is well publicised.
This is not the actual research team, of course, but it is the environment in which they are working.