making sure there are really high standards for safety and that there isn’t going to be danger [from] what these AIs are doing
Ah yes, a great description of Anthropic’s safety actions. I don’t think anyone serious at Anthropic believes that they “made sure there isn’t going to be danger [from] what these AIs are doing”. Indeed, many (most?) of their safety people assign double-digit probabilities to catastrophic outcomes from advanced AI systems.
I do think this was a predictable and quite bad consequence of Dario’s essay (as well as his other essays, which heavily downplay or completely omit any discussion of risks). My guess is it will contribute majorly to reckless racing while giving people a false impression of how good we are doing on actually making things safe.
I think the fuller context,

Anthropic has put WAY more effort into safety, way way more effort into making sure there are really high standards for safety and that there isn’t going to be danger [from] what these AIs are doing
implies it’s just that the amount of effort is larger than at other companies (which I agree with), and not that the YouTuber believes they’ve solved alignment or are doing enough; see:
but he’s also a realist and is like “AI is going to really potentially fuck up our world”
and
But he’s very realistic. There is a lot of bad shit that is going to happen with AI. I’m not denying that at all.
So I’m not confident that it’s “giving people a false impression of how good we are doing on actually making things safe” in this case.
I do know DougDoug has recommended Anthropic’s Alignment Faking paper to another YouTuber, and that is more of a “stating a problem” paper than one claiming they’ve solved it.