Seth Herd comments on Veedrac’s Shortform

Seth Herd 1 May 2025 3:12 UTC
I dunno, the systems we have seem pretty capable, and if they have instrumental goals, those goals seem quite weak… so tossing in that claim seems like just asking for trouble. I do think that very capable systems almost certainly need to have goals, but I have trouble making that argument even to alignment people and rationalists.

That’s just one example, but the fact that it goes awry immediately hints that the whole direction is a bad idea.

I think the argument for AI being quite-possibly dangerous is actually a lot stronger than the more abstract and technical argument usually used by rationalists. It doesn’t require any strong claims at all. People don’t need certainty to be quite alarmed, and for good reason.
- Veedrac 1 May 2025 7:27 UTC
  Standard xrisk arguments generally don't extrapolate down to systems that can't solve tasks requiring instrumental goals. I think it's reasonable to say common LLMs don't exhibit many instrumental goals, but they also can't do long-horizon, goal-directed problem solving.
  Prosaic-risk work like biorisk evals often goes further and asks: if we assume the AI systems aren't themselves very capable at this task, can we still elicit dangerous behaviors from them 'in the loop'? These are legitimate and interesting questions, but they are a different thing.