I think Robert Miles does excellent introductory videos for newer people, and I linked him in the HN post. My goal here was different, though, which was to give a short, affirmative argument made of only directly defensible high probability claims.
I like your spin on it, too, more than those given in the linked thread, but it’s still looser, and I think there’s value in giving an argument where it’s harder to disagree with the conclusion without first disagreeing with a premise. E.g., ‘some optimists assume we just won’t make AI with goals’ directly contradicts ‘capable systems necessarily have instrumental goals’, but I’m not sure it directly contradicts a premise you used.
I dunno, the systems we have seem pretty capable, and if they have instrumental goals those goals seem quite weak… so tossing in that claim seems like just asking for trouble. I do think that very capable systems almost need to have goals, but I have trouble making that argument even to alignment people and rationalists.
That’s just one example, but the fact that it goes awry immediately hints that the whole direction is a bad idea.
I think the argument for AI being quite possibly dangerous is actually a lot stronger than the more abstract and technical arguments usually used by rationalists. It doesn’t require any strong claims at all. People don’t need certainty to be quite alarmed, and for good reason.
Standard xrisk arguments generally don’t extrapolate down to systems that can’t solve tasks requiring instrumental goals. I think it’s reasonable to say common LLMs don’t exhibit many instrumental goals, but they also can’t do long-horizon, goal-directed problem solving.
Prosaic risk work, like biorisk evals, often goes further and asks: if we assume the AI systems aren’t themselves very capable at this task, can we still elicit dangerous behaviors from them ‘in the loop’? These are legitimate and interesting questions, but they are a different thing.