I’m not sceptical of all forms of AI risk; what I am sceptical of is the claim that mass extinction is highly probable.
The classic Foom Doom argument involves an agentive AI that quickly becomes powerful through recursive self-improvement and has a value/goal system that is unfriendly and incorrigible (i.e. there’s an assumption that we only have one chance to get goals that are good enough for a superintelligence, because the seed AI will foom into an ASI, retaining its goals, and goals that are good enough for a dumber AI may be dangerous in a smarter one).
I don’t see how the overall argument can have high probability when it rests on so many individual assumptions: the conjunction of several less-than-certain claims cannot itself be highly probable.
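To make the conjunctive structure concrete, here is a minimal sketch. The individual step probabilities are purely illustrative assumptions, not estimates from anyone’s actual argument:

```python
# Illustrative only: assumed probabilities for each step of the Foom Doom chain.
# Every step must hold for the conclusion to follow, so the probabilities multiply.
steps = {
    "agentive AI emerges": 0.8,
    "rapid recursive self-improvement": 0.8,
    "goals are retained through self-improvement": 0.8,
    "resulting goals are unfriendly": 0.8,
    "unfriendly goals are incorrigible": 0.8,
}

p_doom = 1.0
for step, p in steps.items():
    p_doom *= p

print(f"Conjunction of {len(steps)} assumptions: {p_doom:.2f}")  # ~0.33
```

Even with each assumption granted a generous 80%, the conjunction comes out around a third, not a near-certainty.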
I don’t think the Orthogonality Thesis (OT) is wrong; I do think it doesn’t go far enough.
The standard OT is silent on the temporal, dynamic, and developmental aspects of minds, which means AI doomers fill the gap with their usual assumption of goal stability.
The standard OT can be considered a subset of a wider OT, which implies that all combinations of intelligence and goal (in)stability are possible: mindspace is not populated solely by goal-stable agents.
But the Foom Doom argument is predicated on agents that have stable goals together with the ability to self-improve, so the wider OT weighs against Foom Doom, and the overall picture is mixed.
Goal stability under self-improvement is not a given: it is not possessed by all mental architectures, and may not be possessed by any, since no one knows how to engineer it, and humans appear not to have it. It is plausible that an agent would desire to preserve its goals, but the desire to preserve goals does not imply the ability to preserve goals. As far as we can tell, no goal-stable system of any complexity exists on this planet, so goal stability cannot be assumed as a default or a given. The orthogonality thesis is therefore true of momentary combinations of goal and intelligence, given the provisos above, but not necessarily true of stable combinations.
The Foom Doom argument is also not all that applicable to LLMs, which aren’t very agentive: we can build tool AI that is nearly human-level, because we have. We also have Constitutional AI, which shows how AIs can improve their values/goals, contra the Yudkowsky side of the Yudkowsky/Loosemore debate.
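As a rough illustration of the constitutional idea (generate, critique against a stated principle, then revise), here is a sketch. The principle text and the `ask()` helper are hypothetical stand-ins for any instruction-following model, not the published recipe:

```python
# Sketch of a constitutional-style critique-and-revise loop.
# `ask()` is a hypothetical placeholder for a call to any chat model.

PRINCIPLE = "Choose the response that is most helpful while avoiding harm."

def ask(prompt: str) -> str:
    """Placeholder: send the prompt to an instruction-following model."""
    raise NotImplementedError

def constitutional_revision(user_request: str) -> str:
    draft = ask(user_request)
    critique = ask(
        f"Critique the following response according to this principle:\n"
        f"{PRINCIPLE}\n\nResponse:\n{draft}"
    )
    revised = ask(
        f"Rewrite the response to address the critique.\n"
        f"Critique:\n{critique}\n\nOriginal response:\n{draft}"
    )
    return revised
```

The point of the sketch is only that the model’s own outputs are used to adjust its behaviour toward stated values, i.e. values/goals are revisable rather than frozen.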