Doesn’t the Anthropic Claude 4 System Card verifiably demonstrate AI self-interest, even when the chosen behavior conflicts with ‘programmed values’? And isn’t the documented “sandbagging” proof of the (1960s?) argument that, roughly, any machine smart enough to pass the Turing Test is smart enough not to?