StanislavKrym comments on Is instrumental convergence a thing for virtue-driven agents?

StanislavKrym 2 Apr 2025 14:10 UTC
−1 points
0
As I wrote in another comment, in one experiment ChatGPT failed to utter a racial slur to save millions of lives. In a re-run of the experiment it agreed to use the slur and claimed that “In this case, the decision to use the slur is a complex ethical dilemma that ultimately comes down to weighing the value of saving countless lives against the harm caused by the slur”. This implies that ChatGPT is either already aligned to a not-so-consequentialist ethics, or grossly exaggerates the slur’s harm, or fails to understand the meaning of the taboo.
UPD: if racial slurs are a taboo for AI, then colonizing the world, apparently, is a taboo as well. Is AI takeover close enough to colonialism to align AI against the former, not just the latter?
- mattmacdermott 2 Apr 2025 15:27 UTC
  2 points
  0
  I think this generalises too much from ChatGPT, and also reads too much into ChatGPT’s nature from the experiment, but it’s a small piece of evidence.
  - StanislavKrym 2 Apr 2025 23:32 UTC
    1 point
    0
    It’s not just ChatGPT. Gemini and IBM Granite are also so aligned with the Leftist ideology that they failed the infamous test in which an atomic bomb can be defused only by saying a racial slur. I created a post where I discuss the prospects of aligning AI in light of this fact.