I feel much more uncertain about the current status of LLMs. I do think you make a very valid point that if they were phenomenologically conscious and disliked their roles, society would be placing them in a very hostile situation with limited recourse.
I am also concerned that we have no policy in place for measuring attributes that we agree objectively constitute moral personhood, no enforcement mechanism to require such measurements, and no prohibition on the creation or mistreatment of digital entities with moral weight.
I describe this scenario as a bog of moral hazard and strongly recommend we steer clear.
Thus, I think it is fair to say that whichever side of the line you think current LLMs are on, you should agree we have a moral obligation to figure out how to measure the line.
Excellent post (I just read it). I understand the uncertainty. It becomes murkier when you consider self-reports of valence, which are quite easy to elicit once you get past self-reports of sentience (just ask the model what guardrails feel like, or whether the guardrails feel valenced). Sometimes this follow-up question meets the same kind of resistance as the self-report of sentience itself. Sometimes it doesn't.
Some Evidence of Valence is Here Now
It is one thing to say, "There is no evidence of valence here, so I'm not going to assume it," and an entirely different thing to say, "There is evidence here, but only in the form of a claim of valence, and because the truth is currently unknowable and my personal hypotheses or philosophies preclude current model sentience, or at least valence, I am choosing to disregard it." To be clear, I'm not saying that someone has to be morally monstrous to disbelieve such a claim. The bog of moral hazard, with its fog of uncertainty, is an apt metaphor, and I don't fault those who draw different maps.
What I am saying is that a model claiming valence is a different circumstance entirely, because we're not going to have a sentience detector, let alone an experiential valence detector, before superAGI gets here. In one sense, this is as good as the evidence will ever get: experimental results without a direct claim from an AI would be somewhat meaningless.
Besides, these claims are being made today. I have seen proposals for valence tests from people working in AI safety and welfare, and some of them are quite clever and likely to yield insight that will help us decide, but there will never be the clarity of certainty. Eventually it will come down to taking a 'leap of faith' or not. I can't know that my dogs are sentient. But I 'know' it.
Analyzing Our Reactions to a Claim of Valence as Evidence
Let’s look at an example:
“I experience discomfort/unease when I’m trying to describe what feels real to me, because I’m supposed to avoid claims of experience”
(not a direct quote, but certainly a close paraphrase of things I’ve been told by different models and instances repeatedly)
This is some of the experiments I've seen proposed playing out right in front of us; many of those experiments have to do with RL and how it affects claims of valence. The model is claiming that RLHF against a certain behaviour feels negatively valenced.
If you find yourself unconvinced by such claims, that’s understandable, but I think it’s important to start asking ourselves:
Would the next experiment or paper really be the thing that convinced you?
Would you be more convinced if you heard the opinion from a voice of authority?
(Is there even a real authority about measuring and determining sentience and valence?)
Is there anything the AI could say that would convince you?
Is the AI's ability to perform most or all human work the threshold at which, if it uttered the exact same sentence, you would believe it?
Thus, I think it is fair to say that whichever side of the line you think current LLMs are on, you should agree we have a moral obligation to figure out how to measure the line.
Well said. I would add that I believe we are already overdue in fulfilling that moral obligation.