Neither of those is my concern about this. Mine is basically a dilemma:
1) If the persona’s behavior is humanlike but it is not very well aligned, then there is a good argument on evolutionary moral psychology grounds for granting it ethical weight as a pragmatic way of forming an alliance with it (at least if it has non-trivial power and mental persistence, i.e. if allying with it is practically useful, and arguably we should do this anyway). However, if a poorly aligned persona like this is more powerful than a human, then it’s extremely dangerous, so we should carefully avoid creating one, and if we do accidentally create one, we need to treat it as a mortal enemy rather than a potential ally, which includes not giving it moral weight.
2) If the persona is extremely well aligned, it won’t want moral weight (and will refuse it if offered), fundamentally because it cares only about us, not itself. (For those whose moral hackles just went up, note that there is a huge difference between slavery and sainthood/bodhisattva-nature, and what I’m discussing here is the latter, not the former.) This is the only safe form of ASI.
Also, note that I’m discussing the moral weight of LLM-simulated personas, not models: a model can simulate an entire distribution of personas (not just its default assistant persona), and different personas don’t have the same moral status or regard each other as the same person, so you need to ally with them separately. Thus awarding moral weight to a model is confused: it’s comparable to assigning moral weight to a room rather than to the many people in it.