Could this be resolved by wanting to not kill, rather than not wanting to kill? My understanding is that “Don’t want X” acts as a filter on generated plans, while “Want to do X” (or “Want to not do X”) either generates plans or at least acts as a kind of optimization pressure.
People are often defined by negative values in the sense of “don’t do stuff that hurts” or “don’t do stuff that results in lower status”, although it might be better to phrase those as “actively avoid doing things that result in pain/lower status”.
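To make the filter-versus-pressure distinction concrete, here is a toy planner. It is only an illustration under stated assumptions: the `Plan` type, the `harm_risk` field, the `max_risk` threshold and the penalty `weight` are all hypothetical names and numbers, not anyone’s actual agent design. Under the filter, a plan is either vetoed or not, and among the survivors the agent optimizes only for its other goals. Under the pressure, avoiding X is part of the objective, so the agent keeps trading task value for lower risk.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    task_value: float  # how well the plan serves the agent's other goals
    harm_risk: float   # hypothetical probability that the plan results in X


def choose_with_filter(plans, max_risk=0.5):
    """'Don't want X': veto plans judged to do X, then ignore X entirely."""
    allowed = [p for p in plans if p.harm_risk < max_risk]
    # Among the plans that pass the filter, only task value matters.
    return max(allowed, key=lambda p: p.task_value, default=None)


def choose_with_pressure(plans, weight=10.0):
    """'Want not-X': avoiding X is a term in the objective being optimized."""
    return max(plans, key=lambda p: p.task_value - weight * p.harm_risk)


plans = [
    Plan("fast but risky", task_value=10.0, harm_risk=0.4),
    Plan("careful", task_value=8.0, harm_risk=0.01),
    Plan("reckless", task_value=12.0, harm_risk=0.9),
]

print(choose_with_filter(plans).name)    # fast but risky
print(choose_with_pressure(plans).name)  # careful
```

The filtered agent happily picks a plan that sits just under the veto threshold, while the agent under optimization pressure actively moves away from X, which matches the “actively avoid” phrasing.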
In English, “to not want X” ordinarily means “to want not-X”, not merely an absence of wanting X. I don’t know how common this is in languages generally, but in French at least, “je ne veux pas X” behaves the same way, and Google Translate suggests the same is true of many others. In fact, I would be surprised to find a language in which absence of wanting was as easy to express as want and not-want are.