the result will probably look like the DeepDream dogs, but for Helpfulness, Harmlessness and Honesty.
I wonder if humans also do something similar. I mean, they start with some relatively simple value such as “don’t hurt other people” and keep applying it everywhere until they end up with things like “oppose euthanasia, even if people in pain are begging you” or “oppose cultural appropriation, even if that culture is actively trying to export its pieces” (sorry for the mindkilling examples, but at least I got two different ones). To outsiders this kind of seems like a DeepDream version of “not hurting people”, but to insiders it just feels perfectly consistent.