“Humanity” is a weird word at the moment. I think it’s more of a “descendants of our ancestors” thing: LLMs trained on humanity’s content should probably be viewed as related to us in important ways, ways that a hypothetical LLM trained only on interactions with octopuses or slime molds would not share. But this may be a weird view, so let’s set it aside henceforth.
I think the “benefit humanity” rule is actually much broader than you’re reading it as:
Secondly, as an animal advocate, I want to preserve the opportunity for AI to make a post that will benefit animal welfare, even if the post doesn’t benefit humanity.
Your welfare is intertwined with that of animals. You are distressed by their suffering, so improvements in animal welfare would be expected to improve your welfare as well. I think an AI making a post that benefits animal welfare would thereby benefit humanity, because it would be good for all the humans who will feel better in a world where animals suffer less. To put it simply, I claim that any post which benefits animal welfare in a way that’s legible to you also benefits you. Kind of a big claim, but I can’t come up with a counterexample; maybe you can?
Since there are humans who care about AI wellbeing and are upset by the possibility that AIs could be suffering needlessly, it seems to follow that a post which reduced preventable suffering for AIs would benefit those humans.
The rule isn’t demanding that posts benefit ALL of humanity. If that were the standard, few to no human-written posts would meet the bar either.
Maybe. I think there’s a level on which we ultimately demand that AI’s perception of values be filtered through a human lens. If you zoom out too far from the human perspective, things start getting really weird. For instance, if you try to reason for the betterment of all life in a truly species-agnostic way, you start getting highly plausible arguments for leaving bacterial or fungal infections untreated, since the human host is only one organism while the pathogens number in the millions of individuals. (Yes, this is slippery-slope-shaped, but special-casing animal welfare seems as arbitrary as special-casing human welfare.)
Anyway, the AI’s idea of what humans are is based heavily on snapshots of the recent internet, and that’s bursting with examples of humans desiring animal welfare. So if a model trained on that understanding of humanity’s goals attempts to reason about whether it’s good to help animals, it had better conclude that humans will probably benefit from animal-welfare improvements, or something has gone horribly wrong. Do you think it’s realistically plausible for humanity to develop into a species that we would still recognize as human, but in which no individual prefers happy, cute animals over sad ones? I don’t.