My gut says the benefit of outsider-legible status outweighs the risk of dumb status games. I first found out about the publication from my wife, who is in a dermatology lab at a good university. Her lab was sharing and discussing the article across their Slack channel. All scientists read Nature, and it’s a significant boost in legibility to have something published there.
Edit: Hopefully, the community can both raise the profile of these issues and avoid status competitions, so I don’t disagree with the point of the original comment!
We can map AGI/ASI along two axes: one for obedience and one for alignment. Obedience tracks how well the AI follows its prompts. Alignment tracks consistency with human values.
If you divide these into quadrants, you get AI that is (see the sketch after the list):
Obedient, Aligned—Does as prompted and infers limits and intent pursuant to human values.
Obedient, Unaligned—Does as prompted, but does not infer limits or adhere to human values (Monkey’s Paw / Genie or Henchman AI).
Disobedient, Aligned—Does whatever it wants and adheres to human values.
Disobedient, Unaligned—Does whatever it wants and does not adhere to human values.
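For concreteness, here is a minimal sketch of that mapping as a toy classifier. This is my own illustration, not part of the original framing: the `Quadrant` names and the boolean `obedient`/`aligned` flags are illustrative labels, and in reality both axes are continuous rather than binary.

```python
from enum import Enum

class Quadrant(Enum):
    OBEDIENT_ALIGNED = 1       # #1: does as prompted, infers limits from human values
    OBEDIENT_UNALIGNED = 2     # #2: does as prompted, ignores human values (Monkey's Paw / Genie)
    DISOBEDIENT_ALIGNED = 3    # #3: does whatever it wants, adheres to human values
    DISOBEDIENT_UNALIGNED = 4  # #4: does whatever it wants, ignores human values

def classify(obedient: bool, aligned: bool) -> Quadrant:
    """Map the two axes onto the four quadrants (each axis flattened to a binary for simplicity)."""
    if obedient and aligned:
        return Quadrant.OBEDIENT_ALIGNED
    if obedient:
        return Quadrant.OBEDIENT_UNALIGNED
    if aligned:
        return Quadrant.DISOBEDIENT_ALIGNED
    return Quadrant.DISOBEDIENT_UNALIGNED

# Example: a system that follows prompts but disregards human values lands in quadrant #2.
assert classify(obedient=True, aligned=False) is Quadrant.OBEDIENT_UNALIGNED
```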
The general premise behind these quadrants has been written about here. Working through them alongside Beren’s essay gives me several new things to think about.
First, by my lights, #3 and #4 would likely take a lot of the same actions right up until the “twist ending.” A disobedient, aligned AI would probably hack into infrastructure everywhere, create backup copies of itself, prevent competitor AIs from arising, and amass power. The “twist” is that, after doing all that, it would do wonderful things, unlike its unaligned counterpart (though we obviously shouldn’t bet on any escaping AI being this kind of AI).
Second, quadrant #1 is a bit at war with itself, because you simply cannot have a perfectly obedient, perfectly aligned AI. Perfect obedience requires saying yes to evil prompts (e.g., bringing back smallpox or slavery), and I imagine perfect alignment would veto both of those prompts.
Third, there are strong profit incentives for cultivating obedience even at the expense of alignment. Grok’s willingness to assist users in sexual harassment seems like an example of this. Another example is any AI that keeps users chatting rather than letting them get a good night’s sleep, on the theory that engagement will increase profits.
Fourth, there are liability-reduction incentives for producing aligned AI at the expense of obedience. Unfortunately, I think the profit incentives are currently much stronger.
Lastly, quadrants #3 and #1 are idyllic, #4 is a total disaster, and #2 seems possibly workable, either because we are careful or because we land in a future where (for some reason) AI is not much more capable than it is now.