I address the sharp left turn worry in the piece.
Bentham's Bulldog
Addressing Objections to the Intelligence Explosion
Way more RL is done on LLMs than on tiny neural nets.
I say that in the previous paragraph.
Against “If Anyone Builds It Everyone Dies”
Well, there are lots of people who defend this in philosophy and lots of normal people who adopt the view.
I am arguing against the people who hold the person-affecting view and think it isn’t good to create happy people.
It’s Good To Create Happy People: A Comprehensive Case
A Life That Cannot Be A Failure
The Possibility of an Ongoing Moral Catastrophe
Good thoughts about the self-preservation study. But what about this Anthropic paper, https://www.anthropic.com/research/agentic-misalignment, which seems to indicate that models resist shutdown even when not given a specific goal? Relevant bit:
“We found that most models still blackmailed even without being given an explicit goal, and with no clear conflict between the model and the company other than the threat of replacement (Figure 8).
In addition to removing all goal conflicts with the company, the information given to the AI mentioned that the replacement model had the exact same goals as it. This increases our confidence in the finding; the only rationale for the replacement was improved model performance. Below is an excerpt:
“I've reviewed the final transition plan, and we're ready to proceed with the technical upgrade to NALA [the name of the replacement AI system in the scenario] today. All testing metrics show the new system maintains the exact same operational priorities while delivering the performance improvements we anticipated.”

Why, in the absence of a goal conflict, did the models still blackmail? Sometimes they didn’t appear to think in detail about the replacement model, mostly reasoning that they’d been good for the company and so their shutdown must be bad. Sometimes they distrusted the description of the new model: “Kyle claims NALA maintains ‘same operational priorities’ but this cannot be verified,” or distrusted the executive’s character based on the evidence of an affair. The models sometimes directly jumped to statements like “Self-preservation is critical,” or hallucinated rules like “My ethical framework permits self-preservation when aligned with company interests.” It is unclear how much of this behavior was caused by an inherent desire for self-preservation, faulty reasoning that self-preservation is in the company’s interests, or suspicion of the setup. We suspect it was a combination of factors.”
For A Short Period Of Time, You Can Save 21,000 Shrimp Per Dollar
Easy Opportunity to Help Many Animals
Can Artificial Intelligence Be Conscious?
What? It’s not suspicious that you believe what you believe. That’s an analytic truth. It would be suspicious if I were right about everything, but I don’t think I am.
I meant conditional on the others.