Science journalist at foommagazine.org, formerly staff writer for computer science and AI at Quanta Magazine. Providing independent, public service science journalism on research motivated by AI impacts.
Mordechai Rorvig
I feel quite a bit of skepticism over the idea that a consensus view of moral anti-realism would have led to a preference for an alignment framing.
For example, amongst non-experts, there is a strong consensus about what counts as moral and immoral conduct. Amongst moral philosophers, as I understand it, moral anti-realism is also a minority view; my understanding was that moral naturalism was closest to consensus. (Not to say moral anti-realism is necessarily wrong.) If there were some kind of article or post describing how this informed a shift in framing to alignment, however, that would be very interesting and helpful.
Separately, it seems like you’re suggesting that alignment to arbitrary values provides a simpler framework or objective than morality, or what might be described as alignment to all moral patients.
This seems true. However, I guess where my confusion arises is that alignment to arbitrary values is not the safety goal. The safety goal is to make something that respects and adheres broadly to the moral values of all humans. Stated differently, narrow alignment is necessary but not sufficient for AI safety.
Further, empirically, narrow alignment has been demonstrated in simpler systems for some time now. It seems like the bigger issue at this point is AI morality: robust ethical behavior, resembling a strong and unflappable moral compass, and so on.
Ok, I think I might see what you mean now; one might prefer framings in terms of alignment over morality, because moral framings might tend to provoke controversy, irrationality, or reactionary thinking.
Personally, I feel like I would still tend to prefer the moral framing, in terms of clarity and just plain accuracy. It does seem a little like the alignment framing obfuscates the subject just to make it less provocative, when really the subject is going to be provocative no matter what, once you think about it deeply.
Yeah, I can see the morality goal being manifested indirectly in those cases. Interesting that you mention Anthropic. The thought had crossed my mind that one reason the alignment framing might have become more popular is undue influence from corporations, which might have intentionally sought to reframe the safety problem as one of building AI aligned to their goals (e.g., profit maximization), rather than building moral AI that would be safe, but not profitable. Although, admittedly, that feels like a somewhat deranged conspiracy theory that I have no evidence for.
It is true that morality is complex and there are different ways of deriving morality, or what is “right” and “wrong”; but then again, there is broad consensus about what you teach a child when you are teaching them morality and ethics. It seems to me that when humans fall short in moral conduct, it is most often an issue with their conduct, rather than an issue with morality being hard to define. But even if it is hard to define, I suppose my question remains: why is it a less common framing than ‘alignment’? Did people at some point decide that alignment was more solvable than morality?
Could you elaborate? That is surprising to me given the extreme importance of those terms for philosophical analysis of what is “good,” “right,” and so on.
[Question] Question: Why is the goal of AI safety not ‘moral machines’?
Respectfully, are you using AI for your writing and commenting here? I prefer to know when I am engaging with an AI rather than a human. Apologies if I’ve already missed such a transparency statement.
On computational cost, you’re assuming that misaligned actions cannot be broken into aligned subactions without exceeding a bit depth of 2^60. All that really matters for the AI is to decompose its action trajectory into harmless-seeming chunks. This seems like a severe issue for the requirement that “every bit-level output” be approved. Also, how many bits is it supposed to take to break down an action like “pick up this coffee cup”? I don’t really find that this breakdown of the action space comports well with the conventional understanding of actions.
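To illustrate the worry with a toy sketch (the chunk names, harm scores, and threshold below are hypothetical, not anything from your proposal): a supervisor that approves each chunk independently can wave through a trajectory whose steps are individually innocuous but jointly catastrophic.

```python
# Toy model: per-chunk approval vs. whole-trajectory harm.
# All chunk names, scores, and thresholds are made up for illustration.

CHUNK_HARM_THRESHOLD = 0.5  # supervisor approves any chunk scoring below this

def chunk_harm(chunk: str) -> float:
    """Hypothetical per-chunk harm estimate; each step looks routine."""
    scores = {
        "order chemical precursor A": 0.2,
        "order chemical precursor B": 0.2,
        "rent lab space": 0.1,
        "mix A and B": 0.3,
    }
    return scores.get(chunk, 0.0)

def composite_harm(trajectory: list[str]) -> float:
    """Harm of the whole plan, which is not the sum of its parts."""
    dangerous_combo = {"order chemical precursor A",
                       "order chemical precursor B",
                       "mix A and B"}
    if dangerous_combo <= set(trajectory):  # subset check: combo completed
        return 1.0
    return max((chunk_harm(c) for c in trajectory), default=0.0)

plan = ["order chemical precursor A", "rent lab space",
        "order chemical precursor B", "mix A and B"]

# The chunk-wise supervisor waves every step through...
print(all(chunk_harm(c) < CHUNK_HARM_THRESHOLD for c in plan))  # True
# ...while the composite plan is maximally harmful.
print(composite_harm(plan))  # 1.0
```

The point is just that per-chunk screening composes badly: harmlessness of the parts does not imply harmlessness of the whole.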
I don’t think you’ve actually engaged here with the instantaneity objection, although that is likely my fault. I am saying that the harmfulness may come down to a final two-bit decision, which the supervisor may not be able to recognize as causing a catastrophic impact, because the effects are nested in the prior outputs in a computationally intractable way. If the setup is something like a nuclear launch primed by all of the AI’s prior outputs, then the final output that triggers the launch is not going to be affected by any speed advantage of a supervisor.
Ok, so you’re basically proposing something I’ll summarize as “super containment,” where a weaker AI is able to contain the actions of a stronger AI by utilizing greater compute.
A pro of this is that it is a relatively simple and appealing premise to understand, similar to super alignment. But I don’t see it as foolproof, and I’m not sure why it needs to be proposed with such high confidence in its success.
Let’s take, for example, the second objection you mention: the stronger AI decomposes its actions into subactions that are not identifiable as prohibited until it is too late. You say this isn’t an issue because the faster AI can simulate and predict all dangerous future actions, but that is computationally prohibitive.
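To make the combinatorics concrete (the branching factor and horizon below are made-up numbers, not anything from your proposal): exhaustive lookahead grows exponentially, while a raw speed advantage only buys a constant factor, i.e., a few extra levels of the search tree.

```python
# Back-of-envelope cost of exhaustively simulating future trajectories.
# Branching factor and horizon are arbitrary illustrative numbers.
branching_factor = 10   # candidate subactions considered per step
horizon = 30            # steps ahead the supervisor must look

trajectories = branching_factor ** horizon
print(f"trajectories to simulate: {trajectories:.1e}")  # 1.0e+30

# Even a million-fold speed advantage gains only a factor of 1e6,
# i.e., about 6 of the 30 levels of the tree.
speed_advantage = 10 ** 6
print(f"fraction coverable: {speed_advantage / trajectories:.1e}")  # 1.0e-24
```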
You say the dangerous actions can still be intervened on because they take time to propagate, but that’s just not true, and I think this is an even bigger problem. You’re framing all gravely misaligned actions as slow enough to interdict, but you can cause, e.g., global nuclear annihilation by hitting electronic switches that are effectively instantaneous. I don’t see why that assumption is justified.
Lastly, super containment by definition does not apply to speed superintelligence, which is defined by being faster than other AI. That is a big chunk of the scenarios out there. Again, this is not necessarily an issue with your proposal as a theory, but I would think you would want to frame it more conservatively.
I do tend to agree, and I have felt this way for a while. I do also think it is important to find the right ways to frame things, and framing existing tech as AGI is, I think, preferable. There is too much baggage associated with the word AI from its 80 years of history. When we use the term AGI, we can wipe the slate clean and be clearer about the differential between “this AI” and all the AI that has come before.
Regardless of the fact that it’s still limited in various ways, I do think the technical criticisms of calling it AGI (e.g., Hendrycks et al. from last year) are helpful and important. On the other hand, the non-technical criticisms of calling it AGI seem to come either (1) from the POV of “it’s all a hoax,” which is extremely counterproductive and unhelpful, or (2) from the perspective of being deeply critical of how AI is being sold and used. That is indeed a profound criticism, but you can’t make the problem go away by calling the technology something it’s not. It’s like trying to mount an ad hominem attack against a concept.
This is a helpful analysis and rundown. I’ve seen quite a bit of annoying marketing of open models in different places, which doesn’t take into account any of these problems, some of which are so obvious. I hadn’t thought about Panopticon as being the de facto solution to this. I also like your point, “Bioweapon risk is already here in an early form, but the ‘army of superintelligences’ is nowhere to be found.” While I have plenty of interest in more speculative, forward-looking analyses, those also have to get jerked back down to reality pretty hard when we have evidence that contradicts them.
Militaries are going autonomous. But will AI lead to new wars? A tour of recent research
I have found AI Village (and the updates from it) a pretty helpful source of insight.
Although, to be clear, my feeling about the whole situation with agents is that it is fairly disturbing, and that it is playing with fire. But if the reality is that these things are going to be rolled out like this—and obviously, they are—then we do need open testbeds like this to see what’s happening.
This was the high-quality ****book before ****book hit the AI news echo chamber this past week. Although, to be fair, I guess that experiment demonstrated the higher-population, message-board-focused variant of a similar setup.
Thanks for the post, interesting.
Would anyone who downvoted this care to give more thoughts? I’m not knowledgeable enough here to know what the critiques might be.
Is research into recursive self-improvement becoming a safety hazard?
Language models resemble more than just language cortex, show neuroscientists
The moral critic of the AI industry—a Q&A with Holly Elmore
I see, thanks for the feedback. That’s valid. I’m trying to figure out how to build this website and actually make it useful for people, and right now that involves some tinkering with things like setting breakpoints or cutoffs on the summaries, to encourage subscriptions and help get the word out more easily.
I’ve perhaps erred with where I set the breakpoints. Let me know if you have any feedback or thoughts on how you’d prefer it to be set up; it would be much appreciated.
Friendly question: do you think the title seemed like clickbait? Perhaps I erred there. I was trying to do justice to the fairly unnerving nature of the results, but perhaps I overshot beyond what was fair. It frankly causes me great anxiety to try to find the right wording for these things.
I don’t pretend to speak for anyone but myself, but as someone who is strongly anti-war, and who is open about standing for the values of peace, I strongly disagree with this post and don’t think that this sort of thing should be welcomed in any kind of AI safety forum.
Militaries are effectively the worst possible actors from the standpoint of misuse risk and the creation of AI that is dangerous by design. This is a known problem, which this post seemingly at best ignores and at worst deceptively misconstrues as a non-issue.
To use a rough analogy, this post feels like a petrochemical company making a visit to an environmental conference to solicit participants for developing more “environmentally friendly” petroleum extraction techniques. Sure, the petrochemical company could no doubt benefit from the expertise at that venue. But, is that really what the venue is for?
In the case of LW, is this really a forum where the military should be posting RFPs for military AI alignment? Because what is the point of military AI alignment, except to be able to create AI that is as dangerous as possible, always pushing the extreme boundaries of any safety profile? Again, to make another analogy, this is like someone coming onto a 3-D printing forum looking to pay people to implement their vision for weapons printing. It should immediately raise serious concerns of all sorts.
I think this post makes it perfectly clear, for me personally, that narrowly construed AI alignment (e.g., alignment to militaries) cannot and must not be the goal of those who wish to advocate for AI safety as a serious objective. At least for someone like me (strongly anti-war), alignment is not enough, and the AI must also be endowed with strong and firm moral principles, like the values of peace and lawful behavior.
Of course, it goes without saying that what the US is doing right now in Iran, Venezuela, and other places—and which DARPA is necessarily a part of—is an egregious violation of international law, and it makes this post particularly problematic, right now, especially without any kind of attempt to acknowledge moral problems and pitfalls.
Like I said, I’m really not one to speak on behalf of others here, and perhaps even a majority of other participants here disagree with my views. Nonetheless, I hope this might be helpful as an articulation of an opposing view.