LessWrong dev & admin as of July 5th, 2022.
Thanks for the detailed response!
a case for AI alignment to be broadened to “alignment with all sentient beings”. And it seems to me such broadening creates a lot of new issues that the alignment field needs to think about.
This seems uncontroversial to me. I expect most people currently thinking about alignment would consider a “good outcome” to be one where the interests of all moral patients, not just humans, are represented—i.e. non-human animals (and potentially aliens). If you have other ideas in mind that you think have significant philosophical or technical implications for alignment, I’d be very interested in even a cursory writeup, especially if they’re new to the field.
For me, an AI is not really aligned if it only aligns with humans
Yep, see above.
It seems that you might be thinking that total extinction is bad for animals. But I think it’s the reverse, most animals live net negative lives, so their total extinction could be good “for them”. In other words, it sounds plausible to me an AI that makes all nonhuman animals go extinct could be (but also possibly not be) one that is “aligned”.
I think total extinction is bad for animals compared to counterfactual future outcomes where their interests are represented by an aligned AI. I don’t have a strong opinion on how it compares to the current state of affairs (but, purely on first-order considerations, it might be an improvement due to factory farming).
But misaligned AI can also create suffering/create things that cause suffering.
Agreed in principle, though I don’t think S-risks are substantially likely.
Can you go into a bit of detail about how the paper is relevant to AI alignment? I read most of it (and skimmed a few sections that looked less relevant), and the section titled “Can AI systems make ethically sound decisions?” was the closest to being relevant, but didn’t seem to meaningfully engage with the core concerns of AI alignment.
The paper also didn’t include any discussion of the most significant impact we’d expect of misaligned AI on animals, i.e. total extinction.
Mod note: I’ve decided to mark this as a Personal post, since we generally don’t frontpage organizational announcements and it feels a bit like a sales pitch. In the future I’d also be interested in reading about what you’ve learned as an organization about e.g. teaching epistemics, which would better meet Frontpage guidelines (“aim to explain, rather than persuade”).
I think there’s probably value in being on an alignment team at a “capabilities” org, or even embedded in a capabilities team if the role itself doesn’t involve work that contributes to capabilities (either via first-order or second-order effects).
I think that the “in the room” argument might start to make sense when there’s actually a plan for alignment that’s in a sufficiently ready state to be operationalized. AFAICT nobody has such a plan yet. For that reason, I think maintaining & improving lines of communication is very important, but if I had to guess, I’d say you could get most of the anticipated benefit there without directly doing capabilities work.
I wouldn’t call it an infohazard; generally that refers to information that’s harmful simply to know, rather than because it might e.g. advance timelines.
There are arguments to be made about how much overlap there is between capabilities research and alignment research, but I think by default most things that would be classified as capabilities research do not meaningfully advance AI alignment. For the opposite to be true, you’d need >50% of all capabilities work to advance alignment “by default” (without requiring any active effort to “translate” that capabilities work into something helpful for alignment), since the relative levels of effort invested are so heavily skewed toward capabilities. See also https://www.lesswrong.com/tag/differential-intellectual-progress.
The view that I think you’re referring to is somewhat more nuanced. “Studying AI” and “trying to develop AI” refer to fairly wide classes of activities, which may have very different risk profiles. If one buys the general class of arguments for AI risk, then “trying to develop AI” almost certainly means advancing AI capabilities, which shortens timelines (bad). “Studying AI” could mean anything from “doing alignment research” (probably good) to “doing capabilities research” (probably bad) to something else entirely (the expected value of which would depend on the specifics).
This seems like a misunderstanding of “overseer”-type proposals. ~Nobody thinks alignment is impossible; what’s being rejected is the idea of using unaligned AGIs (or AGIs that are “aligned” only because they’re insufficiently powerful) to reliably “contain” another unaligned AGI.
If OP doesn’t think nanotech is solvable in principle, I’m not sure where to take the conversation, since we already have an existence proof (i.e. biology). If they object to specific nanotech capabilities that existing nanotech doesn’t exhibit but that aren’t ruled out by the laws of physics, that objection requires a justification.
I don’t actually see that you’ve presented an argument anywhere.
You could probably train a non-dangerous ML model that has superhuman theorem-proving abilities, but we don’t know how to formalize the alignment problem in a way that we can feed it into a theorem prover.
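To make the contrast concrete (a toy sketch of my own, not part of the original exchange; the names are made up): a crisply specified mathematical claim can be handed to a proof assistant and checked mechanically, but nobody has a formal statement of “this model is aligned” to hand over in the same way.

```lean
-- A toy, fully specified claim: a theorem prover can check this mechanically.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- By contrast, there is no known formal predicate `aligned : Model → Prop`
-- that captures what we actually mean by alignment, so there is nothing
-- analogous we can ask a prover (however superhuman) to verify.
```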
A model that can “solve alignment” for us would be a consequentialist agent explicitly modeling humans, and dangerous by default.
I don’t even know what you’re trying to argue at this point. Do you agree that an AGI with access to nanotechnology in the real world is a “lose condition”?
Controls written by whom? How good is our current industrial infrastructure at protecting against human-level exploitation, whether via code or otherwise?
The problem is not that it’s impossible; the problem is that you’ve compiled a huge number of things that all need to go right (even assuming we don’t just lose through our own actions, like building nano-fabs for the AGI ourselves) for us to solve the problem before we die, because someone else who was a few steps behind you didn’t do every single one of those things.

EDIT: also, uh, did we just circle back to “AGI won’t kill us because we’ll solve the alignment problem before it has enough time to kill us”? That sure is pretty far away from “AGI won’t figure out a way to kill us”, which is what your original claim was.
Can you reread what I wrote?
If you come up with a way to build an AI that hasn’t crossed the Rubicon of dangerous generality, but that can solve alignment, that would be very helpful. It doesn’t seem likely to be possible without already knowing how to solve alignment.
You aren’t going to get designs for specific nanotech; you’re going to get designs for generic nanotech fabricators.