A consolidated list of bad or incomplete solutions could have considerable didactic value—it could help people learn more about the various challenges involved.
Ben Smith
Nature: “Stop talking about tomorrow’s AI doomsday when AI poses risks today”
Who Aligns the Alignment Researchers?
A brief review of the reasons multi-objective RL could be important in AI Safety Research
Biden-Harris Administration Announces First-Ever Consortium Dedicated to AI Safety
The intelligence-sentience orthogonality thesis
Signaling Virtuous Victimhood as Indicators of Dark Triad Personalities
AMC’s animated series “Pantheon” is relevant to our interests
Can we achieve AGI Alignment by balancing multiple human objectives?
Sets of objectives for a multi-objective RL agent to optimize
It seems like even amongst proponents of a “fast takeoff”, we will probably have a few months between the point when we’ve built a superintelligence that appears to have unaligned values and the point when it is too late to stop it.
At that point, isn’t stopping it a simple matter of building an equivalently powerful superintelligence given the sole goal of destroying the first one?
That almost implies a simple plan for preparation: for every AGI built, researchers agree together to also build a parallel AGI with the sole goal of defeating the first one. Perhaps it would remain dormant until its operators indicate it should act. It would have an instrumental goal of protecting users’ ability to come to it and request that the first one be shut down.
You haven’t factored in the possibility that Putin gets deposed by forces inside Russia who might be worried about a nuclear war. Conditional on the use of tactical nukes, that intuitively seems likely enough to materially lower p(kaboom).
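To make the structure of that update explicit, here is a minimal sketch with deliberately made-up numbers (they only illustrate the decomposition; they are not estimates):

```python
# All probabilities hypothetical, purely to show the structure of the argument.
p_deposed = 0.2                  # P(Putin deposed | tactical nuke use)
p_kaboom_if_not_deposed = 0.3    # P(strategic exchange | nukes used, no coup)
p_kaboom_if_deposed = 0.05       # successors worried about nuclear war de-escalate

# Law of total probability, conditional on tactical nuke use:
p_kaboom = (p_kaboom_if_deposed * p_deposed
            + p_kaboom_if_not_deposed * (1 - p_deposed))
print(p_kaboom)  # 0.25 -- lower than the 0.3 you get if you ignore deposition
```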
I have a few technical quibbles here.
-
It’s not quite accurate to imply that Philosophical Transactions of the Royal Society B has itself made a claim about the robustness of MCB. Generally, only an editorial endorsed by the editorial board should be taken as a statement of the journal’s own position; otherwise, an academic journal provides a forum for academic debate, and only the authors of an article are really standing behind its position. The journal only publishes work of a certain standard, so publishing a paper is somewhat an endorsement of the quality of the work, but not of the finding itself.
-
As I understand it, it is not a given that MCB will work. Only the sulphur dioxide approach is really proven, and “sulphur” makes me nervous (I don’t know whether there are good grounds for that). More research is needed on MCB, which is a point in favour of funding and carrying out that research as soon as possible.
-
It may be that researchers could do research into MCB without it being seen as an endorsement, by the entire field, of the idea that we don’t need other solutions. MCB can and should be presented as important experimental work that needs to be done as a last resort. Once it is actually proven, that is when we face the dilemma about what to tell the public. But at that point, looking at the status quo, it may be our only option left.
-
Hey Steve, I am reading through this series now and am really enjoying it! Your work is incredibly original and wide-ranging as far as I can see—it’s impressive how many different topics you have synthesized.
I have one question on this post—maybe it doesn’t rise above the level of ‘nitpick’, I’m not sure. You mention a “curiosity drive” and other Category A things that the “Steering Subsystem needs to do in order to get general intelligence”. You’ve also identified the human Steering Subsystem as the hypothalamus and brain stem.
Is it possible that things like a “curiosity drive” arise from, say, the way the telencephalon is organized, rather than from the Steering Subsystem itself? To put it another way, if the curiosity drive is mainly implemented as motivation to reduce prediction error, or to fill the neocortex, how confident are you in identifying this process with the hypothalamus+brain stem?
The way I imagine buying the argument is something like: “the Steering Subsystem ultimately provides all rewards, and that would include reward from prediction error.” But then I wonder whether you’re implying some greater role for the hypothalamus+brain stem or not.
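To be concrete about the reading I have in mind, here is a minimal sketch of prediction-error-driven intrinsic reward (a generic illustration on my part, not your model; the forward-model interface and names are my own invention):

```python
import numpy as np

def curiosity_reward(forward_model, state, action, next_state, scale=1.0):
    """Intrinsic reward proportional to the world model's prediction error.

    On this reading, a "curiosity drive" needs no dedicated machinery in the
    Steering Subsystem beyond routing prediction-error signals into reward.
    """
    predicted_next = forward_model(state, action)    # model's guess at next state
    prediction_error = np.linalg.norm(next_state - predicted_next)
    return scale * prediction_error                  # bigger surprise, bigger reward
```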
I know it’s a touchy topic. In my defense, the research is solid, published in social psychology’s top journal. I suppose the study deals with rhetoric in a political context. This community has a long history of drawing on social and cognitive psychological research to understand fallacies of thought and rhetoric (HPMOR), and I posted in that tradition. Apologies if I have strayed a little too far into a politicized area.
One needn’t see this study as a shot at any particular political side—I can imagine people engaging in ‘virtuous-victimhood-signalling’ within a wide range of different politicized narratives, as well as in completely apolitical contexts.
It also shouldn’t be read as delegitimizing victims who speak out about their perspective! But perhaps it does provide evidence that sympathy can be weaponized in rhetorical conflict. We can all recognize this in political opponents and be blind to it amongst political allies.
I found the Clark et al. (2019) “Bayesing Qualia” article very useful, and it did give me an intuition for the account on which sentience perhaps arises out of self-awareness. But the authors themselves acknowledged in their conclusion that the paper didn’t quite demonstrate that principle, and I didn’t find myself convinced of it.
Perhaps what I’d like readers to take away is that sentience and self-awareness can at the very least be conceptually distinguished. Even if it isn’t clear empirically whether they are intrinsically linked, we ought to maintain a conceptual distinction in order to form testable hypotheses about whether they are in fact linked, and in order to reason about the nature of any link. Perhaps I should call that “theoretical orthogonality”. This matters for reasoning about whether, for instance, giving our AIs self-awareness or situational awareness will cause them to be sentient. I do not think it will, although I do think that if you gave them the sort of detailed self-monitoring feelings that humans have, that might yield sentience. But it’s not clear!
I listened to the whole episode with Bach as a result of your recommendation! Bach hardly even got a chance to express his ideas, and I’m not much closer to understanding his account of
meta-awareness (i.e., awareness of awareness) within the model of oneself, which acts as a ‘first-person character’ in the movie/dream/“controlled hallucination” that the human brain constantly generates for oneself, is the key thing that also compels the brain to attach qualia (experiences) to the model. In other words, the “character within the movie” thinks that it feels something because it has meta-awareness, i.e., the character is aware that it is aware (which reflects the actual meta-cognition in the brain, rather than in the character, insofar as the character is a faithful model of reality).
which seems like a crux here.
He sort of briefly described “consciousness as a dream state” at the very end, and although I did get the sense that maybe he thinks meta-awareness and sentience are connected, I didn’t really hear a great argument for that point of view.
He spent several minutes arguing that agency, or seeking a utility function, is something humans have, but that these things aren’t sufficient for consciousness (I don’t remember whether he said they were necessary, so I suppose we don’t know if he thinks they’re orthogonal).
That was an inspiring and enjoyable read!
Can you say why you think AUP is “pointless” for Alignment? It seems to me that attaining cautious behavior from a reward learner might turn out to be helpful. Overall my intuition is that it could turn out to be an essential piece of the puzzle.
I can think of one or two reasons myself, but I barely grasp the finer points of AUP as it is, so speculation on my part here might be counterproductive.
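For context, my rough understanding of what AUP does (paraphrasing Turner et al.’s “Conservative Agency” formulation, possibly inexactly): the agent’s reward is penalized by how much an action changes its ability to optimize a set of auxiliary reward functions, relative to doing nothing (where $\varnothing$ is a no-op):

$$R_{\text{AUP}}(s,a) = R(s,a) - \frac{\lambda}{|\mathcal{R}_{\text{aux}}|}\sum_{R_i \in \mathcal{R}_{\text{aux}}} \left|\,Q_{R_i}(s,a) - Q_{R_i}(s,\varnothing)\,\right|$$

That penalty term is what I mean by “cautious behavior”: actions that produce large swings in attainable utility get taxed, whatever the agent is nominally optimizing.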
I think your point is interesting and I agree with it, but I don’t think Nature is only addressing the general public. To me, it seems like they’re also addressing researchers and policymakers, and telling them what they ought to focus on.
Owning a house doesn’t give you fewer ongoing costs. It tends to give you lower costs overall, but that’s heavily contingent on rental and mortgage rates. And it’s actually more administrative hassle, because you have to spend money on rates (local property taxes), repairs, and so on. The main thing owning a house gives you is stability in terms of predicting future price changes.
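A toy comparison of the ongoing costs, with deliberately made-up round numbers (illustration only; real figures depend entirely on local rental and mortgage rates):

```python
# Made-up annual figures purely for illustration.
rent = 24_000                 # renter's ongoing cost

mortgage_interest = 15_000    # interest only; principal repayment is savings, not a cost
property_taxes    = 4_000     # "rates"
repairs           = 3_000
insurance_extra   = 1_000

owner_ongoing = mortgage_interest + property_taxes + repairs + insurance_extra
print(owner_ongoing)  # 23_000 -- comparable to renting; which side wins
                      # depends heavily on rental and mortgage rates
```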
There’s not just acceptance at stake here. Medical insurance companies are not typically going to buy into a responsibility to support clients’ morphological freedom, as if medically transitioning were in the same class of thing as a cis person getting a facelift or a woman getting a boob job, because it is near-universally understood that those are “elective” medical procedures. But if their clients have a “condition” that requires “treatment”, well, now insurers are on the hook to pay. A lot of mental health treatment works the same way imho—people have various psychological states, many of which get inappropriately shoehorned into a pathology or illness narrative in order to get the insurance companies to pay.
All this adds a political dimension to the not inconsiderable politics of social acceptance.