German writer of science-fiction novels and children’s books (pen name Karl Olsberg). I blog and create videos about AI risks in German at www.ki-risiken.de and youtube.com/karlolsbergautor.
Karl von Wendt
Thank you for being so open about your experiences. They mirror my own in many ways. Knowing that there are others feeling the same definitely helps me cope with my anxieties and doubts. Thank you also for organizing that event last June!
While writing, track an estimate of the mental state of a future reader—confusion, excitement, eyes glossing over, etc.
This may be true if you write a scientific paper, an essay or a non-fiction book. As a professional writer, when I write a novel, I usually don’t think about the reader at all (maybe because, in a way, I am the reader). Instead, I track the mental state of the character I’m writing about. This leads to interesting situations when a character “decides” to do something completely different from what I intended her to do, as if she had a will of her own. I have heard other writers describe the same thing, so it seems to be a common phenomenon. In this situation, I have two options: I can follow the lead of the character (my tracking of her mental state) and change my outline or even ditch it completely, or I can force her to do what the outline says she’s supposed to do. The second choice inevitably leads to a bad story, so tracking the mental state of your characters indeed seems to be essential to writing good fiction.
I assume that readers do a similar thing, so if a character in a book does something that doesn’t fit the mental model they have in mind, they often find it “unbelievable” or “unrealistic”, which is one of the reasons why “listen to your characters” seems to be good advice while writing.
Thanks for pointing this out! I should have made it clearer that I did not use ChatGPT to come up with a criticism, then write about it. Instead, I wanted to see if even ChatGPT was able to point out the flaws in LeCun’s argument, which seemed obvious to me. I’ll edit the text accordingly.
As a participant, I probably don’t fit the “typical” AISC profile: I’m a writer, not a researcher (even though I’ve got a Ph.D. in symbolic AI), and I’m at the end of my career, not the beginning (I’m 61). I’m part of AISC only because this time, there was a “non-serious” topic included in the camp’s agenda: designing an alignment tabletop role-playing game (based on an idea by Daniel Kokotajlo). Is this a good thing?
For me, it certainly was. I came to AISC mostly to learn and make connections in the AI alignment community, and this worked very well. I feel like I know a lot less about alignment than I thought I knew at the start of the camp, which is a sure sign that I learned a lot. And I made a lot of great and inspiring contacts, even friendships, some of which I think will last long after the camp is over. So I’m extremely happy and grateful that I had the opportunity to participate.
But what use am I to AI alignment? Well, together with another participant, Jan Kirchner, I did try to contribute an idea, but I’m not sure how helpful that is. However, one thing I can do: As a writer, I can try to raise awareness for the problem. That is the reason I participated in the first place. I see a huuuuuge gap between the importance and urgency of AI alignment and the attention it gets outside the community, among people who probably could do something about it, e.g. politicians and “established” scientists. For example, in Germany, we have the “Institut für Technikfolgenabschätzung” (ITAS) which claims on its website to be the leading institute for technology assessment. I asked them whether they are working on AI alignment. Apparently, they aren’t even aware that there IS a problem. The same seems to be true for the scientific establishment in the rest of Germany and the EU.
You may question how helpful it is to get people like them to work on alignment. But I think that if we hope to solve the problem in time, we need as much attention on it as possible. There are some smart people at ITAS and elsewhere, and it would be great to get them to work on the problem, even if it seems a bit late. Maybe we need just one brilliant idea, and the more people search for it, the more likely we are to find it. It could also be that there is no solution, in which case it is even more important that as many people as possible agree on that; the more established and accepted they are, the better. If we need regulation, or try to implement a global ban or freeze on AGI research, we need as much support as possible.
So that’s what I’m trying to do, with my limited outreach outside of the AI alignment community. My participation in AISC taught me many things and helped me get my message straight. A lot of it will probably find its way into my next novel. And maybe our tabletop RPG will also help spread the message. All in all, I think it was a good idea to broaden the scope of AISC a bit, and I recommend doing it again. Thank you very much, Remmelt, Daniel, and all the others for taking me in!
“my fellow humans get nice stuff” happens to be the weird unpredictable desire that I ended up with at the equilibrium of reflection on the weird unpredictable godshatter that ended up inside me
This may not be what evolution had “in mind” when it created us. But couldn’t we copy something like this into a machine so that it “thinks” of us (and our descendants) as its “fellow humans” who should “get nice stuff”? I understand that we don’t know how to do that yet. But the fact that Eliezer has some kind of “don’t destroy the world from a fellow human perspective” goal function inside his brain seems to mean a) that such a function exists and b) that it can be encoded in a neural network, right?
I was also thinking about the specific way we humans weigh competing goals and values against each other. So while, for instance, we do destroy much of the biosphere by blindly pursuing our misaligned goals, some of us still care about nature, animal welfare, and rain forests, and we may even be able to prevent their total destruction.
Thanks for pointing this out! I agree that my definition of “optimism” is not the only way one can use the term. However, from my experience (and like I said, I am basically an optimist), in a highly uncertain situation, the weighing of perceived benefits vs. risks heavily influences one’s probability estimates. If I want to found a start-up, for example, I convince myself that it will work. I will unconsciously weigh positive evidence higher than negative. I don’t know whether this kind of focusing on the positive outcomes may have influenced your reasoning and your “rosy” view of the future with AGI, but it has happened to me in the past.
“Optimism” certainly isn’t the same as a neutral, balanced view of the possibilities. It is an expression of the belief that things will go well despite clear signs of danger (e.g. the often-expressed concerns of leading AI safety experts). If you think your view is balanced and neutral, maybe “optimism” is not the best term to use. But then I would have expected many more caveats and expressions of uncertainty in your statements.
Also, even if you think you are evaluating the facts in an unbiased and neutral way, there’s still the risk that others who read your texts will not, for the reasons I mention above.
I have strong-upvoted this post because I think that a discussion about the possibility of alignment is necessary. However, I don’t think an impossibility proof would change very much about our current situation.
To stick with the nuclear bomb analogy, we already KNOW that the first uncontrolled nuclear chain reaction will definitely ignite the atmosphere and destroy all life on earth UNLESS we find a mechanism to somehow contain that reaction (solve alignment/controllability). As long as we don’t know how to build that mechanism, we must not start an uncontrollable chain reaction. Yet we just throw more and more enriched uranium into a bucket and see what happens.
Our problem is not that we don’t know whether solving alignment is possible. As long as we haven’t solved it, this is largely irrelevant in my view (you could argue that we should stop spending time and resources on trying to solve it, but I’d argue that even if it were impossible, trying to solve alignment can teach us a lot about the dangers associated with misalignment). Our problem is that so many people don’t realize (or admit) that there is even a possibility of an advanced AI becoming uncontrollable and destroying our future anytime soon.
Thank you for the correction!
Yes. But my impression so far is that anything we can even imagine in terms of a goal function will go badly wrong somehow. So I find it a bit reassuring that at least one such function that will not necessarily lead to doom seems to exist, even if we don’t know how to encode it yet.
That’s the kind of sentence that I see as arguments for believing your assessment is biased.
Yes, my assessment is certainly biased, I admitted as much in the post. However, I was referring to your claim that LW (in this case, me) was “a failure in rational thinking”, which sounds a lot like Mitchell’s “ungrounded speculations” to my ears.
Of course she gave supporting arguments, you just refuse to hear them
Could you name one? Not any of Mitchell’s arguments, but support for the claim that AI x-risk is just “ungrounded speculation” despite decades of alignment research and lots of papers demonstrating various failure modes in existing AIs?
In other words you side with Tegmark on insisting to take the question literally, without noticing that both Lecun and Mitchell admit there’s no zero risk
I do side with Tegmark. LeCun compared the risk to an asteroid x-risk, which Tegmark quantified as 1:100,000,000. Mitchell refused to give a number, but it was obvious that she would have put it even below that. If that were true, I’d agree that there is no reason to worry. However, I don’t think it is true. I don’t have a specific estimate, but it is certainly above 1% IMO, high enough to worry about in any case.
As for the style and tone of this exchange, instead of telling me that I’m not listening/not seeing Mitchell’s arguments, it would be helpful if you could tell me what exactly I don’t see.
The first rule is that ASI is inevitable, and within that there are good or bad paths.
I don’t agree with this. ASI is not inevitable, as we can always decide not to develop it. Nobody will even lose any money! As long as we haven’t solved alignment, there is no “good” path involving ASI, and no positive ROI. Thinking that it is better for player X (say, Google) to develop ASI first rather than player Y (say, the Chinese) is a fallacy IMO, because if the ASI is not aligned with our values, the result is the same either way.
I’m not saying focusing on narrow AI is easy, and if someone comes up with a workable solution for alignment, I’m all for ASI. But saying “ASI is inevitable” is counterproductive in my opinion, because it basically says “any sane solution is impossible” given the current state of affairs.
Thank you very much for sharing this—it is very helpful to me! I agree that academics, in particular within the EU, but probably also everywhere else, are an underutilized and potentially very valuable resource, especially with respect to AI governance. Your post seems to support my own view that we should be talking about “uncontrollable AI” instead of “misaligned AGI/superintelligence”, which I have explained here: https://www.lesswrong.com/posts/6JhjHJ2rdiXcSe7tp/let-s-talk-about-uncontrollable-ai
but your own post make me update toward LW being a failure of rational thinking, e.g. it’s an echo chamber that makes your ability to evaluate reality weaker, at least on this topic.
I don’t see you giving strong arguments for this. It reminds me of the way Melanie Mitchell argued: “This is all ungrounded speculation”, without giving any supporting arguments for this strong claim.
Concerning the “strong arguments” of LeCun/Mitchell you cite:
AIs will likely help with other existential risks
Yes, but that’s irrelevant to the question of whether AI may pose an x-risk in itself.
foom/paperclip are incoherent bullshit
Nobody argued for foom, although whether it is “incoherent bullshit” remains to be seen. The orthogonality thesis is obviously true, as demonstrated by humans every day.
intelligence seems to negatively correlate with power trip
I can’t see any evidence for that. The smartest people may not always be the ones in power, but the smartest species on earth definitely is. Instrumental goals are a logical necessity for any rational agent, including power-seeking.
We know from actual political advertising experiments their data was far too crude to make any impact
I don’t think this invalidates the point that microtargeting can be very effective.
Thank you for posting this, as I find it helpful for practicing my own skills of argumentation. Here are my brief counterarguments to your counterarguments, I’d appreciate it if anyone could point out any flaws in my logic:
A. Contra “superhuman AI systems will be goal-directed”
As far as I understand it, “intelligence” is the ability to achieve one’s goals through reasoning and making plans, so a highly intelligent system is goal-directed by definition. Less goal-directed AIs are certainly possible, but they must necessarily be considered less intelligent—the thermometer example illustrates this. Therefore, a less goal-directed AI will always lose in competition against a more goal-directed one.
B. Contra “goal-directed AI systems’ goals will be bad”
The supposed counterexample of artificially generated human faces is in fact a case in point in my opinion. These faces aren’t like humans at all. They’re not three-dimensional. They’re not moving. They don’t talk. They don’t smell. They’re not soft and don’t radiate warmth. Oh, we didn’t mention that was important, right? We just gave the AI a reward function that enabled it to learn how to generate pictures that look like photographs of real people. If that’s what we want, then little differences on the pixel level probably don’t matter much. The differences between the paperclips Bostrom’s paperclip maximizer makes and a perfect paperclip probably won’t matter much, either. To put it another way, these fake humans are only “good” if we lower our expectations to the point where they’re already met.

C. Contra “superhuman AI would be sufficiently superior to humans to overpower humanity”
Even if “human success isn’t from individual intelligence”, this doesn’t mean that human intelligence is not the decisive factor making us the dominant species. Individual intelligence is what enables collective intelligence in the first place. I agree that humans shouldn’t be seen as a universal benchmark for intelligence, but that only means that the bar for developing an uncontrollable AI may be even lower. It took us humans more than 2,000 years to collectively master Go. It took AlphaGo Zero three days from scratch to beat us. AI may one day be sufficiently good at manipulating and controlling humans to take over the world even without being “superintelligent” in all aspects. It could be way more intelligent in the relevant ways, like AlphaGo Zero compared to a child learning to play Go. I believe there is no upper boundary for manipulation skills and other forms of gaining power. So whether intelligence is an overwhelming advantage is probably a matter of scale.

However AI systems have one serious disadvantage as employees of humans: they are intrinsically untrustworthy, while we don’t understand them well enough to be clear on what their values are or how they will behave in any given case. Even if they did perform as well as humans at some task, if humans can’t be certain of that, then there is reason to disprefer using them.
Really? Look at how we use AI today, e.g. in letting it decide what we see, hear and believe, who gets on parole from prison, and who gets a loan. It seems to me that humans already tend to trust AI more than other humans, in particular if they don’t understand how it works.
I have some goals. For instance, I want some good romance. My guess is that trying to take over the universe isn’t the best way to achieve this goal. The same goes for a lot of my goals, it seems to me. Possibly I’m in error, but I spend a lot of time pursuing goals, and very little of it trying to take over the universe.
Imagine you had a magic wand or a genie in a bottle that would fulfill every wish you could dream of. Would you use it? If so, you’re incentivized to take over the world, because the only possible way of making every wish come true is absolute power over the universe. The fact that you normally don’t try to achieve that may have to do with the realization that you have no chance. If you had, I bet you’d try it. I certainly would, if only so I could stop Putin. But would me being all-powerful be a good thing for the rest of the world? I doubt it.
D. Contra the whole argument
No, AI is not like a corporation run by humans. AI is more like an alien life form. It does not have intrinsic human motives and values. We may be able to tame it or to give it a beneficial goal, but unless we do, if it can, it will transform the world in very weird and probably unforeseen ways. Apart from that, corporations are currently wreaking a lot of havoc on the world (e.g. climate change), which is a good example of how difficult it is to give a powerful entity a beneficial goal.
A question for Eliezer: If you were superintelligent, would you destroy the world? If not, why not?
If your answer is “yes” and the same would be true for me and everyone else for some reason I don’t understand, then we’re probably doomed. If it is “no” (or even just “maybe”), then there must be something about the way we humans think that would prevent world destruction even if one of us were ultra-powerful. If we can understand that and transfer it to an AGI, we should be able to prevent destruction, right?
Thanks for adding this!
I think even most humans don’t have a “dominance” instinct. The reasons we want to gain money and power are also mostly instrumental: we want to achieve other goals (e.g., as a CEO, getting ahead of a competitor to increase shareholder value and do a “good job”), impress our neighbors, be admired and loved by others, live in luxury, distract ourselves from other problems like getting older, etc. There are certainly people who want to dominate just for the feeling of it, but I think that explains only a small part of the actual dominant behavior in humans. I myself have been CEO of several companies, but I never wanted to “dominate” anyone. I wanted to do what I saw as a “good job” at the time, achieving the goals I had promised our shareholders I would try to achieve.
Thank you for your reply and the clarifications! To briefly comment on your points concerning the examples for blind spots:
superintelligence does not magically solve physical problems
I and everyone I know on LessWrong agree.
evolution don’t believe in instrumental convergence
I disagree. Evolution is all about instrumental convergence IMO. The “goal” of evolution, or rather the driving force behind it, is reproduction. This leads to all kinds of instrumental goals, like developing methods for food acquisition, attack and defense, impressing the opposite sex, etc. “A chicken is an egg’s way of making another egg”, as Samuel Butler put it.
orthogonality thesis equates there’s no impact on intelligence of holding incoherent values
I’m not sure what you mean by “incoherent”. Intelligence tells you what to do, not what to want. Even complicated constructs of seemingly “objective” or “absolute” values in philosophy are really based on the basic needs we humans have, like being part of a social group or caring for our offspring. Some species of octopuses, for example, which are not social animals, might find the idea of caring for others and helping them when in need ridiculous if they could understand it.
the more intelligent human civilization is becoming, the gentler we are
I wish that were so. We have invented some mechanisms to keep power-seeking and deception in check, so we can live together in large cities, but this only goes so far. What I currently see is a global deterioration of democratic values. In terms of the “gentleness” of the human species, I can’t see much progress since the days of Buddha, Socrates, and Jesus. The number of violent conflicts may have decreased, but their scale and brutality have only grown worse. The way we treat animals in today’s factory farms certainly doesn’t speak for general human gentleness.
Me: Could you name one reason (not from Mitchell) for questioning the validity of many works on x-risk in AIs?
Ilio: Intelligence is not restricted to agents aiming at solving problems (https://www.wired.com/2010/01/slime-mold-grows-network-just-like-tokyo-rail-system/) and it’s not even clear that’s the correct conceptualisation for our own minds (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7305066/).
Thanks for that. However, my definition of “intelligence” would be “the ability to find solutions for complex decision problems”. It’s unclear whether the ability of slime molds to find the shortest path through a maze or organize in seemingly “intelligent” ways has anything to do with intelligence, although the underlying principles may be similar.
I haven’t read the article you linked in full, but at first glance, it seems to refer to consciousness, not intelligence. Maybe that is a key to understanding the difference in thinking between me, Melanie Mitchell, and possibly you: If she assumes that for AI to present an x-risk, it has to be conscious in the way we humans are, that would explain Mitchell’s low estimate for achieving this anytime soon. However, I don’t believe that. To become uncontrollable and develop instrumental goals, an advanced AI would probably need what Joseph Carlsmith calls “strategic awareness”—a world model that includes the AI itself as a part of its plan to achieve its goals. That is nothing like human experience, emotions, or “qualia”. Arguably, GPT-4 may display early signs of this kind of awareness.
I see how my above question seems naive. Maybe it is. But if one potential answer to the alignment problem lies in the way our brains work, maybe we should try to understand that better, instead of (or in addition to) letting a machine figure it out for us through some kind of “value learning”. (Copied from my answer to AprilSR:) I stumbled across two papers from a few years ago by a psychologist, Mark Muraven, who thinks that the way humans deal with conflicting goals could be important for AI alignment (https://arxiv.org/abs/1701.01487 and https://arxiv.org/abs/1703.06354). They appear a bit shallow to me and don’t contain any specific ideas on how to implement this. But maybe Muraven has a point here.