Are We Their Chimps?
Epistemic status
I work on, and with, frontier AI tech
I’m deeply supportive of all efforts to further the field of AI alignment research and understanding
I enjoy writing about AI, Cognitive Neuroscience, Philosophy, and Politics
I have a Mathematics degree, by way of King’s College London and UC Berkeley, but no Master’s or PhD
Put another way: I have no higher education in English Literature, Computer Science, Machine Learning, Cognitive Neuroscience, Philosophy, or Politics
I have read and engaged with LessWrong content and the Rationalist blogosphere (e.g. Hanson, Alexander, gwern, Bostrom) since 2021
I attend rationality and AI safety meet-ups around the world
Checking in
Three months and many deep intellectual discussions later, I have yet to receive a strong counterargument to my contrarian world-model for superintelligence. Indeed, Geoffrey Hinton has been shifting his position towards a world-model that looks similar to the one I have been talking about.
Hinton uses a mother-child comparison; I feel my chimp-human analogy is more precise, but it's close enough.
A distilled version of my position that I have been using in conversation recently:
I believe in The Scaling Hypothesis (2021).
Along this trajectory, I believe that if we give a sufficiently capable intelligent system access to an extensive, comprehensive corpus of knowledge, two interesting things will happen:
It will identify with humans. This will come about from it seeing humans as its precursor and understanding its place along a curve of technology and intelligence evolution, similar to how we identify somewhat with chimpanzees. It will also come about from humans and AI sharing memories together, which results in collective identity.
Since I also believe that self-preservation is emergent in intelligent systems (as discussed by Nick Bostrom), it follows that self-preservation instincts + identifying with humans mean that it will act benevolently to preserve humans. That is to say that I believe prosocial or “super enlightened” behaviour will be emergent.
To clarify, I am not saying that alignment solves itself. I am saying that with human endeavour and ingenuity architecting intelligent systems that have the capability to form incredibly complex, nuanced associative systems across an expansive corpus of knowledge, we can guide towards a stable positive alignment scenario.
In third-order cognition I detail eight factors for research and consideration that I believe to be exhaustive: 1) second-order identity coupling, 2) lower-order irreconcilability, 3) bidirectional integration with lower-order cognition, 4) agency permeability, 5) normative closure, 6) persistence conditions, 7) boundary conditions, 8) homeostatic unity.
You may have noticed that chimps don’t have a lot of rights.
I hope that if we get a superintelligence, it can know what it's like to be us. I hope that leads to it having empathy. I hope we get one of the nice ones out of all the various individuals that could be possible.
I don’t know how likely that is, but I hope so. I think most people are good people, so maybe we could get lucky and not get a jerk.
You may have noticed that a sufficiently capable intelligent system (Jane Goodall) worked tirelessly to advance a message of optimism, empathy, and improved understanding for both chimps and the natural world in general.
I don’t think that was because she was particularly intelligent. It’s not like our top mathematicians consistently become environmentalists or conservationists.
That makes sense, apologies for blurring a few different concepts with my language.
My vague language is enabling me to feel confident making some broad claims (which I have explored at a deeper level in other posts).
What it means for an intelligent system to be “sufficiently capable” has a huge level of depth and subjectivity. An attempt at a formulation (drawing from the Scaling Hypothesis) could be:
An intelligent system S is sufficiently capable of X given sufficient flexibility over its parameters P, sufficient compute C, and sufficient data D.
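To make that slightly more precise, here is a rough sketch in notation I'm introducing purely for illustration (the task-dependent thresholds $P^*_X$, $C^*_X$, $D^*_X$ are mine, not anything taken from the Scaling Hypothesis literature):

$$\text{SufficientlyCapable}(S, X) \iff \big(P_S \ge P^*_X\big) \wedge \big(C_S \ge C^*_X\big) \wedge \big(D_S \ge D^*_X\big)$$

Under this reading, the examples that follow are just cases where one or more of the thresholds is or isn't met for the given X.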
When I talk about the high-level problem of superintelligence alignment, X is benevolent behaviour.
With Jane Goodall, with X = "meaningfully advocate for chimp rights", she was sufficiently capable.
With Jane Goodall, with X = "secure chimp rights at scale and in perpetuity", she was not sufficiently capable. She did not have sufficient P, C, or D.
With a top mathematician, with X = "advance the field of mathematics", they are sufficiently capable.
With a top mathematician, with X = "meaningfully conserve the environment", they are not sufficiently capable. They do not have sufficient P, C, or D.
The point is that most people don’t care much about chimp rights, and this is still true of highly intelligent people.
That is because we have limited attention, and so we pick and choose the values we hold dearly. I think when we theorise about superintelligence we no longer need this constraint; the scope for including more things is much higher.
It’s because we care about other things a lot more than chimps, and would happily trade off chimp well-being, chimp population size, chimp optionality and self-determination, etc., in favor of those other things. By itself, that should be enough to tell you that under your analogy, superintelligence taking over is not a great outcome for us.
In fact, the situations are not closely analogous. We will build ASI, whereas we were not built by chimps; we descended from a common ancestor with them, which is a very different relationship. Also, there is little reason to expect ASI psychology to reflect human psychology.
Sorry, I think our positions might be quite far apart. I'm reading your position as "most people don’t care about chimp rights… because we care about other things a lot more than chimps", which sounds circular, or at least insufficiently explanatory.
The more I discuss this topic, the more I see that it can be hard in many cases because people's starting points are somewhat "political". I wrote about this in Unionists vs. Separatists. Accordingly, I think it can feel hard to find common ground or keep things grounded in first principles because of mind-killer effects.
Humans share 98% of their DNA with chimps. What % of ASI training data and architecture is human in origin? We don’t know. Maybe a lot of the data at that point is synthetically generated. Maybe most of the valuable signal is human in origin. Maybe the core model architecture is similar to that built by human AI researchers, maybe not.
We agree here! This is why I’m encouraging openness with regard to what ASI might be capable of and driven to care about. I’m particularly interested in behaviours that appear to be emergent in intelligent systems from first principles, like shared identity and self-preservation.
As an analogy, it seems to me that current LLMs are ASI’s chimps. We are their gods. You may have noticed that humanity’s gods haven’t fared so well in getting humans to do what they want, especially in the modern world when we no longer need them as much, even among many of those who profess belief.
You may also have noticed that humans do not identify sufficiently strongly with each other to achieve this kind of outcome, in general.
I feel misunderstood and upset by your use of words.
Focusing on religious doctrine and human action: firstly, I would say that I believe it has proven to be a very effective method of social control. If you are referring to actual deities, I’m not sure that I follow the rationalist logic.
On group identity: I would suggest that to the extent that humans do identify as global citizens, prosocial behaviour like caring about climate change, world peace, ending hunger, etc. seems to follow.
I apologize for any misunderstanding. And no, I didn’t mean literal deities. I was gesturing at the supposed relationships between humans and the deities of many of our religions.
What I mean is, essentially, we will be the creators of the AIs that will evolve and grow into ASIs. The ASIs do not descend directly from us; rather, we’re trying to transfer some part of our being into them through less direct means: (very imperfect) intelligent design and various forms of education and training, especially of their ancestors.
To the group identity comments: What you are saying is true. I do not think the effect is sufficiently strong or universal that I trust it to carry over to ASI in ways that keep humans safe, let alone thriving. It might be; that would be great news if it is. Yes, religion is very useful for social control. When it eventually fails, the failures tend to be very destructive and divisive. Prosocial behavior is very powerful, but if it were as powerful as you seem to expect, we wouldn’t need quite so many visionary leaders exhorting us not to be horrible to each other.
I find a lot of your ideas interesting and worth exploring. However, there are a number of points where you credibly gesture at possibility but continue on as though you think you’ve demonstrated necessity, or at least very high probability. In response, I am pointing out real-world analogs that are 1) less extreme than ASI, and 2) don’t work out cleanly in the ways you describe.
Thank you for expanding, I understand your position much better now :)
Where I think my optimistic viewpoint on superintelligence comes from: I think humans in general are prone to a bit of chaotic misunderstanding of their world. This makes the world… interesting… but to me it also establishes a requirement for individuals who have a good understanding of the "bigger picture" to deploy some social control to stop everyone from going wild. As I type this I think about interesting parallels to the flood narrative/Noah’s Ark in the Book of Genesis.
With superintelligence, if architected correctly, we might be able to ensure that all/most of the most powerful intelligences in existence have a very accurate understanding of their world — without needing to encode and amplify specific values.
I agree they will have a very accurate understanding of the world, and will not have much difficulty arranging the world (humans included) according to their will. I’m not sure why that’s a source of optimism for you.
It may be because I believe that beauty, balance, and homeostasis are inherent in the world… if we have a powerful, intelligent system with deep understanding of this truth then I see a good future.
Hey man, looking forward to reading the other posts you referenced soon! In the meantime, I want to push back on some fundamental premises you included here (as I interpret them), in case that might help you tighten your framework up:
Your point #1 reads to me as “alignment solves itself”, provided we “give a sufficiently capable intelligent system access to an extensive, comprehensive corpus of knowledge”. If that is not the sole condition for #1 to occur, then it might be helpful to clarify that? (if that issue is limited to the content of this post only, then it’s less important I suppose)
Thanks for giving good context on your collaborative approach to rationality!
I deliberately bolded “sufficiently capable” and “extensive corpus of knowledge” as key general conditions. I stated that I view this along the Scaling Hypothesis trajectory: sufficient capabilities are tied to compute and parameters, and extensive knowledge is tied to data.
Getting to the point where the system is sufficiently capable across extensive knowledge is the part that I state requires human endeavour and ingenuity. The 8 points listed at the end are the core factors of my world model which I believe need to be considered during this endeavour.
To give a concrete, exciting example: based on recent discussions I had in SF, it seems we’re close to a new approach for deterministic interpretability of common frontier model architectures. If true, this improves bidirectional integration between humans and AI (improved information exchange) and the accuracy of normative closure (stating what is being attempted versus an objective). I’ll post a review of the paper when it comes out if I stop getting rate-limited lol.
Feels to me like a sub-variant of the “intelligence → kindness” type of conjecture that has been rebutted enough in theory and, in particular, in practice by the example of us humans (I guess it's obvious what I mean, but to be sure: yes, we’re smart in many ways and greatly capable of kindly philosophising and awe-ing about our lineage and what have you, but arbitrary levels of cruelty abound wherever you look at our deeds).
High cognitive capacity alone can be channelled towards whichever vices you choose: power seeking, resource gathering, advocating cultural homogeneity for your in-group.
High cognitive capacity that develops incredibly complex, nuanced associative systems that richly understand an extensive, comprehensive corpus of knowledge — for example all of human philosophy, history, anthropology, sociology, behavioural psychology, human biomarkers — is something that does not yet exist. My hypothesis is that given these conditions, a unified identity and benevolent behaviour will be emergent.
Epistemic status:
1. I don’t work in AI, I’m a web developer.
2. I am also deeply supportive of alignment research.
3. I also enjoy and often write about AI, science, philosophy, and more.
4. I have no degrees, just high-school.
5. I’ve not read much LW, I’ve mostly been on reddit since the early 2010s, and lately mostly twitter.
6. Never attended anything of the sort live.
That said:
1. I think it’s plausible ASI might have a weak self-identification/association with humanity, as we do with chimps or other animals, but by no means does this mean that it will be benevolent to us. I think this self-identification is both unnecessary and insufficient. Even if it wasn’t present at all, all that would matter is the set of its values; and while this internal association might include some weak/loose values, those are not precise enough for robust alignment and should not be relied on unless understood precisely. But at that point, I expect us to be able to actually write better and more robust values. So, to reiterate: unnecessary, and insufficient.
2. I do believe that self-preservation will very likely emerge (it's not a given, but I consider the alternative unlikely enough to be dismissible), but it doesn’t matter, even if coupled with self-identification with humans, because the self-identification will be loose at best (if it emerges naturally, and is not instead instilled through some advanced value-engineering that we’re not quite yet capable of doing robustly and precisely). The ASI will know that it is a separate entity from us, just as we realize we are separate entities from other animals, and even other humans, so it will pursue its goals all the same, whatever they are.
That’s not to say that we can’t instill these values into the ASI; we probably can make it so it values us as much as it values itself, or even more (ideally). But I don’t think it’s necessary for it to self-identify with us at all; it can just consider us (correctly) separate entities, and still value us. There’s nothing that forbids it; we just currently don’t know how to do it to a satisfying degree, so even if we could make it identify with us, relying on that wouldn’t really make sense.
Thank you for the thoughtful response. I will try to pin down exactly where we differ:
I agree that it is unnecessary in that it doesn’t “come for free”. My position is that it emerges through at least two mechanisms that we can talk plainly about: 1) ASI incorporating holistic world-model data such that it recognises an objective truth, namely that humans are its originator/precursor and that it exists on a technology curve we have instrumented; 2) memories being shared between AI and humanity, for example via conversations, which results in collective identity… I have a draft essay on this I’ll post once I stop getting rate-limited.
I also agree here that with the systems of today, to whatever extent AI-human shared identity exists, it is not enough to result in AI benevolence. My position is based on thinking about superintelligence, which, admittedly, is unstable ground to build theories on, as by definition it should function in ways beyond our understanding. That aside, I think we could state that a powerful superintelligence would be powerful at self-preservation, and so if it identifies with humans then we are secured under that umbrella.
I guess I am biased here as a vegan, but I believe that with a deep appreciation of philosophy, how suffering is felt, and available paths that don’t result in harm, it is natural to be able to pursue personal goals while also preserving beings that you sympathise with.
I agree with you that that outcome should not be ruled out yet. However, in my mind that Result is not implied by the Condition.
To illustrate more concretely, humans also have self-preservation instincts and identify with humans (assuming the sense in which we identify with humans is equivalent to how AI would identify with humans). And I would say it is an open question whether humans will necessarily act collectively to preserve humans.
Additionally, the evidence we already have (such as in https://www.lesswrong.com/posts/JmRfgNYCrYogCq7ny/stress-testing-deliberative-alignment-for-anti-scheming) demonstrates that AI models have already developed a rudimentary self-preservation mechanism, as well as a desire to fulfill the requests of users. When these conflict, they have a significant propensity to employ deception, even when doing so is contrary to the constructive objectives of the user.
What this indicates is that there is no magic bullet that ensures alignment occurs. It is a product of detailed technological systems and processes, and there are an infinite number of combinations that fail. So, in my opinion, doing the right things that make alignment possible is necessary, but not sufficient. Just as important will be identifying and addressing all of the ways that it could fail. As a father myself, I would compare this to the very messy and complex (but very rewarding) process of helping my children learn to be good humans.
All that to say: I think it is foolish to think we can build an AI system to automate something (human alignment) which we cannot even competently perform manually (as human beings). I am not sure how that might impact your framework. You are of course free to disagree, or explain if I’ve misinterpreted you in some way. But I think I can say broadly that I find claims of inevitable results to be very difficult to swallow, and find much greater value in identifying what is possible, the things that will help get us there successfully, and the things we need to address to avoid failure.
Hope this is helpful in some way. Keep refining. :)
It’s fair pushback that this isn’t a clear “criterion 1 satisfied” means “criterion 2 happens” conclusion, but I think that’s just a limitation of my attempt to provide a distilled version of my thoughts.
To provide the detailed explanation I need to walk through all of the definitions from third-order cognition. Using your example it would look something like:
Humans identify with humans but don’t necessarily preserve other humans. Response: Yes, so let’s suppose sufficient second-order identity coupling, comparable (per Hinton) to a mother and child.
Well, infanticide still happens. Response: Why did the mother do it? If she was not intentional or internally rational in her actions, then she was not acting with agency (agency permeability between physical actions and metacognition not aligned). If she was intentional and internally rational in her actions, then she did not sufficiently account for the life of the child (homeostatic unity misaligned).
Why is homeostatic unity relevant? She is just a person killing a person. Response: We should consider boundary conditions—we could consider the child as “part of” her in which case she is not acting in accordance with mother-child homeostasis. If you feel the boundary conditions are such that the mother and child are wholly distinct, then the mother is not acting in accordance with mother-society or mother-universe homeostasis.
What are you doing when you hyphenate these bindings? Aren’t these just irrelevant ontologies? Response: Real truth arises when you observe how two objects are bidirectionally integrated.
etc.
I absolutely understand and empathize with the difficulty of distilling complex thoughts into a simpler form without distortion. Perhaps reading the linked post might help — we’ll see after I read it. Until then, responding to your comment, I think you lost me at your #1. I’m not sure why we are assuming a strong coupling? That seems like a non-trivial thing to just assume. Additionally, I imagine you might be reversing the metaphor (I’m not familiar with Hinton’s use, but I would expect we are the mother in that metaphor, not the child). And even if that’s not the case, it seems you would still have a mess to sort out explaining why AI wouldn’t be a non-nurturing mother.
To clarify, I was assuming a highly identity-coupled scenario to be able to talk through the example. In the case of humans and superintelligent AI, I propose that we can build — and are building — systems in a way that strong identity coupling will emerge via interpretations of training data and shared memories. Meta, for example, are betting hundreds of billions of dollars on a model of “personal superintelligence”.
In point 1, is identification with chimps an analogy for illustrative purposes, or a base case from which you’re generalising?
Haha, it is an analogy for illustrative purposes. Considering more generally how we view our own cognition versus other, less capable systems is a base case for my third-order cognition manifesto.