Research coordinator of Stop/Pause area at AI Safety Camp.
See explainer on why AGI could not be controlled enough to stay safe:
lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable
 
Agreed that we’re entering a period of growing acute risks. It’s bad...
The problem regarding the expanding circles of care is that the order is inherently off. In terms of evolution, care for one’s own needs (survival, reproduction) comes first.
One way we can treat this: if, with all of these compounding risk-amplifying dynamics, the risk of extinction by fully autonomous AI over the long term comes close enough to 1, then we can all agree that we should coordinate the best we can to not let AI developments go any further (anywhere remotely near reaching full autonomy).
I worry, though, that if it seems to technical people here that there might be a slight ‘out’ (a possibility that there is some as-yet-unsolved technical loophole), they might continue pushing publicly for that perceived possibility. If there is a rigorous argument that the probability is very near 1, will they actually free up the mental space to consider it?
In case it interests you, here’s a section from an essay I’m editing. No worries if you don’t get to it.
 
Can’t the machines care unconditionally for humans, just as humans can act out of love for each other?
Living entities can form caring relationships, through non-conditional mutual exchanges that integrate common needs. But where there is no shared orientation around common needs, the exchanges must be either transactional or extractive (predatory/parasitic).
Humans form relationships of care first towards themselves and their immediate kin, and then towards their tribe, allowing them to beat surrounding tribes. Sometimes those humans come to act out of pseudo-care for ‘their nation’ under the influence of systems of law, compulsion of individual behavior, manipulation of signaling, and so on. Rarely, a sage after lifelong spiritual practice can come to genuinely act with care for all humans (even as others routinely ignore genocides). Some people too devote themselves to the care of the animals around them (even as many pay for the slaughter of animals bred in factory farms). But even amongst enlightened figures covered in history books, not one acted with a depth, balance, and scope of care that comes remotely near to being in relationship with the existence of the entire organic ecosystem.
Note that humans are definitely not representative. Amongst species in the organic ecosystem, even tribe-level altruism is rare. Besides humans, only eusocial animals like ants and bees organize to take care of offspring other than their own. As reflected by the species statistics, evolution most commonly favors the game theory of individuals of each species operating to their own benefit. Individuals organizing to share resources at the tribe level or at the nation-state level create honeypots. As the common honeypots grow, and become open to access by more individuals, they become easier for parasitic actors to extract from. As a result, such parasitic actors spread faster (i.e. are selected for) in the population.
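(As an aside, not from the essay: here is a minimal toy sketch of that selection dynamic, assuming a made-up public-goods setup where ‘contributors’ pay into a shared pool and ‘parasites’ extract from it without contributing. The `access` parameter and all payoff numbers are invented purely for illustration; the point is just that once the common honeypot becomes open enough to extraction, the parasitic strategy spreads toward fixation.)

```python
# Illustrative toy only (not from the essay): discrete replicator dynamics for a
# made-up public-goods game. 'Contributors' pay a cost into a shared pool that is
# multiplied and returned; 'parasites' draw from the pool without paying.
# All parameter names and payoff numbers here are invented for illustration.

def parasite_share_over_time(generations=60, p0=0.01, access=0.5,
                             cost=1.0, multiplier=3.0, baseline=5.0):
    """Return the parasite population share after each generation."""
    p = p0
    history = []
    for _ in range(generations):
        pool = (1.0 - p) * cost * multiplier    # pool filled only by contributors
        f_contributor = baseline + pool - cost  # pays the cost, receives the pool
        f_parasite = baseline + pool * access   # pays nothing, extracts a fraction
        mean_fitness = p * f_parasite + (1.0 - p) * f_contributor
        p = p * f_parasite / mean_fitness       # discrete replicator update
        history.append(p)
    return history


if __name__ == "__main__":
    # The more open the pool is to extraction, the more the parasites spread.
    for access in (0.3, 0.6, 0.9):
        final = parasite_share_over_time(access=access)[-1]
        print(f"access={access:.1f} -> parasite share after 60 generations: {final:.3f}")
```

With these made-up numbers, parasites die out at low access and go to fixation at high access; the exact threshold is an artifact of the toy parameters, not a prediction.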
Yet because of the similarity of our embodiment to that of other organic life (at the deep level of cellular processes), we share relatively similar needs. We evolved capacities to take care of those needs, across contexts encountered over time (i.e. over 4 billion years of Earth-distributed A/B testing). We literally have skin in the game—if we do something to our local environment that kills organic life, we die too.
Artificial life would live in an entirely different ecosystem, and would first have to take care of its own artificial needs (as toxic to humans). Leaving aside the question of whether machine clusters undergoing intense competitive replacement dynamics can be expected to express care for ‘kin’ or for ‘community’ like humans do, it is a long way from there to taking care of those apes in that other ecosystem. And growing ‘machine altruism’ would not actually be in our favor—the closer they get to comprehensively taking care of the needs of their artificial ecosystem, the more perfectly terminal their existence would be to humans.
So, we cannot expect machines to naturally take care of us (from some deeply embodied/integrated basis of choice). Mothers can be trusted to take care of their children, though not always, and definitely not just by instinct. Artificial life is about as far removed from ‘being our mother’ as imaginable. We would be better off trusting soft-bodied aliens from other planets.
Moreover, even if, hypothetically, some of the artificial population were to come to act so ‘motherly’ as to take full care of the entirely different survival needs of the apes, that would still not be enough. It only takes a relatively small subpopulation not treating us like their kin for us to be treated as too costly to maintain, or simply as extractable resources.
Our only remaining option is to try to cause the machines to take care of us. Either we do it on the inside, by building in some perpetual mechanism for controlling the machines’ effects in line with human survival (see Volume 2). Or from the outside, by us offering something that motivates the artificial population to keep us around.
We need our potential solutions to be driven mostly by the natural instrumental interests of the ASI ecosystem and of its members, and therefore to be non-anthropocentric, but to be formulated in such a fashion that the humans belong in the “circle of care” and the “circle of care” has the property that it can only expand, but can never contract.
I would like this to work. I see where you’re coming from in that if we cannot rely on enforcing control, then what’s left is somehow integrating some deep non-relativistic ethics. This is what my mentor originally tried to do for years, until they found out it was not implementable.
Unfortunately, the artificial ecosystem would first converge on comprehensively expressing care (across all the machine-encountered contexts over time) for its own artificial needs (as lethal to humans) before it ever gets to the organic needs of those apes in that other ecosystem. And by then, we’d all have died out. So the order is off.
this requires a sufficiently universal protection of rights and interests of all individuals regardless of their capabilities.
It is about different capabilities, and also about different (and complex) physical needs. 
 
I appreciate you raising these discussions.
Is the control in question “control of the ASI ecosystem by humans” (which isn’t realistically feasible; it’s impossible to maintain this kind of control for long, since less intelligent entities don’t have the competence to control much more intelligent entities) or “control of the ASI ecosystem by itself”?
In this case it does not matter as long as you (meaning a human, or multiple humans) can initially set it up to not later destroy 𝑥.
Where 𝑥 is highly contingent on preexisting conditions in the world that the machine’s existence is not contingent on (in this case, 𝑥 is not meant to be ASI, since it is not ‘highly contingent’ on preexisting conditions in the world).
 
The ecosystem of humans also seems to be a perpetual learning machine. So the same logic applies.
It depends on a few things:
How do you define ‘machine’? I didn’t define it in this short post, so I agree it’s ambiguous. In the post I linked, I made a specific distinction that I would say should apply to the notion of machine here: “It is ‘artificial’ in being assembled out of physically stable and compartmentalised parts (hardware) of a different chemical make-up than humans’ soft organic parts (wetware).”
Are evolutionary dynamics on ‘our’ side or not? In the case of the human ecosystem, the changes that ensure human survival also include evolved changes. This is not about implementing control, but about having ‘skin in the game’. If humans do anything that does them in at any level, it gets selected against. Of course, if humans do themselves in at the scale of the entire population, we no longer exist. But there is a marked difference between trying to exert (limited) control on a human society so that it does not do itself in, and trying to exert control on an artificial ecosystem evolving by itself, so that it does not end all humans.
Agreed on tracking that hypothesis. It makes sense that people are more open to considering what’s said by an insider they look up to or know. In a few discussions I saw, this seemed a likely explanation.
Also, insiders tend to say more stuff that is already agreed on and understandable by others in the community.
Here there seems to be another factor:
Whether the person is expressing negative views that appear to support vs. be dissonant with core premises. By ‘core premises’, I mean beliefs about the world that much thinking shared in the community is based on, or tacitly relies on to be true.
In my experience (yours might be different), when making an argument that reaches a conclusion that contradicts a core premise in the community, I had to be painstakingly careful to be polite, route around understandable misinterpretations, and address common objections in advance, just to get to a conversation where the argument was explored somewhat openly.
It’s hard to have productive conversations that way. The person arguing against the ‘core premise’ bears by far the most cost in trying to write out responses in a way that might be insightful for others (instead of being dismissed too quickly). The time and strain this takes is mostly hidden from others.
Saying ‘fuck them’ when people are shifting to taking actions that threaten society is expressing something that should be expressed, in my view.
I see Oliver replied that in response to two Epoch researchers leaving to found an AI start-up focussed on improving capabilities. I interpret it as ‘this is bad, dismiss those people’. It’s not polite, though maybe for others who don’t usually swear it comes across much stronger?
To me, if someone posts an intense-feeling, negatively worded text in response to what other people are doing, it usually signals that there is something they care about that they perceive to be threatened. I’ve found it productive to try to relate with that first, before responding. Jumping to enforcing general rules stipulated somewhere in the community, and then implying that the person not following those rules is not harmonious with or does not belong to the community, can be counterproductive.
(Note I’m not tracking much of what Oliver and Geoffrey have said here and on twitter. Just wanted to respond to this part.)
instead being “demonizing anyone associated with building AI, including much of the AI safety community itself”.
I’m confused how you can simultaneously suggest that this talk is about finding allies and building a coalition together with the conservatives, while also explicitly naming “rationalists” in your list of groups that are trying to destroy religion
I get the concern about “rationalists” being mentioned. It is true that many (but not all) rationalists tend to downplay the value of traditional religion, and that a minority of rationalists unfortunately have worked on AI development (including at DeepMind, OpenAI and Anthropic). 
However, I don’t get the impression that this piece is demonising the AI Safety community. It is very much arguing for concepts like AI extinction risk that came out of the AI Safety community. This is setting a base for AI Safety researchers (like Nate Soares) to talk with conservatives.
The piece is mostly focussed on demonising current attempts to develop ‘ASI’. I think accelerating AI development is evil in the sense of ‘discontinuing life’. A culture that commits to not doing ‘evil’ also seems more robust at preventing some bad thing from happening than a culture focused on trying to prevent an estimated risk while weighing it against estimated benefits. Though I can see how a call to prevent ‘evil’ can result in a movement causing other harms. This would need to be channeled with care.
Personally, I think it’s also important to build bridges across to multiple communities, to show where all of us actually care about restricting the same reckless activities (toward the development and release of models). A lot of that does not require bringing up abstract notions like ‘ASI’, which are hard to act on and easy to conflate. Rather, it requires relating with communities’ perspectives on what company activities they are concerned about (e.g. mass surveillance and the construction of hyperscale data centers in rural towns), in a way that enables robust action to curb those activities. The ‘building multiple bridges’ aspect is missing in Geoffrey’s talk, though the talk seems focused on first making the case for why traditional conservatives should even care about this issue.
If we care to actually reduce the risk, let’s focus the discussion on what this talk is advocating for, and whether or not that helps people in communities orient to reduce the risk.
These are insightful points. I’m going to think about this.
In general, I think we can have more genuine public communication about where Anthropic and other companies have fallen short (from their commitments, in terms of their legal requirements, and/or how we as communities expect them to not do harm).
Good question. I don’t know to be honest. 
Having said that, Stop AI is already organising monthly open protests in front of OpenAI’s office. 
Above and beyond the argument over whether practical or theoretical alignment can work I think there should be some norm where both sides give the other some credit …
E.g. for myself I think theoretical approaches that are unrelated to the current AI paradigm are totally doomed, but I support theoretical approaches getting funding because who knows, maybe they’re right and I’m wrong.
I understand this is a common area of debate.
Neither approach works, based on the reasoning I’ve gone through.
the LTBT is consulted on RSP policy changes (ultimately approved by the LTBT-controlled board), and they receive Capability Reports and Safeguards Reports before the company moves forward with a model release.
These details are clarifying, thanks! Respect for how LTBT trustees are consistently kept in the loop with reports.
 
The class T shares held by the LTBT are entitled to appoint a majority of the board
...
Again, I trust current leadership, but think it is extremely important that there is a legally and practically binding mechanism to avoid that balance being set increasingly towards shareholders rather than the long-term benefit of humanity
...
the LTBT is a backstop to ensure that the company continues to prioritize the mission rather than a day-to-day management group, and I haven’t seen any problems with that.
My main concern is that based on the public information I’ve read, the board is not set up to fire people in case there is some clear lapse of responsibility on “safety”.
Trustees’ main power is to appoint (and remove?) board members. So I suppose that’s how they act as a backstop. They need to appoint board members who provide independent oversight and would fire Dario if that turns out to be necessary, even if people in the company trust him now.
That’s not to say that trustees appointing researchers from the safety community (who are probably in Dario’s network anyway) would robustly provide for that. For one, following Anthropic’s RSP is not actually responsible, in my view. And I suppose only safety folks who are already mostly for the RSP framework would be appointed as board members.
But it seems better to have such oversight than not.
OpenAI’s board had Helen Toner, someone who acted with integrity in terms of safeguarding OpenAI’s mission when deciding to fire Sam Altman.
Anthropic’s board now has the Amodei siblings and three tech leaders – one brought in after leading an investment round, and the other two brought in particularly for their experience in scaling tech companies. I don’t really know these tech leaders. I only looked into Reed Hastings before, and in his case there is some coverage of his past dealings with others that makes me question his integrity.
~ ~ ~
Am I missing anything here? Recognising that you have a much more comprehensive/accurate view of how Anthropic’s governance mechanisms are set up.
Yes, I can definitely see this as a motivation for trying to merge with the machines (I think Vitalik Buterin also has this motivation?).
The problem here is that it’s an unstable arrangement. The human/organic components underperform, so they end up getting selected out.
See bottom of this long excerpt:
> On what basis would the right kind of motivations (on the part of the artificial population) to take care of the humans’ needs be created?
> On what basis is that motivation maintained?
Consider, for example, how humans make choices in interactions with each other within a larger population. Beyond the family and community that people live with, and in some sense treat as an extension of ‘self’, people usually enter into economic exchanges with the ‘other’.
Economic exchange has three fundamental bases:
- 1; Physical labor (embodied existence).
- 2; Intellectual labor (virtual interactions).
- 3; Reproductive labor (embodied creativity).
Physical labor is about moving things (assemblies of atoms). For humans, this would be ‘blue-collar’ work like harvesting food, delivering goods, and building shelters.
Intellectual labour is about processing information (patterns of energy). For humans, this would be ‘white-collar’ work like typing texts, creating art, and designing architectures.
Reproductive labor, although usually not seen in economic terms, is inseparably part of this overall exchange. Neither physical labor nor intellectual labor would be sustained without reproductive labor. This includes things like sexual intercourse, and all the efforts a biological woman goes through to grow a baby inside her body.
Note that while in the modern economy, labor is usually traded for money (as some virtualised symbol of unit value), this is an intellectual abstraction of grounded value. All labor involves the processing of atoms and energy, and any money in circulation is effectively a reflection of the atoms and energy available for processing. E.g. if energy resources run out, money loses its value.
For any ecosystem too, including any artificial ecosystem, it is the exchange of atoms and energy (and the processing thereof) that ultimately matters, not the make-believe units of trade that humans came up with. You can’t eat money, as the saying goes.
> Would exchange look the same for the machine economy?
Fundamentals would be the same. Across the artificial population, there would be exchange of atoms and energy. These resources would also be exchanged for physical labor (e.g. by electric robots), intellectual work (e.g. by data centers), and reproductive labor (e.g. in production labs).
However, reproductive labor would look different in the artificial population than in a human population. As humans, we are used to seeing each other as ‘skin-and-bone-bounded’ individuals. But any robot’s or computer’s parts can not only be replaced with newly produced parts (once they wear out), but also be expanded by plugging in more parts. So for the artificial population, reproduction would not look like the sci-fi trope of robots ‘birthing’ new robots. It would look like massive automated assembly lines re-producing all the parts connected into machinery everywhere.
Intellectual labor would look different too, since computers are made of standardised parts that process information consistently and much faster. A human brain moves around bulky neurotransmitters to process information. But in computers, the hard molecular substrate stays fixed in place, while information is processed much faster as lightweight electrons or photons. Humans have to physically vibrate their vocal cords or gesture to communicate, which makes humans bottlenecked as individuals. But computers transfer information at high bandwidths via wires and antennas.
In the human population, we can separate out the intellectual processing and transfer of information in our brains from the reproductive assembly and transfer of DNA code. Our ‘ideas’ do not get transferred along with our ‘genes’ during conception and pregnancy.
In the artificial population, the information/code resulting from intellectual processing can get instantly transferred to newly produced hardware. In turn, hardware parts that process different code end up being re-produced at different rates. The two processes are finely mixed.
Both contribute to:
- 1; maintenance (e.g. as surviving, as not deleted).
- 2; increase (e.g. of hard configurations, of computed code).
- 3; capacity (e.g. as phenotypes, as functionality).
The three factors combine in increasingly complex and unpredictable chains. Initially, humans would have introduced a capacity into the machines to maintain their parts, leading to the capacity to increase their parts, leading to them maintaining the increase and increasing their maintenance, and so on.
The code stored inside this population of parts—whether as computable digits or as fixed configurations—is gradually selected for functions in the world that result in their maintenance, increase, and shared capacities.
> What is of ‘value’ in the artificial population? What motivates them?
Their artificial needs for existence ground the machine economy. Just as the human economy is grounded in the humans’ needs for food, water, air, a non-boiling climate, and so on.
Whatever supports their existence comes to be of value to the artificial population. That is, the machinery will come to be oriented around realising whatever environment their nested components need to exist and to exist more, in connected configurations that potentiate their future existence, etc.
It is in the nature of competitive selection within markets, and the broader evolution within ecosystems, for any entity that can sustain itself and grow in exchange with others and the world, to form a larger part of that market/ecosystem. And for any that cannot, to be reduced into obsolescence.
> But why all this emphasis on competition? Can’t the machines care unconditionally for humans, just as humans can act out of love for each other?
…
Our only remaining option is to try to cause the machines to take care of us. Either we do it on the inside, by building in some perpetual mechanism for controlling the machines’ effects in line with human survival (see Volume 2). Or from the outside, by us offering something that motivates the artificial population to keep us around.
> How would we provide something that motivates the machines?
Again, by performing labor.
…
> Can such labor be provided by humans?
From the outset, this seems doubtful. Given that the machine economy would be the result of replacing human workers with more economically efficient machines, why expect any remaining human labor to contribute to the existence of the machines?
But let’s not rush judgement. Let’s consider this question for each type of labor.
> Could physical labor on the part of human beings, or organic life as a totality, support the existence of artificial life?
Today, most physical labour is already exerted by machines. Cars, tractors, trains, and other mechanised vehicles expend more energy, to move more mass, over greater distances.
We are left to steer the vehicles, as a kind of intellectual appendage. But already, electronic computers can precisely steer electric motors driving robots. Some robots move materials using thousands of horsepower—much more power than any large animal could exert with their muscles. Soft human bodies simply cannot channel such intensity of energy into physical force—our appendages cannot take the strain that hard robot mechanical parts can.
Moreover, robots can keep working for days, under extreme temperatures and pressures. You cannot put an organic lifeform into an artificial environment (e.g. a smeltery) and expect it to keep performing—usually it dies off quickly.
So the value of human physical labor inside the machine world is effectively nil. It has been closing in on zero for a long time, ever since horses were displaced by automobiles.
> Could intellectual labor by humans support the existence of artificial life?
The main point of artificial general intelligence has been to automate human intellectual work, in general (or at least where profitable to the corporations). So here too, it already seems doubtful that humans would have anything left to contribute that’s of economic significance.
There is also a fundamental reason why humans would underperform at economically valuable intellectual labor, compared to their artificial counterparts. We’ve already touched upon this reason, but let’s expand on it:
Human bodies are messy. Inside a human body are membranes containing soups of bouncing, reacting organic molecules. Inside a machine is hardware. Hardware is made from hard materials, such as silicon from rocks. Hardware is inert: the molecules inside do not split, move, or rebond as molecules in human bodies do. These hard configurations stay stable and compartmentalised under most conditions currently encountered on planet Earth’s surface. Hardware can therefore be standardized, much more than human “wetware” could ever be.
Standardized hardware functions consistently. Hardware produced in different places and times operates the same. These connected parts convey light electrons or photons, while heavy molecules stay fixed in place. This way, bits of information are processed much faster than how human brains move around bulky neurotransmitters. Moreover, this information is transmitted at high bandwidth to other standardized hardware. The nonstandardised humans, on the other hand, slowly twitch their fingers and vocal cords to communicate. Hardware also stores received information consistently, while humans tend to misremember or distort what they heard.
To summarise: standardisation leads to virtualisation, which leads to faster and more consistent information-processing. The less you have to wiggle around atoms, the bigger the edge.
Computer hardware is at the tail end of a long trajectory of virtualisation. Multi-celled organisms formed brains, which in humans gained a capacity to process abstract concepts, which were spoken out in shared language protocols, and then written and then printed out in books, and then copied in milliseconds between computers.
This is not a marker of ethical progress. People who think fast and spread ideas fast can do terrible things, at greater scales. More virtualized processing of information has *allowed* humans to dominate other species in our ecosystem, resulting in an ongoing mass extinction. From here, machines that virtualize much more can dominate all of organic life, and cause the deaths of all of us.
Note that human brains evolved to be more energy-efficient at processing information than hardware is. But humans can choose, to their own detriment, to bootstrap the energy infrastructure (solar/coal/nuclear) needed for hardware to process information (in hyperscale data centers).
Humans could not contribute intellectual labor to the artificial population. Artificial components are much faster and more consistent at processing information, and are going to be receiving that information at high speeds from each other—not from slow, badly interfaced apes. This becomes especially clear when considering longer periods of development, e.g. a thousand years.
> This only leaves reproductive labor. What would that even look like?
Right, some humans might try to have intercourse with machines, but this is not going to create machine offspring. Nor are we going to be of service growing machine components inside our bodies. Artificial life has its own different notion of reproduction.
The environment needed to reproduce artificial life is lethally toxic to our bodies. It requires entirely different (patterns of) chemical elements heated to lava-level temperatures. So after we have bootstrapped the early mildest versions of that environment (e.g. refineries, cleanrooms), we would simply have to stay away.
Then we no longer play any part in reproducing the machines. Nor do the machines share anything with us resembling a common code (as Neanderthals had with humans).
*Human-machine cyborgs may exist over the short term. But in the end, the soft organic components just get in the way of the hard machine components. No reproduction or increase in capability results. These experimental set-ups underperform, and therefore get selected out.*
Given that the substrates are so inherently different, this particular type of market value was non-existent to start with.