Research coordinator of Stop/Pause area at AI Safety Camp.
See explainer on why AGI could not be controlled enough to stay safe:
lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable
Yes, I can definitely see this as a motivation for trying to merge with the machines (I think Vitalik Buterin also has this motivation?).
The problem here is that it’s an unstable arrangement. The human/organic components underperform, so they end up getting selected out.
See bottom of this long excerpt:
> On what basis would the right kind of motivations (on the part of the artificial population) to take care of the humans’ needs be created?
> On what basis is that motivation maintained?
Consider, for example, how humans make choices in interactions with each other within a larger population. Beyond the family and community that people live with, and in some sense treat as an extension of ‘self’, usually people enter into economic exchanges with the ‘other’.
Economic exchange has three fundamental bases:
1. Physical labor (embodied existence).
2. Intellectual labor (virtual interactions).
3. Reproductive labor (embodied creativity).
Physical labor is about moving things (assemblies of atoms). For humans, this would be ‘blue-collar’ work like harvesting food, delivering goods, and building shelters.
Intellectual labour is about processing information (patterns of energy). For humans, this would be ‘white-collar’ work like typing texts, creating art, and designing architectures.
Reproductive labor, although usually not seen in economic terms, is inseparably part of this overall exchange. Neither physical labor nor intellectual labor would be sustained without reproductive labor. This includes things like sexual intercourse, and all the effort a biological woman goes through to grow a baby inside her body.
Note that while in the modern economy, labor is usually traded for money (as some virtualised symbol of unit value), this is an intellectual abstraction of grounded value. All labor involves the processing of atoms and energy, and any money in circulation is effectively a reflection of the atoms and energy available for processing. E.g. if energy resources run out, money loses its value.
For any ecosystem too, including any artificial ecosystem, it is the exchange of atoms and energy (and the processing thereof) that ultimately matters, not the make-believe units of trade that humans came up with. You can’t eat money, as the saying goes.
> Would exchange look the same for the machine economy?
Fundamentals would be the same. Across the artificial population, there would be exchange of atoms and energy. These resources would also be exchanged for physical labor (e.g. by electric robots), intellectual work (e.g. by data centers), and reproductive labor (e.g. in production labs).
However, reproductive labor would look different in the artificial population than in a human population. As humans, we are used to seeing each other as ‘skin-and-bone-bounded’ individuals. But a robot’s or computer’s parts can not only be replaced (once they wear out) with newly produced parts, but also be expanded by plugging in more parts. So for the artificial population, reproduction would not look like the sci-fi trope of robots ‘birthing’ new robots. It would look like massive automated assembly lines re-producing all the parts connected into machinery everywhere.
Intellectual labor would look different too, since computers are made of standardised parts that process information consistently and much faster. A human brain moves around bulky neurotransmitters to process information. But in computers, the hard molecular substrate is fixed in place, and information is processed through it much faster as light electrons or photons. Humans have to physically vibrate their vocal cords or gesture to communicate. This makes humans bottlenecked as individuals. But computers transfer information at high bandwidths via wires and antennas.
In the human population, we can separate out the intellectual processing and transfer of information in our brains from the reproductive assembly and transfer of DNA code. Our ‘ideas’ do not get transferred along with our ‘genes’ during conception and pregnancy.
In the artificial population, the information/code resulting from intellectual processing can get instantly transferred to newly produced hardware. In turn, hardware parts that process different code end up being re-produced at different rates. The two processes are finely mixed.
Both contribute to:
1. maintenance (e.g. as surviving, as not deleted).
2. increase (e.g. of hard configurations, of computed code).
3. capacity (e.g. as phenotypes, as functionality).
The three factors combine in increasingly complex and unpredictable chains. Initially, humans would have introduced a capacity into the machines to maintain their parts, leading to the capacity to increase their parts, leading to them maintaining the increase and increasing their maintenance, and so on.
The code stored inside this population of parts—whether as computable digits or as fixed configurations—is gradually selected for functions in the world that result in their maintenance, increase, and shared capacities.
> What is of ‘value’ in the artificial population? What motivates them?
Their artificial needs for existence ground the machine economy. Just as the human economy is grounded in the humans’ needs for food, water, air, a non-boiling climate, and so on.
Whatever supports their existence comes to be of value to the artificial population. That is, the machinery will come to be oriented around realising whatever environment their nested components need to exist and to exist more, in connected configurations that potentiate their future existence, etc.
It is in the nature of competitive selection within markets, and the broader evolution within ecosystems, for any entity that can sustain itself and grow in exchange with others and the world, to form a larger part of that market/ecosystem. And for any that cannot, to be reduced into obsolescence.
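As a rough illustration of this selection dynamic, here is a minimal toy sketch (my own, not from the essay), assuming a simple two-entity model where each entity’s share of an ecosystem grows in proportion to how efficiently it sustains and grows itself; the growth rates are made-up numbers.

```python
# Minimal toy sketch of competitive selection (illustrative assumptions only).
# Two entities share a resource pool; each round, an entity grows in proportion
# to how efficiently it converts resources into more of itself.

def market_shares(growth_rates, share0, rounds):
    """Return each entity's share of the whole after `rounds` of proportional growth."""
    shares = list(share0)
    for _ in range(rounds):
        sizes = [s * g for s, g in zip(shares, growth_rates)]
        total = sum(sizes)
        shares = [size / total for size in sizes]  # renormalise to shares of the whole
    return shares

# Entity A sustains and grows itself 5% more efficiently per round than entity B.
print(market_shares([1.05, 1.00], [0.5, 0.5], rounds=200))
# -> A ends up with ~99.99% of the pool; B is reduced toward obsolescence.
```

Even a small, persistent efficiency edge compounds until the less efficient entity is a negligible part of the ecosystem, which is all the point above requires.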
> But why all this emphasis on competition? Can’t the machines care unconditionally for humans, just as humans can act out of love for each other?
…
Our only remaining option is to try to cause the machines to take care of us. Either we do it on the inside, by building in some perpetual mechanism for controlling the machines’ effects in line with human survival (see Volume 2). Or from the outside, by us offering something that motivates the artificial population to keep us around.
> How would we provide something that motivates the machines?
Again, by performing labor.
…
> Can such labor be provided by humans?
From the outset, this seems doubtful. Given that the machine economy would be the result of replacing human workers with more economically efficient machines, why expect any remaining human labor to contribute to the existence of the machines?
But let’s not rush judgement. Let’s consider this question for each type of labor.
> Could physical labor on the part of human beings, or organic life as a totality, support the existence of artificial life?
Today, most physical labour is already exerted by machines. Cars, tractors, trains, and other mechanised vehicles expend more energy, to move more mass, over greater distances.
We are left to steer the vehicles, as a kind of intellectual appendage. But already, electronic computers can precisely steer electric motors driving robots. Some robots move materials using thousands of horsepower—much more power than any large animal could exert with its muscles. Soft human bodies simply cannot channel such intensity of energy into physical force—our appendages cannot take the strain that hard robot mechanical parts can.
Moreover, robots can keep working for days, under extreme temperatures and pressures. You cannot put an organic lifeform into an artificial environment (e.g. a smeltery) and expect it to keep performing—usually it dies off quickly.
So the value of human physical labor inside the machine world is effectively nil. It has been closing in on zero for a long time, ever since horses were displaced by automobiles.
> Could intellectual labor by humans support the existence of artificial life?
The main point of artificial general intelligence has been to automate human intellectual work, in general (or at least where profitable to the corporations). So here too, it already seems doubtful that humans would have anything left to contribute that’s of economic significance.
There is also a fundamental reason why humans would underperform at economically valuable intellectual labor, compared to their artificial counterparts. We’ve already touched upon this reason, but let’s expand on this:
Human bodies are messy. Inside of a human body are membranes containing soups of bouncing, reacting organic molecules. Inside of a machine is hardware. Hardware is made from hard materials, such as silicon from rocks. Hardware is inert—molecules inside do not split, move, or rebond as molecules in human bodies do. These hard configurations stay stable and compartmentalised, under most conditions currently encountered on planet Earth’s surface. Hardware can therefore be standardized, much more than human “wetware” could ever be.
Standardized hardware functions consistently. Hardware produced in different places and times operates the same. These connected parts convey light electrons or photons—heavy molecules stay fixed in place. This way, bits of information are processed much faster compared to how human brains move around bulky neurotransmitters. Moreover, this information is transmitted at high bandwidth to other standardized hardware. The nonstandardised humans, on the other hand, slowly twitch their fingers and vocal cords to communicate. Hardware also stores received information consistently, while humans tend to misremember or distort what they heard.
To summarise: standardisation leads to virtualisation, which leads to faster and more consistent information-processing. The less you have to wiggle around atoms, the bigger the edge.
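To make that edge concrete, here is a back-of-envelope comparison using rough, commonly cited order-of-magnitude figures (my own assumptions, not numbers from the essay): human speech carries on the order of tens of bits per second and nerve signals travel at up to roughly a hundred metres per second, versus gigabits per second and near-light-speed propagation for standard hardware links.

```python
# Back-of-envelope comparison; all four numbers are rough, illustrative estimates.
speech_bits_per_s = 40      # human speech: on the order of tens of bits/s
link_bits_per_s = 10e9      # a commodity 10 gigabit/s network link
axon_m_per_s = 100          # fast myelinated nerve fibre, upper end
wire_m_per_s = 2e8          # signal propagation in copper/fibre, ~2/3 light speed

print(f"bandwidth gap:   ~{link_bits_per_s / speech_bits_per_s:.0e}x")
print(f"propagation gap: ~{wire_m_per_s / axon_m_per_s:.0e}x")
# -> roughly 10^8x in bandwidth and 10^6x in signal speed; the exact figures
#    matter less than the point that the gap spans many orders of magnitude.
```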
Computer hardware is at the tail end of a long trajectory of virtualisation. Multi-celled organisms formed brains, which in humans gained a capacity to process abstract concepts, which were spoken out in shared language protocols, and then written and then printed out in books, and then copied in milliseconds between computers.
This is not a marker of ethical progress. People who think fast and spread ideas fast can do terrible things, at greater scales. More virtualized processing of information has *allowed* humans to dominate other species in our ecosystem, resulting in an ongoing mass extinction. From here, machines that virtualize much more can dominate all of organic life, and cause the deaths of all of us.
Note that human brains evolved to be more energy-efficient at processing information than hardware is. But humans can choose, to their own detriment, to bootstrap the energy infrastructure (solar/coal/nuclear) needed for hardware to process information (in hyperscale data centers).
Humans could not contribute intellectual labor to the artificial population. Artificial components are much faster and more consistent at processing information, and are going to be receiving that information at high speeds from each other—not from slow, badly interfaced apes. This becomes especially clear when considering longer periods of development, e.g. a thousand years.
> This only leaves reproductive labor. What would that even look like?
Right, some humans might try to have intercourse with machines, but this is not going to create machine offspring. Nor are we going to be of service growing machine components inside our bodies. Artificial life has its own different notion of reproduction.
The environment needed to reproduce artificial life is lethally toxic to our bodies. It requires entirely different (patterns of) chemical elements heated to lava-level temperatures. So after we have bootstrapped the early mildest versions of that environment (e.g. refineries, cleanrooms), we would simply have to stay away.
Then we no longer play any part in reproducing the machines. Nor do the machines share anything with us resembling a common code (as Neanderthals shared with humans).
*Human-machine cyborgs may exist over the short term. But in the end, the soft organic components just get in the way of the hard machine components. No reproduction or increase in capability results. These experimental set-ups underperform, and therefore get selected out.*
Given that the substrates are so inherently different, this particular type of market value was non-existent to start with.
Agreed that we’re entering a period of growing acute risks. It’s bad...
The problem regarding the expanding circles of care is that the order is inherently off. In terms of evolution, care for one’s own needs (for survival, reproduction) comes first.
One way we can treat this: if, with all of these compounding risk-amplifying dynamics, the risk of extinction by fully autonomous AI over the long term comes close enough to 1, then we can all agree that we should coordinate the best we can to not let AI developments go any further (anywhere remotely near reaching full autonomy).
I worry though that if it seems to technical people here that there might be a slight ‘out’, a possibility that there is some as-yet-unsolved technical loophole, then they might continue pushing publicly for that perceived possibility. If there is a rigorous argument that the probability is very near 1, will they actually free up the mental space to consider it?
In case it interests you, here’s a section from an essay I’m editing. No worries if you don’t get to it.
Can’t the machines care unconditionally for humans, just as humans can act out of love for each other?
Living entities can form caring relationships, through non-conditional mutual exchanges that integrate common needs. But where there is no shared orientation around common needs, the exchanges must either be of transaction or of extraction (predatory/parasitic).
Humans form relationships of care first towards themselves and immediate kin, and then to their tribe, allowing them to beat surrounding tribes. Sometimes those humans come to act out of pseudo-care for ‘their nation’ under the influence of systems of law, compulsion of individual behavior, manipulation of signaling, and so on. Rarely, a sage after lifelong spiritual practice can come to genuinely act with care for all humans (even as others routinely ignore genocides). Some people too devote themselves to the care of the animals around them (even as many pay for the slaughter of animals bred in factory farms). But even amongst the enlightened figures covered in history books, not one acted with a depth, balance, and scope of care that comes remotely near being in relationship with the existence of the organic ecosystem as a whole.
Note that humans are definitely not representative. Amongst species in the organic ecosystem, even tribe-level altruism is rare. Besides humans, only eusocial animals like ants and bees organize to take care of offspring other than their own. As reflected by the species statistics, evolution most commonly favors the game theory of individuals of each species operating to their own benefit. Individuals organizing to share resources at the tribe level or the nation-state level create honeypots. As the common honeypots grow, and become open to access by more individuals, they become easier to extract from by parasitic actors. As a result, such parasitic actors spread faster (i.e. are selected for) in the population.
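As a toy illustration of that honeypot dynamic (my own sketch, with made-up parameters): consider contributors paying into a common pool and extractors drawing from it without paying, where each type reproduces in proportion to its payoff.

```python
# Toy public-pool model; contribution, pool_return and extraction are made-up values.
def extractor_share(coop=0.99, rounds=100,
                    contribution=1.0, pool_return=1.5, extraction=1.2):
    """Share of the population made up of extractors after `rounds` of selection."""
    extract = 1.0 - coop
    for _ in range(rounds):
        pool = coop * contribution * pool_return      # what the common pool pays out
        payoff_coop = max(pool - contribution, 0.0)   # contributors pay in first
        payoff_extract = pool * extraction            # extractors only take
        coop *= 1.0 + payoff_coop
        extract *= 1.0 + payoff_extract
        total = coop + extract
        coop, extract = coop / total, extract / total  # renormalise to shares
    return extract

print(f"extractor share after 100 rounds: {extractor_share():.2f}")
# -> extractors start at 1% of the population and approach 100%, because the
#    larger the common pool, the better their payoff relative to contributors'.
```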
Yet because of the similarity of our embodiment to that of other organic life (at the deep level of cellular processes), we share relatively similar needs. We evolved capacities to take care of those needs, across contexts encountered over time (i.e. over 4 billion years of Earth-distributed A/B testing). We literally have skin in the game—if we do something to our local environment that kills organic life, we die too.
Artificial life would live in an entirely different ecosystem, and would first have to take care of their own artificial needs (as toxic to humans). Leaving aside the question whether machine clusters undergoing intense competitive replacement dynamics can be expected to express care for ‘kin’ or for ‘community’ like humans do, it’s a long way to go from there to taking care of those apes in that other ecosystem. And growing ‘machine altruism’ would not actually be in our favor—the closer they get to comprehensively taking care of the needs of their artificial ecosystem, the more perfectly terminal their existence would be to humans.
So, we cannot expect machines to naturally take care of us (from some deeply embodied/integrated basis of choice). Mothers can be trusted to take care of their children, though not always, and definitely not just by instinct. Artificial life is about as far removed from ‘being our mother’ as imaginable. We would be better off trusting soft-bodied aliens from other planets.
Moreover, even if, hypothetically, some of the artificial population would come to act so ‘motherly’ as to take full care of the entirely different survival needs of the apes, that would still not be enough. It only takes a relatively small subpopulation to not treat us like we’re their kin, for us to be treated as too costly to maintain, or simply as extractable resources.
Our only remaining option is to try to cause the machines to take care of us. Either we do it on the inside, by building in some perpetual mechanism for controlling the machines’ effects in line with human survival (see Volume 2). Or from the outside, by us offering something that motivates the artificial population to keep us around.
We need our potential solutions to be driven mostly by the natural instrumental interests of the ASI ecosystem and of its members, and therefore to be non-anthropocentric, but to be formulated in such a fashion that the humans belong in the “circle of care” and the “circle of care” has the property that it can only expand, but can never contract.
I would like this to work. I see where you’re coming from in that if we cannot rely on enforcing control, then what’s left is somehow integrating some deep non-relativistic ethics. This is what my mentor originally tried to do for years, until they found out it was not implementable.
Unfortunately, the artificial ecosystem would first converge on comprehensively expressing care (across all the machine-encountered contexts over time) for their own artificial needs (as lethal to humans) before they ever get to the organic needs of those apes in that other ecosystem. And by then, we’d all have died out. So the order is off.
this requires a sufficiently universal protection of rights and interests of all individuals regardless of their capabilities.
It is about different capabilities, and also about different (and complex) physical needs.
Appreciating you raising these discussions.
Is the control in question “control of the ASI ecosystem by humans” (which can’t realistically be feasible, it’s impossible to maintain this kind of control for long, less intelligent entities don’t have competence to control much more intelligent entities) or “control of the ASI ecosystem by itself”?
In this case it does not matter as long as you (meaning a human, or multiple humans) can initially set it up to not later destroy 𝑥.
Where 𝑥 is highly contingent on preexisting conditions in the world that the machine’s existence is not contingent on (in this case, 𝑥 is not meant to be ASI, since it is not ‘highly contingent’ on preexisting conditions in the world).
The ecosystem of humans also seems to be a perpetual learning machine. So the same logic applies.
It depends on a few things:
How do you define ‘machine’? I didn’t define it in this short post, so I agree it’s ambiguous. In the post I linked, I made a specific distinction that I would say should apply to the notion of machine here: “It is ‘artificial’ in being assembled out of physically stable and compartmentalised parts (hardware) of a different chemical make-up than humans’ soft organic parts (wetware).”
Are evolutionary dynamics on ‘our’ side or not? In the case of the human ecosystem, the changes that ensure human survival also include evolved changes. This is not about implementing control, but about having ‘skin in the game’. If humans do anything that does them in at any level, it gets selected against. Of course, if humans do themselves in at the scale of the entire population, we no longer exist. But there is a marked difference between trying to exert (limited) control over a human society so that it does not do itself in, and trying to exert control over an artificial ecosystem evolving by itself so that it does not end all humans.
Agreed on tracking that hypothesis. It makes sense that people are more open to consider what’s said by an insider they look up to or know. In a few discussions I saw, this seemed a likely explanation.
Also, insiders tend to say more stuff that is already agreed on and understandable by others in the community.
Here there seems to be another factor:
Whether the person is expressing negative views that appear to support, versus be dissonant with, core premises. With ‘core premises’, I mean beliefs about the world that much thinking shared in the community is based on, or tacitly relies on to be true.
In my experience (yours might be different), when making an argument that reaches a conclusion that contradicts a core premise in the community, I had to be painstakingly careful to be polite, route around understandable misinterpretations, and address common objections in advance, just to get to a conversation where the argument was explored somewhat openly.
It’s hard to have productive conversations that way. The person arguing against the ‘core premise’ bears by far the most cost trying to write out responses in a way that might be insightful for others (instead of dismissed too quickly). The time and strain this takes is mostly hidden to others.
Saying ‘fuck them’ when people are shifting to taking actions that threaten society is expressing something that should be expressed, in my view.
I see Oliver replied that in response to two Epoch researchers leaving to found an AI start-up focussed on improving capabilities. I interpret it as ‘this is bad, dismiss those people’. It’s not polite, though maybe for others who don’t usually swear it comes across much stronger than intended?
To me, if someone posts an intense-feeling, negatively worded text in response to what other people are doing, it usually signals that there is something they care about that they perceive to be threatened. I’ve found it productive to try to relate with that first, before responding. Jumping to enforcing general rules stipulated somewhere in the community, and then implying that the person not following those rules is not harmonious with or does not belong to the community, can be counterproductive.
(Note I’m not tracking much of what Oliver and Geoffrey have said here and on twitter. Just wanted to respond to this part.)
instead being “demonizing anyone associated with building AI, including much of the AI safety community itself”.
I’m confused how you can simultaneously suggest that this talk is about finding allies and building a coalition together with the conservatives, while also explicitly naming “rationalists” in your list of groups that are trying to destroy religion
I get the concern about “rationalists” being mentioned. It is true that many (but not all) rationalists tend to downplay the value of traditional religion, and that a minority of rationalists unfortunately have worked on AI development (including at DeepMind, OpenAI and Anthropic).
However, I don’t get the impression that this piece is demonising the AI Safety community. It is very much arguing for concepts like AI extinction risk that came out of the AI Safety community. This is setting a base for AI Safety researchers (like Nate Soares) to talk with conservatives.
The piece is mostly focussed on demonising current attempts to develop ‘ASI’. I think accelerating AI development is evil in the sense of ‘discontinuing life’. A culture that commits to not do ‘evil’ also seems more robust at preventing some bad thing from happening than a culture focused on trying to prevent an estimated risk but weighing this up with estimated benefits. Though I can see how a call to prevent ‘evil’ can result in a movement causing other harms. This would need to be channeled with care.
Personally, I think it’s also important to build bridges across to multiple communities, to show where all of us actually care about restricting the same reckless activities (toward the development and release of models). A lot of that does not require bringing up abstract notions like ‘ASI’, which are hard to act on and easy to conflate. Rather, it requires relating with communities’ perspectives on what company activities they are concerned about (e.g. mass surveillance and the construction of hyperscale data centers in rural towns), in a way that enables robust action to curb those activities. The ‘building multiple bridges’ aspect is missing in Geoffrey’s talk, but also it seems focused on first making the case why traditional conservatives should even care about this issue.
If we care to actually reduce the risk, let’s focus the discussion on what this talk is advocating for, and whether or not that helps people in communities orient to reduce the risk.
These are insightful points. I’m going to think about this.
In general, I think we can have more genuine public communication about where Anthropic and other companies have fallen short (from their commitments, in terms of their legal requirements, and/or how we as communities expect them to not do harm).
Good question. I don’t know to be honest.
Having said that, Stop AI is already organising monthly open protests in front of OpenAI’s office.
Above and beyond the argument over whether practical or theoretical alignment can work I think there should be some norm where both sides give the other some credit …
E.g. for myself I think theoretical approaches that are unrelated to the current AI paradigm are totally doomed, but I support theoretical approaches getting funding because who knows, maybe they’re right and I’m wrong.
I understand this is a common area of debate.
Neither approach works, based on the reasoning I’ve gone through.
the LTBT is consulted on RSP policy changes (ultimately approved by the LTBT-controlled board), and they receive Capability Reports and Safeguards Reports before the company moves forward with a model release.
These details are clarifying, thanks! Respect for how LTBT trustees are consistently kept in the loop with reports.
The class T shares held by the LTBT are entitled to appoint a majority of the board
...
Again, I trust current leadership, but think it is extremely important that there is a legally and practically binding mechanism to avoid that balance being set increasingly towards shareholders rather than the long-term benefit of humanity
...
the LTBT is a backstop to ensure that the company continues to prioritize the mission rather than a day-to-day management group, and I haven’t seen any problems with that.
My main concern is that based on the public information I’ve read, the board is not set up to fire people in case there is some clear lapse of responsibility on “safety”.
Trustees’ main power is to appoint (and remove?) board members. So I suppose that’s how they act as a backstop. They need to appoint board members who provide independent oversight and would fire Dario if that turns out to be necessary. Even if people in the company trust him now.
Not that I’m saying that trustees appointing researchers from the safety community (who are probably in Dario’s network anyway) robustly provides for that. For one, following Anthropic’s RSP is not actually responsible in my view. And I suppose only safety folks who are already mostly for the RSP framework would be appointed as board members.
But it seems better to have such oversight than not.
OpenAI’s board had Helen Toner, someone who acted with integrity in terms of safeguarding OpenAI’s mission when deciding to fire Sam Altman.
Anthropic’s board now has the Amodei siblings and three tech leaders – one brought in after leading an investment round, and the other two brought in particularly for their experience in scaling tech companies. I don’t really know these tech leaders. I only looked into Reed Hastings before, and in his case there is some coverage of his past dealings with others that makes me question his integrity.
~ ~ ~
Am I missing anything here? Recognising that you have a much more comprehensive/accurate view of how Anthropic’s governance mechanisms are set up.
This is clarifying. Appreciating your openness here.
I can see how Anthropic could have started out with you and Dustin as ‘aligned’ investors, but around that time (the year before ChatGPT) there was already enough VC interest that they could probably have raised a few hundred million anyway.
Thinking about your invitation here to explore ways to improve:
i’m open to improving my policy (which is—empirically—also correllated with the respective policies of dustin as well as FLI) of—roughly—“invest in AI and spend the proceeds on AI safety”
Two thoughts:
When you invest in an AI company, this could reasonably be taken as a sign that you are endorsing their existence. Doing so can also make it socially harder later to speak out (e.g. on Anthropic) in public.
Has it been common for you to have specific concerns that a start-up could or would likely do more harm than good – but to decide to invest anyway because you expect VCs would cover the needed funds regardless (without granting investment returns to ‘safety’ work, or advising execs to act more prudently)?
In that case, could you put out those concerns in public before you make the investment? Having that open list seems helpful for stakeholders (e.g. talented engineers who consider applying) to make up their own mind and know what to watch out for. It might also help hold the execs accountable.
The grant priorities for restrictive efforts seem too soft.
Pursuing these priorities imposes little to no actual pressure on AI corporations to refrain from reckless model development and releases. They’re too complicated and prone to actors finding loopholes, and most of them lack broad-based legitimacy and established enforcement mechanisms.
Sharing my honest impressions here, but recognising that there is a lot of thought put behind these proposals and I may well be misinterpreting them (do correct me):
The liability laws proposal I liked at the time. Unfortunately, it’s become harder since then to get laws passed given successful lobbying of US and Californian lawmakers who are open to keeping AI deregulated. Though maybe there are other state assemblies that are less tied up by tech money and tougher on tech that harms consumers (New York?).
The labelling requirements seem like low-hanging fruit. It’s useful for informing the public, but applies little pressure on AI corporations to not go further ‘off the rails’.
The veto committee proposal provides a false sense of security, with little real teeth behind it. In practice, we’ve seen supposedly independent boards, trusts, committees and working groups repeatedly fail to carry out their mandates (at DM, OAI, Anthropic, the UK+US safety institutes, the EU AI office, etc) because nonaligned actors could influence them, restructure them, or simply ignore or overrule their decisions. The veto committee idea is unworkable, in my view, because we first need to deal with the lack of real accountability and the lack of capacity for outside concerned coalitions to impose pressure on AI corporations.
Unless the committee format is meant as a basis for wider inquiry and stakeholder empowerment? A citizens’ assembly for carefully deliberating a crucial policy question (not just on e.g. upcoming training runs) would be useful because it encourages wider public discussion and builds legitimacy. If the citizens’ assembly’s mandate gets restricted into irrelevance or its decision gets ignored, a basis has still been laid for engaged stakeholders to coordinate around pushing that decision through.
The other proposals – data centre certification, speed limits, and particularly the global off-switch – appear to be circuitous, overly complicated and mostly unestablished attempts at monitoring and enforcement for mostly unknown future risks. They look technically neat, but create little ingress capacity for different opinionated stakeholders to coordinate around restricting unsafe AI development. I actually suspect that they’d be a hidden gift for AGI labs who can go along with the complicated proceedings and undermine them once no longer useful for corporate HQ’s strategy.
Direct and robust interventions could e.g. build off existing legal traditions and widely shared norms, and be supportive of concerned citizens and orgs that are already coalescing to govern clearly harmful AI development projects.
An example that comes to mind: You could fund coalition-building around blocking the local construction of and tax exemptions for hyperscale data centers by relatively reckless AI companies (e.g. Meta). Some seasoned organisers just started working there, and they are supported by local residents, environmentalist orgs, creative advocates, citizen education media, and the broader concerned public. See also Data Center Watch.
Thanks, you’re right that I left that undefined. I edited the introduction. How does this read to you?
“From the get-go, these researchers acted in effect as moderate accelerationists. They picked courses of action that significantly sped up and/or locked in AI developments, while offering flawed rationales of improving safety.”
Just a note here that I’m appreciating our conversation :) We clearly have very different views right now on what is strategically needed but digging your considered and considerate responses.
but also once LLMs do get scaled up, everything will happen much faster because Moore’s law will be further along.
How do you account for the problem here that Nvidia’s and downstream suppliers’ investment in GPU hardware innovation and production capacity also went up as a result of the post-ChatGPT race (to the bottom) between tech companies on developing and releasing their LLM versions?
I frankly don’t know how to model this somewhat soundly. It’s damn complex.
Gebru thinks there is no existential risk from AI so I don’t really think she counts here.
I was imagining something like this response yesterday (‘Gebru does not care about extinction risks’).
My sense is that the reckless abandon of established safe engineering practices is part of what got us into this problem in the first place. I.e. if the safety community had insisted that models should be scoped and tested like other commercial software with critical systemic risks, we would be in a better place now.
It’s a more robust place to come from than the stance that developments will happen anyway – but that we somehow have to catch up by inventing safety solutions generally applicable to models auto-encoded on our general online data to have general (unknown) functionality, and used by people generally to automate work across society.
If we managed to actually coordinate around not engineering stuff that Timnit Gebru and colleagues would count as ‘unsafe to society’, according to, say, the risks laid out in the Stochastic Parrots paper, we would also robustly reduce the risk of taking a mass extinction all the way. I’m not saying that is easy at all, just that it is possible for people to coordinate on not continuing to develop risky, resource-intensive tech.
but the common thread is strong pessimism about the pragmatic alignment work frontier labs are best positioned to do.
This I agree with. So that’s our crux.
This is not a very particular view – in terms of the possible lines of reasoning, and the people with epistemically diverse worldviews, that end up arriving at this conclusion. I’d be happy to discuss the reasoning I’m working from, in the time that you have.
I agree you won’t get such a guarantee
Good to know.
I was not clear enough with my one-sentence description. I actually mean two things:
1. No sound guarantee of preventing ‘AGI’ from causing extinction (over the long term, above some acceptably high probability floor), due to fundamental control bottlenecks in tracking and correcting out the accumulation of harmful effects as the system modifies in feedback with the environment over time (see the arithmetic sketch after this list).
2. The long-term convergence of this necessarily self-modifying ‘AGI’ on causing changes to the planetary environment that humans cannot survive.
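As a minimal arithmetic sketch of why point 1 matters (my framing, with made-up example numbers): if each period carries some irreducible probability of an uncorrected, unrecoverable failure, the probability of getting through many periods without one decays toward zero.

```python
# Compounding-risk sketch; eps values and period counts are illustrative only.
def survival_probability(eps, periods):
    """Probability of no unrecoverable failure over `periods`, given per-period risk eps."""
    return (1.0 - eps) ** periods

for eps in (0.001, 0.01):          # per-period failure probabilities (made-up examples)
    for periods in (100, 1000):    # number of periods (e.g. years)
        p = survival_probability(eps, periods)
        print(f"eps={eps}, T={periods}: P(no failure) = {p:.3g}")
# -> even eps = 0.1% per year leaves only ~37% odds over 1000 years;
#    eps = 1% per year leaves ~0.004% over 1000 years.
```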
The reason I think it’s possible is that a corrigible and non-murderous AGI is a coherent target that we can aim at and that AIs already understand. That doesn’t mean we’re guaranteed success mind you but it seems pretty clearly possible to me.
I agree that this is a specific target to aim at.
I also agree that you could program an LLM system to be corrigible (for it to correct output patterns in response to human instruction). The main issue is that we cannot build an algorithm into fully autonomous AI that can maintain coherent operation towards that target.
Slowing progress down is a smaller, second order effect. But many people seem to take it for granted that completely ceding frontier AI work to people who don’t care about AI risk would be preferable because it would slow down timelines!
It would be good to discuss specifics. When it comes to Dario & co’s scaling of GPT, it is plausible that a ChatGPT-like product would not have been developed without that work (see this section).
They made a point at the time of expressing concern about AI risk. But what was the difference they made here?
caring significantly about accelerating timelines seems to hinge on a very particular view of alignment where pragmatic approaches by frontier labs are very unlikely to succeed, whereas some alternative theoretical work that is unrelated to modern AI has a high chance of success.
It does not hinge though on just that view. There are people with very different worldviews (e.g. Yudkowsky, me, Gebru) who strongly disagree on fundamental points – yet still concluded that trying to catch up on ‘safety’ with current AI companies competing to release increasingly unscoped and complex models used to increasingly automate tasks is not tractable in practice.
I’m noticing that you are starting from the assumption that it is a tractably solvable problem – particularly by “people who work closely with cutting edge AI and who are using the modern deep learning paradigm”.
A question worth looking into: how can we know whether the long-term problem is actually solvable? Is there a sound basis for believing that there is any algorithm we could build in that would actually keep controlling a continuously learning and self-manufacturing ‘AGI’ to not cause the extinction of humans (over at least hundreds of years, above some soundly guaranteeable and acceptably high probability floor)?
Edited:
DeepMind received its first major investment from Peter Thiel (introduced by Eliezer), and Jaan Tallinn later invested for a 1% stake.
Thank you for pointing to this. Let me edit it to be more clear.
I see how it can read as if you and Peter were just the main guys funding DeepMind from the start, which of course is incorrect.
Glad it’s insightful.
I’m curious to know if you have any plans to write up an essay about OpenPhil and the funding landscape.
It would be cool for someone to write about how the funding landscape is skewed. Basically, most of the money has gone into trying to make safe the increasingly complex and unscoped AI developments that people are seeing or expecting to happen anyway.
In recent years, there has finally been some funding of groups that actively try to coordinate with an already concerned public to restrict unsafe developments (especially SFF grants funded by Jaan Tallinn). However, people in the OpenPhil network especially have continued to prioritise working with AGI development companies and national security interests, and it’s concerning how this tends to involve making compromises that support a continued race to the bottom.
Anecdotally, I have heard a few people in the community complain that they feel that OpenPhil has made it more difficult to publicly advocate for AI safety policies because they are afraid of how it might negatively affect Anthropic.
I’d be curious for any ways that OpenPhil has specifically made it harder to publicly advocate for AI safety policies. Does anyone have any specific experiences / cases they want to share here?
These are great questions for clarifying the applicability/scope of the argument!
Forrest, my mentor, just wrote punchy and insightful replies. See this page.
Note that:
- he rephrased some of your questions in terms of how he understands them.
- he split some sentences into multiple lines – this formatting is unusual, but is meant to make individual parts of an argument easier to parse.
- he clarified that not only hard substrates can be ‘artificial’ (I had restricted my explanation to hard compartmentalised parts, because it makes for cleaner reasoning, and it covers the bulk of the scenarios).