Google’s Ethical AI team and AI Safety

cross-posted from my blog

Background on the events

I have been thinking about this since the firing of Dr. Timnit Gebru, and since no one has written about it beyond my own tweets, I guess it falls to me.

Like many people in the rat-sphere, I imagine, I put energy consumption and climate change low on my list of ethical priorities surrounding AI. I find that concern uncompelling because I think that (a) the energy cost can be weighed against the benefits AI can create and (b) it can be literally offset by current and future carbon capture technologies. I think this position is well established in the EA community, with some possible recent exceptions taking shape.

But these ideas rely on current assumptions about how much power is being used for what purposes. If AI continues to scale by adding compute, as is generally expected, this could create conflicts of interest in the AI space. That would be bad for a number of reasons, chief among them that it would mean that only actors who are willing to impose substantial costs on the commons would be able to implement their visions. This is my central point, so I will return to it later.

For now, just keep in mind that the low priority EAs assign to climate change rests on an empirical question of how easy it is to influence certain changes. I don’t think any of Dr. Gebru’s specific work makes a convincing case to me that the question has a different answer. But I haven’t heard literally any other single person say that!

Instead, she was fired, and today the other co-lead of her team was also fired. The justification for firing Gebru was “she quit.” No public statement has been made, even internally to the team they both managed, about why Margaret Mitchell was fired, unless you count “it’s part of a re-org.” For reference, my team at Google has been re-org’d at least four times, and I have never seen anyone fired or even moved out of a management position in that time. Usually I don’t even notice.

(Because of timing there has been some conflation of this incident with the firing of a recruiter who worked mostly with historically Black colleges and universities. Maybe this is evidence that racism played a part in the decision, but I intend to regard “bad decision-making” on the part of Alphabet as a bit of a black box because harming AI safety prospects is very bad regardless of whether they are doing it for racist reasons.)

So at this stage, it looks like a big corporation made some bad HR decisions and fired people who were well regarded as managers but were ultimately doing work that I value about as much as most of the day-to-day work at Google. That’s not so bad, beyond the fact that we live in a world where small sets of high-ranking execs get to make bad decisions without oversight, but we all already knew we were living in that world.

Models of AI Safety

The reason I think this is bad is that it invalidates my model of how AI Safety could be implemented in the real world.

In brief: in order to “align” your AI, you will need to reduce its direct efficacy on some measure, and some middle manager will oppose that trade-off. I had hoped that “official avenues” like the Ethical AI team could be sufficiently ingrained that, when the ideas needed to solve AI Safety are developed, there would be a way to incorporate them into the projects with enough compute to create an aligned AI before others accidentally create a misaligned one.

In more detail:

  1. AI scales intensely with compute. (99%+ confidence)

  2. Large projects, such as Google or the US government, will have access to more compute than projects formed by small organizations; they can easily put together several million dollars of compute on short notice. (95% confidence)

  3. Some small set of large projects will be in a position to create AGI with some alignment plan for a few years before large numbers of small actors are in a position to do so. (requires ~1+2, slightly less confident than 2, say a 90% confidence interval of 1.5-10 years)

  4. Once a large number of small actors are able to easily create AGI, one of them will accidentally create misaligned AGI pretty quickly. This is my “time limit” on how long we have to get aligned AGI implemented, assuming a ‘solution’ exists before AGI is possible. (~80% chance a misaligned AGI emerges within 30 years of it being possible to make an AGI with <1 year’s SWE salary of compute in your garage; 50% chance within 10 years; 10% chance within 2 years)

  5. The first available solution to AGI alignment will make developing an aligned AGI take longer than the first available plan for creating any AGI at all. (90% confidence interval for how much longer: 2 weeks to 5 years)

  6. Therefore, in order to be confident that the first AGI created is aligned, the process from “AGI alignment is solved and AGI is possible with access to the best compute on earth” to “an org with enough compute to build AGI is executing an alignment plan with enough head start not to be overtaken by a new misaligned AGI project” needs to be as short as possible: 1.5 years of compute advantage (the low end of 3) plus 2 years before some small actor stumbles into misaligned AGI (the low end of 4) may already not be enough to cover the delay needed to align the AGI (5). (follows from 3, 4, 5; see the sketch after this list)

  7. Ethical AI teams are a natural place to introduce an alignment solution to a large organization like Google. (90% confidence interval of how much faster the team could impose an alignment plan than any other team: 0.5 to 10 years. Probability I think such a team would impose such a plan, if they had it and were in a position to do so: 80%+. Probability I think any other team would impose such a plan: ~30%)

    1. The team has to be aware of cutting-edge developments in alignment enough to identify the earliest correct solution. Teams focused on improving results or applying AI to specific use cases will not reliably have that familiarity, but it fits directly into the scope of ethical AI teams.

    2. The team has to be technically capable of influencing the direction of actually implemented AI projects at the org. If one Google exec believes something strongly, they can’t implement a technical program on their own. Even if people in Ads understood the program, transitioning it to Google Brain’s codebase alone would be a difficult task. An ethical AI team should have specific firsthand experience applying alignment-like frameworks to actual AI projects, so that they can begin executing as soon as the priority is clear to them.

    3. The team has to be politically capable of influencing the direction of actually implemented AI projects at the org. If a SWE says “I can do this in two weeks” and their manager says “ship it tomorrow or I’ll get Jane to do it instead,” then you need to have influence over every possible SWE that could do the work. If the organization instead sees the value of oversight programs and has people in place to execute those programs, you only need to influence the leader of that team to start the plan.

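To make the timing argument in (3) through (6) concrete, here is a rough Monte Carlo sketch in Python. Everything quantitative in it comes from the intervals stated above; the lognormal parameterization, the simplification that both projects share the same base build time, and the extra “adoption delay” knob (how long the org takes to start executing the plan at all) are my illustrative assumptions, not claims.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def lognormal_from_90ci(lo, hi, size):
    """Sample a lognormal whose 5th/95th percentiles match a stated 90% interval."""
    mu = (np.log(lo) + np.log(hi)) / 2
    sigma = (np.log(hi) - np.log(lo)) / (2 * 1.645)
    return rng.lognormal(mu, sigma, size)

# (3) Years a large org can build AGI before small actors can: 90% CI 1.5-10.
compute_lead = lognormal_from_90ci(1.5, 10, N)

# (4) Years after small actors *can* build AGI before one of them builds a
# misaligned one. A lognormal with median 10 and 10th percentile 2 also puts
# roughly 80% of its mass below 30 years, matching the three stated quantiles.
grace = rng.lognormal(np.log(10), (np.log(10) - np.log(2)) / 1.2816, N)

# (5) Extra years an aligned AGI takes over an unaligned one: 90% CI 2 weeks-5 years.
alignment_delay = lognormal_from_90ci(2 / 52, 5, N)

# (6) Treating the base build time as shared, the aligned project finishes first
# iff adoption_delay + alignment_delay < compute_lead + grace.
for adoption_delay in [0.0, 0.5, 1.0, 2.0, 5.0]:
    p = np.mean(adoption_delay + alignment_delay < compute_lead + grace)
    print(f"adoption delay {adoption_delay:3.1f} yr -> aligned AGI first in {p:.0%} of draws")
```

The exact percentages depend entirely on the assumed distributions, but the direction doesn’t: every year the org spends not adopting the plan is subtracted straight from the head start, which is why I care so much about how fast an alignment solution can propagate through an organization.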
I don’t think any of these points are controversial or surprising.

There has long been agreement that large-scale projects, such as those at large corporations or governments, will be able to create AGI earlier than small actors. That lead is a possible way to get the timing advantage needed if aligning an AGI takes much more time than building a misaligned one.

But that timing lead only materializes if it takes less time to become part of the org and set the direction of a major project than it does to wait for compute to get cheaper.

Putting two and two together

My previously existing hope was something like this:

  1. Existing ethical AI teams are maintained at large companies because they add value through:

    1. Something like PR maintenance, by keeping the company from being too evil

    2. Finding pockets of non-obvious value through accessibility or long-term incentives.

  2. Existing ethical AI teams actually care about ethics, and have some members that keep up with AI Safety research.

  3. The members who keep up with safety research can convince their team to act when there is a “solution.”

  4. The ethical AI team can convince other teams to act.

  5. One or more large companies with such teams and such plans will move forward confidently while other possible early actors will not, giving them a strategic advantage.

  6. ????

  7. Profit

The news of Drs. Gebru and Mitchell being removed from Google seems to be a direct refutation of (4): their attempts to push for broader action led to retaliation against them.

It also makes me very concerned about (1), and especially (1.1), in that Google continued this course of action through three months of bad press. I can also take this as some evidence that (1.2) isn’t working out, or else there would be some profit motive for Google not to take these steps.

Google is the prime example of a tech company that values ethics, or it was in the recent past. I have much less faith that Amazon, Microsoft, Facebook, the US federal government, or the Chinese government would even make gestures toward responsibility in AI. And the paper widely cited as kicking off this debate raises concerns about racial equity and climate change, which are exactly the boring, central parts of the culture war that I’d expect Googlers to support in massive majorities.

What else could happen

One of the big reactions I’m having is despair. I don’t think this is a killing blow to humanity, for a few reasons, but this path did seem like the single most likely route to a really good future, so I’m pretty sad about it.

But I think there are still a number of other ways that the problem of implementing an AGI alignment strategy could be solved.

  1. It could be much easier to solve alignment than expected, or the solution could make it easier to implement AGI than otherwise! That would be nice, but it’s not within anyone’s control.

  2. New ethical AI teams could form at other organizations, especially the US Gov’t. This seems directly actionable, though I don’t really see how someone would sell this work to an org.

  3. More secretive connections (such as the leadership of DeepMind or OpenAI) could establish a more direct throughline that I will never know about until I start living out the first chapter of Friendship is Optimal. This also does not seem actionable to me.

  4. Fix (4) above by applying pressure from people who can exert influence over such organizations.

    1. By creating helpful legislation for alignment, if such a thing is possible, which I am somewhat skeptical of.

    2. Worker organizations that can pressure tech companies on problems of the commons, such as the newly formed Alphabet Workers Union. This seems directly actionable, but requires a lot of effort of a sort that I’m uncomfortable with.

I guess I’m mostly out of thoughts, but I hope this makes the case at least somewhat that these events are important, even if you don’t care at all about the specific politics involved.