Robustness to Scale

I want to quickly draw attention to a concept in AI alignment: Robustness to Scale. Briefly, you want your proposal for an AI to be robust (or at least fail gracefully) to changes in its level of capabilities. I discuss three different types of robustness to scale: robustness to scaling up, robustness to scaling down, and robustness to relative scale.

The purpose of this post is to communicate, not to persuade. It may be that we want to bite the bullet and build an AGI that is simply not robust to scale, but if we do, we should at least realize that we are doing that.

Robustness to scaling up means that your AI system does not depend on not being too powerful. One way to check for this is to think about what would happen if the thing that the AI is optimizing for were actually maximized. One example of a failure of robustness to scaling up is when you expect an AI to accomplish a task in a specific way, but it becomes smart enough to find new, creative ways to accomplish the task that you did not think of, and those new ways are disastrous. Another example is when you make an AI that is incentivized to do one thing, but you add restrictions so that the best way to accomplish that thing has a side effect that you like. When you scale the AI up, it finds a way around your restrictions.
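Here is a minimal toy sketch of that first failure, entirely made up for illustration: the agent picks whichever candidate plan scores highest under a proxy objective, and its "capability" is just how many candidate plans it can evaluate. The plan space, the numbers, and the `best_plan` helper are all hypothetical.

```python
import random

random.seed(0)

# Hypothetical plan space: (proxy_reward, true_value) pairs. One rare plan
# games the proxy: it scores very highly but is disastrous in reality.
plans = [(random.gauss(5, 1), random.gauss(5, 1)) for _ in range(10_000)]
plans.append((100.0, -100.0))  # the "creative" plan we did not anticipate

def best_plan(search_budget: int):
    """The agent can only evaluate its first `search_budget` candidate plans;
    among those, it picks the one with the highest proxy reward."""
    candidates = plans[:search_budget]
    return max(candidates, key=lambda p: p[0])

for budget in (10, 100, len(plans)):
    proxy, true_value = best_plan(budget)
    print(f"budget={budget:>6}  proxy={proxy:7.1f}  true value={true_value:7.1f}")
# At small budgets the chosen plan is fine; once the agent can search the
# whole space, it finds the proxy-gaming plan and the true value collapses.
```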

Robustness to scaling down means that your AI system does not depend on being sufficiently powerful. You can't really make your system still work when it scales down, but you can maybe make sure it fails gracefully. For example, imagine you had a system that tries to predict humans, and uses these predictions to figure out what to do. When scaled up all the way, the predictions of humans are completely accurate, and the system will only take actions that the predicted humans would approve of. If you scale down the capabilities, your system may predict the humans incorrectly. These errors can multiply as you stack many predicted humans together, and the system can end up optimizing for some seemingly random goal.
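A toy model of that compounding, again with everything made up: each level of prediction approximates the previous level's values with some error, and a chain of `DEPTH` predicted humans drifts further from the real human's values as that error grows.

```python
import random

random.seed(1)
N_ACTIONS, DEPTH = 50, 10
true_values = [random.random() for _ in range(N_ACTIONS)]  # real human's values

def chained_prediction(error: float):
    """Stack DEPTH noisy predictions, each approximating the level below it."""
    values = true_values
    for _ in range(DEPTH):
        values = [v + random.gauss(0, error) for v in values]
    return values

for error in (0.0, 0.01, 0.3):
    predicted = chained_prediction(error)
    chosen = max(range(N_ACTIONS), key=lambda a: predicted[a])
    best = max(range(N_ACTIONS), key=lambda a: true_values[a])
    print(f"prediction error={error:4.2f}  chosen action value={true_values[chosen]:.2f}"
          f"  (true best={true_values[best]:.2f})")
# With accurate predictions the chosen action matches what the real human
# would approve; as per-level error grows, the errors compound across the
# chain and the final choice looks roughly random.
```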

Robustness to relative scale means that your AI system does not depend on its subsystems being similarly powerful to each other. This is easiest to see in systems that depend on adversarial subsystems. If one part of your AI system suggests plans, and another part tries to find problems with those plans, then if you scale up the suggester relative to the verifier, the suggester may find plans that are optimized for taking advantage of the verifier's weaknesses.
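A toy sketch of that suggester/verifier dynamic, with made-up numbers: the verifier's score differs from a plan's true quality by a "blind spot" term it cannot see past, and the suggester's relative capability is how many candidate plans it can search over before submitting one.

```python
import random

random.seed(2)

def make_plan():
    """A candidate plan: its true quality, plus a heavy-tailed 'blind spot'
    component that the verifier mistakes for quality."""
    true_quality = random.gauss(0, 1)
    blind_spot = random.expovariate(1.0)
    verifier_score = true_quality + blind_spot
    return true_quality, verifier_score

def approved_plan(suggester_power: int):
    """The suggester searches over `suggester_power` candidate plans and
    submits the one the verifier will rate highest."""
    candidates = [make_plan() for _ in range(suggester_power)]
    return max(candidates, key=lambda p: p[1])

for power in (2, 10, 10_000):
    true_quality, verifier_score = approved_plan(power)
    print(f"suggester power={power:>6}  verifier score={verifier_score:5.2f}"
          f"  true quality={true_quality:5.2f}")
# As the suggester scales up relative to the verifier, the approved plan's
# verifier score keeps climbing, but its true quality barely improves: the
# extra optimization pressure mostly goes into the verifier's blind spot.
```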

My current state is that when I hear a proposal for AI alignment that does not feel very strongly robust to scale, I become very worried about the plan. Part of this comes from feeling like we are actually very early on a logistic capabilities curve. I thus expect that as we scale up capabilities, we will eventually get large differences very quickly, so I expect the scaled-up (and partially scaled-up) versions to actually happen. However, robustness to scale is very difficult to achieve, so it may be that we have to depend on systems that are not very robust, and be careful not to push them too far.