The E-Coli Test for AI Alignment

Let’s say you have an idea in mind for how to align an AI with human values.

Go prep a slide with some e-coli, put it under a microscope, and zoom in until you can see four or five cells. Your mission: satisfy the values of those particular e-coli. In particular, walk through whatever method you have in mind for AI alignment. You get to play the role of the AI; with your sophisticated brain, massive computing power, and large-scale resources, hopefully you can satisfy the values of a few simple e-coli cells.

Perhaps you say “this is simple, they just want to maximize reproduction rate.” Ah, but that’s not quite right. That’s optimizing for the goals of the process of evolution, not optimizing for the goals of the godshatter itself. The e-coli has some frozen-in values which have evolved to approximate evolutionary fitness maximization in some environments; your job is to optimize for the frozen-in approximation, even in new environments. After all, we don’t want a strong AI optimizing for the reproductive fitness of humans—we want it optimizing for humans’ own values.

On the other hand, perhaps you say “these cells don’t have any consistent values, they’re just executing a few simple hardcoded algorithms.” Well, you know what else doesn’t have consistent values? Humans. Better be able to deal with that somehow.

Perhaps you say “these cells are too simple, they can’t learn/reflect/etc.” Well, chances are humans will have the same issue once the computational burden gets large enough.

This is the problem of AI alignment: we need to both define and optimize for the values of things with limited computational resources and inconsistent values. To see the problem from the AI’s point of view, look through a microscope.