the human-created source code must be defining a learning algorithm of some sort. And then that learning algorithm will figure out for itself that tires are usually black, etc. Might this learning algorithm be simple and legible? Yes! But that was true for GPT-3 too.
Simple first-order learning algorithms have types of patterns they recognize, and meta-learning algorithms also have types of patterns they like.
In order to make a friendly or aligned AI, we will need some insight into what types of patterns we are going to have it recognize, and, separately, what types of things it is going to like or find salient.
There was a simple calculation protocol that generated GPT-3. The part that was not simple was translating that protocol into a prediction of the model's preferences or perceptual landscape, and hence of what it would do after it was turned on. And if you can't predict how a parameter will respond to input, you can't architect the system one-shot.
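The gap between a legible protocol and a predictable outcome can be illustrated with a toy sketch (my own hypothetical example, not anything resembling GPT-3's actual training code): the learning algorithm below is a dozen lines of fully inspectable SGD, yet the learned parameter is determined by the data, not by anything visible in the code.

```python
import random

def train(data, lr=0.01, epochs=100, seed=0):
    """A tiny, fully legible learning algorithm: SGD fitting a
    one-parameter model y = w * x to (x, y) pairs by squared error."""
    rng = random.Random(seed)
    w = rng.uniform(-1, 1)              # the learned state starts opaque
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# Noisy observations of y = 3x. The code above never mentions "3";
# that fact lives only in the data the algorithm is fed.
data = [(x, 3.0 * x + random.Random(x).gauss(0, 0.1)) for x in range(1, 6)]
w = train(data)
```

Reading `train` tells you the protocol is simple; it does not tell you that `w` will land near 3, or what the fitted model will do on a new input. Knowing that requires knowing the data and, in general, running the algorithm.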