I am not John, so I can’t be completely sure what he meant, but here’s what I got from reflection on the idea:
One way to phrase the alignment problem (At least if we expect AGI to be neural network based) is that the alignment problem is how to get a bunch of matrices into the positions we want them to be in. There is (hopefully) some set of parameters, made of matrices, for a given architecture that is aligned, and some training process we can use to get there.
Now, determining what those positions are is very hard—we need to figure out what properties we need, encode them in maths, and ensure the training process gets there and stays there. Nevertheless, at it’s core, at least the last two of these are linear algebra problems, and if you were the God of Linear Algebra you could solve them. Since we can’t solve them, we don’t know enough linear algebra.
Sure, but I don’t think analyzing the meaning/function of any given configuration of a particular layer in isolation from other layers gets you very far. The layers depend on each other through non-linear activation functions, which should limit the usefulness of LA.
I would appreciate some elaboration on that idea.
I am not John, so I can’t be completely sure what he meant, but here’s what I got from reflection on the idea:
One way to phrase the alignment problem (At least if we expect AGI to be neural network based) is that the alignment problem is how to get a bunch of matrices into the positions we want them to be in. There is (hopefully) some set of parameters, made of matrices, for a given architecture that is aligned, and some training process we can use to get there.
Now, determining what those positions are is very hard—we need to figure out what properties we need, encode them in maths, and ensure the training process gets there and stays there. Nevertheless, at it’s core, at least the last two of these are linear algebra problems, and if you were the God of Linear Algebra you could solve them. Since we can’t solve them, we don’t know enough linear algebra.
Sure, but I don’t think analyzing the meaning/function of any given configuration of a particular layer in isolation from other layers gets you very far. The layers depend on each other through non-linear activation functions, which should limit the usefulness of LA.