Why are neural nets organised in layers, rather than as a DAG?
I tried Google, but when I asked “Why are neural nets organised in layers?”, the answers were all to the question “Why are neural nets organised in multiple layers instead of one layer?” which is not my question. When I added the stipulation “rather than as a DAG”, it turns out that some people have indeed worked on NNs organized as DAGs, but I was very unimpressed with all of the references I looked at.
GPT 3.5, needless to say, failed to give a useful answer.
I can come up with one reason, which is that the connectivity of an NN can be specified with just a handful of numbers: the number of neurons in each layer. But in the vastly larger space of all DAGs, how does one locate a useful one? I can come up with another reason: layered NNs allow the required calculations to be organised as operations on large arrays, which can be efficiently performed in hardware. Perhaps the calculations required by less structured DAGs are not so amenable to fast hardware execution.
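For concreteness, here is a minimal sketch of that second guess (the sizes are made up): the whole connectivity is just the list of layer widths, and the forward pass is a handful of large matrix multiplies.

```python
import numpy as np

# The entire connectivity of a dense layered net is given by a few numbers.
layer_sizes = [784, 128, 64, 10]   # made-up example sizes

# One weight matrix per pair of adjacent layers.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    # The whole forward pass is a handful of large matrix multiplies,
    # which is exactly the kind of work GPUs are good at.
    *hidden, last = weights
    for W in hidden:
        x = np.maximum(x @ W, 0.0)   # linear map + ReLU
    return x @ last                  # final linear layer, no activation

out = forward(rng.standard_normal((32, 784)))   # batch of 32 inputs
print(out.shape)                                 # (32, 10)
```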
But I’m just guessing.
Can anyone give me a real answer to the question, why are neural nets organised in layers, rather than as a DAG? Has any real work been done on the latter?
It’s to make the computational load easier to manage.
All neural nets can be represented as a DAG, in principle (including RNNs, by unrolling). This makes automatic differentiation nearly trivial to implement.
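For instance, here is a toy sketch of reverse-mode autodiff over an arbitrary DAG of scalar operations (illustrative only, not any particular library’s API):

```python
# A toy sketch of reverse-mode autodiff over an arbitrary DAG of scalar
# operations. Nothing here assumes layers; any DAG of these nodes works.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents         # incoming edges of the DAG
        self.grad = 0.0
        self._backward = lambda: None  # leaves have nothing to propagate

    def __add__(self, other):
        out = Node(self.value + other.value, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Node(self.value * other.value, (self, other))
        def backward():
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        out._backward = backward
        return out

def backprop(output):
    # Topologically sort the DAG, then apply the chain rule in reverse order.
    order, seen = [], set()
    def visit(node):
        if node not in seen:
            seen.add(node)
            for parent in node.parents:
                visit(parent)
            order.append(node)
    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        node._backward()

# A small non-layered DAG: c depends on a both directly and through b.
a, w = Node(2.0), Node(3.0)
b = a * w
c = a + b                 # a "skip connection" straight from a to c
backprop(c)
print(a.grad, w.grad)     # dc/da = 1 + w = 4.0, dc/dw = a = 2.0
```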
It’s very slow, though, if every node is a single arithmetic operation. So typically each node is made to represent a larger chunk of operations performed together, like a matrix multiplication or a convolution. This is what is normally called a “layer.” Chunking the computations this way makes it easier to load them onto a GPU.
However, even these chunked operations can still be differentiated as one formula, e.g. in the case of matrix multiplication. So it is still effectively a DAG even when it is organized into layers. (IIRC, this is how libraries like PyTorch work.)
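For example, the backward pass of a whole matrix-multiply layer is itself just a couple of array formulas, so the layer can be treated as a single differentiable node (a NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))   # a batch of activations
W = rng.standard_normal((64, 10))   # one layer's weights

# Forward: the whole layer is a single node in the computation graph.
y = x @ W

# Backward: given the upstream gradient dL/dy, the gradients of the whole
# layer are single matrix formulas rather than one rule per scalar node.
dL_dy = rng.standard_normal(y.shape)
dL_dW = x.T @ dL_dy    # shape (64, 10), same as W
dL_dx = dL_dy @ W.T    # shape (32, 64), same as x
```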
I think you might want to look at the literature on “sparse neural networks”, which is the right search term for what you mean here.
I don’t think “sparse neural networks” fit the bill. All the references I’ve turned up for the phrase talk about the usual sort of NN I’ve been calling layered, but where most of the parameters are zero. This leaves the layer structure intact.
To express more precisely the sort of connectivity I’m talking about, for any NN, construct the following directed graph. There is one node for every neuron, and an arc from each neuron A to each neuron B whose output depends directly on an output value of A.
For the NNs as described in e.g. Andrej Karpathy’s lectures (which I’m currently going through), this graph is a DAG. Furthermore, it is a DAG having the property of layeredness, which I define thus:
A DAG is layered if every node A can be assigned an integer label L(A), such that for every edge from A to B, L(B) = L(A)+1. A layer is the set of all the nodes having a given label.
The sparse NNs I’ve found in the literature are all layered. A “full” (i.e. not sparse) NN would also satisfy the converse of the above definition, i.e. L(B) = L(A)+1 would imply an edge from A to B.
The simplest example of a non-layered DAG is one with three nodes A, B, and C, with edges from A to B, A to C, and B to C. If you tried to structure this into layers, you would either find an edge between two nodes in the same layer, or an edge that skips a layer.
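To make the definition concrete, here is a small sketch that tests layeredness by propagating label constraints along edges and looking for a contradiction (applied to the three-node example above; the representation is just for illustration):

```python
from collections import defaultdict, deque

def is_layered(edges):
    """Check whether labels L can be assigned so that every directed
    edge A -> B satisfies L(B) == L(A) + 1 (the definition above)."""
    # Build an undirected adjacency list, remembering each edge's direction
    # as a required label offset of +1 (forward) or -1 (backward).
    adj = defaultdict(list)
    nodes = set()
    for a, b in edges:
        adj[a].append((b, +1))
        adj[b].append((a, -1))
        nodes.update((a, b))

    labels = {}
    for start in nodes:
        if start in labels:
            continue
        labels[start] = 0            # the label origin in each component is arbitrary
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, delta in adj[u]:
                want = labels[u] + delta
                if v not in labels:
                    labels[v] = want
                    queue.append(v)
                elif labels[v] != want:
                    return False     # contradictory label requirement
    return True

print(is_layered([("A", "B"), ("B", "C")]))              # True: a simple chain
print(is_layered([("A", "B"), ("A", "C"), ("B", "C")]))  # False: the example above
```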
To cover non-DAG NNs also, I’d call one layered if in the above definition, “L(B) = L(A)+1” is replaced by “L(B) = L(A) ± 1”. (ETA: This is equivalent to the graph being bipartite: the nodes can be divided into two sets such that every edge goes from a node in one set to a node in the other.)
It could be called approximately layered if most edges satisfy the condition.
Are there any not-even-approximately-layered NNs in the literature?
I’m fairly sure there are architectures where each layer is a linear function of the concatenated activations of all previous layers, though I can’t seem to find it right now. If you add possible sparsity to that, then I think you get a fully general DAG.
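Roughly the sort of thing I mean, as a sketch with made-up sizes: each layer takes the concatenation of the input and all earlier layers’ outputs, so every layer has a direct edge to every later one. Zeroing out blocks of the weight matrices would then prune individual edges, which I think is how you would recover an arbitrary DAG.

```python
import numpy as np

rng = np.random.default_rng(0)

def densely_connected_forward(x, weight_list):
    """Each layer sees the concatenation of the input and all previous
    layer outputs, so every layer has a direct edge to every later one."""
    activations = [x]
    for W in weight_list:
        inp = np.concatenate(activations, axis=-1)    # all previous activations
        activations.append(np.maximum(inp @ W, 0.0))  # linear map + ReLU
    return activations[-1]

# Made-up widths: input 8, then layers of width 16, 16, 4.
widths = [8, 16, 16, 4]
weight_list = []
for i, w in enumerate(widths[1:]):
    in_dim = sum(widths[: i + 1])                     # width of the concatenated input
    weight_list.append(rng.standard_normal((in_dim, w)) * 0.1)

out = densely_connected_forward(rng.standard_normal((5, 8)), weight_list)
print(out.shape)   # (5, 4)
```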