Oh, I meant in the category of (topological) vector spaces, which requires the quotient maps to be linear.
Oh yeah, that makes sense. I wouldn't want to make that assumption, though, since activation functions are explicitly non-linear; otherwise the layers' weight matrices could just be multiplied together, and a multi-layer perceptron would be an indirect way of computing a single linear map.
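(A minimal numpy sketch of that collapse, assuming a two-layer perceptron with its activations removed: composing two affine layers yields exactly one affine map, so the "deep" network and a single-layer network agree.)

```python
import numpy as np

# Assumption for illustration: a 2-layer MLP with the activation stripped out.
# Then W2 @ (W1 @ x + b1) + b2 == (W2 @ W1) @ x + (W2 @ b1 + b2),
# i.e. the two layers collapse into a single affine map.

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x + b1) + b2          # "deep" network, no non-linearity
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)    # equivalent single affine map

assert np.allclose(two_layers, collapsed)
```

Inserting any non-linear activation between the layers breaks this identity, which is why the assumption of linearity can't be made for the maps inside a network.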