it’s only locally linear, and the nonlinearity is a lot of what’s interesting. but this does seem like a pretty cool general idea.
To be precise, what I meant by “implicitly linear” is a model that is globally linear in feature space after transforming inputs with a fixed map. In other words, a kernel machine. The claim is that ReLU networks approximate a particular, computable kernel machine during training.
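To make the “fixed map, then globally linear” idea concrete, here’s a minimal sketch that uses the empirical tangent kernel of a tiny ReLU net at initialization as the fixed feature map. This is just one such map, not necessarily the exact kernel the claim refers to (which would involve the training trajectory); the function names and toy data are made up for illustration:

```python
import jax
import jax.numpy as jnp

# Tiny one-hidden-layer ReLU network f(params, x) -> scalar.
def init_params(key, d_in=2, d_hidden=32):
    k1, k2 = jax.random.split(key)
    W1 = jax.random.normal(k1, (d_hidden, d_in)) / jnp.sqrt(d_in)
    W2 = jax.random.normal(k2, (d_hidden,)) / jnp.sqrt(d_hidden)
    return (W1, W2)

def f(params, x):
    W1, W2 = params
    return W2 @ jax.nn.relu(W1 @ x)

# Fixed feature map: phi(x) = gradient of f wrt the (frozen) parameters.
# The induced kernel K(x, x') = <phi(x), phi(x')> is the empirical NTK.
def tangent_features(params, x):
    grads = jax.grad(f)(params, x)
    return jnp.concatenate([g.ravel() for g in jax.tree_util.tree_leaves(grads)])

key = jax.random.PRNGKey(0)
params = init_params(key)

# Hypothetical toy data: 20 points in 2-D with scalar targets.
X = jax.random.normal(jax.random.PRNGKey(1), (20, 2))
y = jnp.sin(X[:, 0]) + 0.5 * X[:, 1]

# Gram matrix of the tangent kernel over the training set.
Phi = jax.vmap(lambda x: tangent_features(params, x))(X)
K = Phi @ Phi.T

# Kernel ridge regression: alpha = (K + lam*I)^{-1} y, and the predictor
# f(x) = sum_i alpha_i K(x, x_i) is globally linear in phi(x).
lam = 1e-3
alpha = jnp.linalg.solve(K + lam * jnp.eye(len(X)), y)

def predict(x):
    return alpha @ (Phi @ tangent_features(params, x))

print(predict(X[0]), y[0])
```

The point is that `predict` is a kernel machine in exactly this sense: `tangent_features` is a fixed map chosen once, and everything downstream is linear in phi(x), even though f itself is only piecewise linear in x.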