To be precise, what I meant by “implicitly linear” is a model that is globally linear in feature space after transforming the inputs with a fixed map. In other words, a kernel machine. The claim is that ReLU networks approximate a particular, computable kernel machine during training.
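For concreteness, here is a minimal sketch (my own illustration, not taken from the original claim) of what “globally linear in a fixed feature space” means, written in the equivalent kernel-machine form f(x) = Σ_i α_i k(x_i, x). The feature map, training points, and coefficients below are placeholders; under the actual claim they would come from the network’s training trajectory.

```python
import numpy as np

def phi(x):
    # Hypothetical fixed feature map (degree-2 polynomial features of a scalar).
    return np.array([1.0, x, x**2])

def kernel(x, x_prime):
    # Induced kernel: inner product in the fixed feature space.
    return phi(x) @ phi(x_prime)

# Placeholder training points and dual coefficients.
x_train = np.array([0.0, 1.0, 2.0])
alpha = np.array([0.5, -1.0, 0.25])

def f(x):
    # Kernel-machine prediction: a linear function of phi(x),
    # hence "implicitly linear" even though f is nonlinear in x itself.
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, x_train))

print(f(1.5))
```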