If, on the other hand, an activation function is non-monotonic, as in ReLU, then some unrelated parts of the input space will get folded together into the same parts of the output space.
Just a small technicality, but you probably mean “strictly monotonic” instead of monotonic, because ReLU actually is monotonic, right? (Or perhaps “injective” would be even closer, although I suppose in continuous spaces that’s practically the same as strict monotonicity) Of course, your actual point here still holds.
Indeed. Thank you 🙏 I’ll edit the post based on this. I think “injective” is most correct for the claim, although I don’t know of any commonly used discontinuous activation functions.
You might also be interested in the second half of this comment.
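To make the distinction concrete, here is a minimal sketch (using NumPy; the leaky-ReLU comparison is my own illustration, not something from the thread) showing that ReLU is monotonic but not injective, which is exactly why it folds distinct inputs onto the same output:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
ys = relu(xs)

# ReLU is monotonic: outputs never decrease as inputs increase.
assert np.all(np.diff(ys) >= 0)

# ...but it is not *strictly* monotonic, and hence not injective:
# every non-positive input collapses to 0, so the distinct inputs
# -2 and -1 map to the same output.
assert relu(-2.0) == relu(-1.0) == 0.0

# By contrast, a strictly monotonic activation such as leaky ReLU
# is injective: distinct inputs always produce distinct outputs.
def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

assert leaky_relu(-2.0) != leaky_relu(-1.0)
```

So "injective" pins down the property the original claim needs: the folding happens precisely where the function fails to be one-to-one, which for ReLU is the entire non-positive half-line.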