One addition: I’ve been privately informed that another interesting thing to look at would be a visualization of squaring, z ↦ z² (rather than only multiplication by a constant complex number; see the Visualizing Learned Functions section).
So I did that. For instance, here’s the square visualization of model2 (the one with [10, 10] hidden neurons):
Again, we see a clear parallel between reality and the model: colors end up in roughly the right places, but the prediction is still quite a bit off. We also still see a lot of “linearity”, i.e. straight lines in both the model predictions and the diff heatmap, but this linearity now seems to occur only in “radial” form, pointing towards the center.
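For reference, here’s a rough sketch of how such a side-by-side (ground truth, model prediction, error heatmap) could be produced. This is not the exact code behind the figures above; it assumes the model is a PyTorch module mapping (re, im) inputs to the (re, im) of the predicted square, and it uses a simple HSV domain coloring, which may differ from the color scheme used in the actual renderings:

```python
import numpy as np
import torch
import matplotlib.pyplot as plt
from matplotlib.colors import hsv_to_rgb


def domain_color(z: np.ndarray) -> np.ndarray:
    """Map complex values to RGB: hue = argument, brightness = squashed magnitude."""
    hue = (np.angle(z) / (2 * np.pi)) % 1.0
    val = 1.0 - 1.0 / (1.0 + np.abs(z))   # 0 at |z| = 0, approaches 1 as |z| grows
    sat = np.ones_like(hue)
    return hsv_to_rgb(np.stack([hue, sat, val], axis=-1))


def square_visualization(model: torch.nn.Module, extent: float = 2.0, n: int = 400):
    # Sample a grid of complex inputs z = x + iy.
    xs = np.linspace(-extent, extent, n)
    x, y = np.meshgrid(xs, xs)
    z = x + 1j * y
    true_sq = z ** 2                        # ground truth z^2

    # Model prediction: (re, im) in, (re, im) of the square out (assumed interface).
    with torch.no_grad():
        inp = torch.tensor(np.stack([x.ravel(), y.ravel()], axis=1), dtype=torch.float32)
        out = model(inp).numpy()
    pred_sq = (out[:, 0] + 1j * out[:, 1]).reshape(z.shape)

    # Plot ground truth, prediction, and absolute error side by side.
    box = [-extent, extent, -extent, extent]
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].imshow(domain_color(true_sq), extent=box, origin="lower")
    axes[0].set_title("true $z^2$")
    axes[1].imshow(domain_color(pred_sq), extent=box, origin="lower")
    axes[1].set_title("model prediction")
    diff = axes[2].imshow(np.abs(true_sq - pred_sq), extent=box, origin="lower", cmap="magma")
    axes[2].set_title("|error|")
    fig.colorbar(diff, ax=axes[2])
    plt.show()
```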
Models 0 and 1 look similar or worse. Model 3 ([20, 30, 20] hidden neurons) gets much closer despite still using ReLU:
And model 4 (same architecture but with SiLU), as expected, does even better:
But ultimately, we see the same pattern again: the larger the model, the more accurate, and SiLU works better than ReLU, without any obvious qualitative difference between the two. So I don’t think these renderings give any direct hint as to why SiLU performs so much better than ReLU for the actual fractal renderings.
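For concreteness, the kind of architectures being compared here could be sketched roughly as follows. This is only an assumption about their shape based on the descriptions above (2 inputs and 2 outputs for the real and imaginary parts, the listed hidden sizes, and ReLU vs. SiLU), not the actual training code:

```python
import torch.nn as nn


def make_model(hidden_sizes, activation=nn.ReLU) -> nn.Sequential:
    """Small MLP: (re, im) -> (re, im), with the given hidden sizes and activation."""
    layers, in_features = [], 2
    for h in hidden_sizes:
        layers += [nn.Linear(in_features, h), activation()]
        in_features = h
    layers.append(nn.Linear(in_features, 2))
    return nn.Sequential(*layers)


# e.g. the model 3 / model 4 comparison described above:
model3 = make_model([20, 30, 20], nn.ReLU)
model4 = make_model([20, 30, 20], nn.SiLU)
```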