yeah, while I noticed the distinction, I usually find it worthwhile to try to steal tools across problem statements that use the same words in a different order. I'll use your data point to downweight that heuristic a little, thanks :p
Did knowing that the joint-Gaussian thing generalizes to RNNs influence your decision to look at RNNs next?
Honestly, for me it's more of a strike against RNNs. Real deep neural networks that have been trained don't have this property, so it's a bridge we're going to need to cross at some point regardless. From a de-risking point of view I'd kind of like to get to that point ASAP. There's a lot of talk about looking at random Boolean circuits (which very obviously don't have this property), narrow MLPs, or even jumping all the way to wide MLPs trained in some sort of mean-field/maximum update regime that gets rid of it.
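(For anyone reading along who hasn't seen the property being referenced: this is the wide-network result that, over the randomness of initialization, the preactivations for any fixed set of inputs become approximately jointly Gaussian as width grows. Below is a minimal sketch of one way to see it empirically, assuming a random ReLU MLP with the usual 1/sqrt(fan-in) scaling; the widths, input dimension, sample counts, and the `preactivation_pair` helper are illustrative choices, not anything from the thread.)

```python
# Sketch: check that a second-layer preactivation, evaluated at two fixed inputs,
# looks jointly Gaussian over random draws of the weights once the hidden width is large.
# At small width the randomness of the hidden layer makes the distribution a Gaussian
# mixture (visible as positive excess kurtosis); as width grows it concentrates to Gaussian.
import numpy as np

rng = np.random.default_rng(0)

def preactivation_pair(width, d=8, n_draws=20_000):
    """Sample one second-layer preactivation for two fixed inputs, over many random nets."""
    x1 = rng.standard_normal(d)
    x2 = rng.standard_normal(d)
    outs = np.empty((n_draws, 2))
    for i in range(n_draws):
        W1 = rng.standard_normal((width, d)) / np.sqrt(d)     # 1/sqrt(fan-in) scaling
        w2 = rng.standard_normal(width) / np.sqrt(width)
        h1 = np.maximum(W1 @ x1, 0.0)                          # ReLU hidden layer
        h2 = np.maximum(W1 @ x2, 0.0)
        outs[i] = [w2 @ h1, w2 @ h2]
    return outs

for width in (4, 64, 1024):
    z = preactivation_pair(width)
    # Any fixed linear combination of a jointly Gaussian pair is Gaussian,
    # so its excess kurtosis should approach 0 as width increases.
    s = z @ np.array([1.0, -0.7])
    s = (s - s.mean()) / s.std()
    print(f"width={width:5d}  excess kurtosis={np.mean(s**4) - 3:+.3f}")
```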