Honestly, for me it’s more of a strike against RNNs. Real deep neural networks that have been trained don’t have this property, so it’s a bridge we’re going to need to cross at some point regardless. From a de-risking point of view I’d kind of like to get to that point ASAP. There’s a lot of talk about looking at random Boolean circuits (which very obviously don’t have this property), narrow MLPs, or even jumping all the way to wide MLPs trained in some sort of mean-field/maximum-update regime that gets rid of it.
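For concreteness, here’s a minimal sketch of the kind of width-scaled parameterization I mean by “mean-field” — a two-layer MLP whose output is scaled by 1/width instead of the standard 1/sqrt(width). The exact scaling conventions (and the per-layer learning-rate choices that μP adds on top) are my own illustrative assumptions, not something anyone in this thread has pinned down:

```python
import numpy as np

def init_two_layer_mlp(d_in, width, rng, parameterization="mean_field"):
    """Two-layer MLP f(x) = c * a @ relu(W x).

    In the standard/NTK-style convention the output is scaled by
    1/sqrt(width); in the mean-field / muP-style convention it is scaled
    by 1/width, which changes how individual neurons move as width grows.
    """
    W = rng.standard_normal((width, d_in)) / np.sqrt(d_in)  # hidden weights
    a = rng.standard_normal(width)                          # output weights
    c = 1.0 / width if parameterization == "mean_field" else 1.0 / np.sqrt(width)
    return W, a, c

def forward(x, W, a, c):
    h = np.maximum(W @ x, 0.0)  # ReLU hidden layer
    return c * (a @ h)

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
for width in (128, 1024, 8192):
    W, a, c = init_two_layer_mlp(32, width, rng, "mean_field")
    print(width, forward(x, W, a, c))  # output shrinks with width at init
```

The point of the sketch is just the scaling choice: with the 1/width output factor the function at initialization shrinks as the network gets wider, so (under the usual mean-field story) whatever the trained network ends up computing has to come from feature learning rather than from the random init.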