The partially-stochastic version has a longer running time, because you have to predict which trajectories work ahead of time. Imagine having to guess the model weights, and then only using the training data to check whether your guess holds up. Instead of wasting time finding a better guess, VAEs just say, “all guesses [for the ‘partially-stochastic’ bits] should be equally valid.” We know that isn’t true, so there are going to be performance issues.
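(If I’m reading “all guesses should be equally valid” as the fixed, data-independent prior over the latent in the usual VAE objective, the thing I have in mind is the ELBO:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big),$$

where the prior $p(z)$ over the latent “guess” $z$ is chosen up front, typically a standard normal, rather than learned from the data.)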
I’m not sure why you’re thinking about guessing model weights here. The thing I’m thinking about with stochastic models is the forward-pass bit, the Monte Carlo sampling. I’m not sure why pre-computed randomness would be a problem for that part.
As a weird example: say there’s a memoized random function mapping strings to uniform random bits. This can’t really be pre-computed, because the table of outputs is enormous, but it can be lazily evaluated, as if it had been pre-computed. Now the stochastic model can query the memoized random function with a unique specification of the situation it’s in. This should be equivalent to flipping coins mid-run.
Alternatively, if the Monte Carlo process is sequential, it can just “read the next bit,” which is computationally simpler.
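A minimal sketch of both of those, just to pin down what I mean (the class names and the keying scheme are mine, purely illustrative):

```python
import random

class LazyRandomOracle:
    """Memoized random function from situation strings to uniform bits.

    The full table is far too big to pre-compute, but drawing each bit on
    first query and caching it behaves exactly as if it had been."""
    def __init__(self):
        self._cache = {}
        self._rng = random.Random()

    def bit(self, situation: str) -> int:
        if situation not in self._cache:
            self._cache[situation] = self._rng.getrandbits(1)
        return self._cache[situation]

class BitTape:
    """The simpler sequential case: just read the next pre-written bit."""
    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)  # the whole tape is fixed by the seed

    def next_bit(self) -> int:
        return self._rng.getrandbits(1)

oracle = LazyRandomOracle()
tape = BitTape()
print(oracle.bit("step 3, branch A"))       # consistent on repeat queries
print(oracle.bit("step 3, branch A"))
print([tape.next_bit() for _ in range(5)])  # a fixed stream, read in order
```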
Maybe it’s not an issue for forward sampling, but it is for backprop? I’m not sure what you mean.
I’m not really sure what you mean either. Here’s a simplified toy that I think captures what you’re saying:
A turtle starts at the origin.
We flip a series of coins: on the nth flip, heads moves us +1 in the nth dimension and tails moves us −1.
After N coin flips, we’ll be somewhere in N-dimensional space. The final position can obviously be described with N bits.
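Something like this, say (a sketch of the toy; the function names are made up):

```python
import random

def walk_online(n_dims: int, rng: random.Random) -> list:
    """Flip a coin at each step: heads is +1, tails is -1, in the nth dimension."""
    return [1 if rng.getrandbits(1) else -1 for _ in range(n_dims)]

def walk_from_bits(bits: list) -> list:
    """The same walk, but the N bits were written down ahead of time."""
    return [1 if b else -1 for b in bits]

rng = random.Random(42)
pre_written = [rng.getrandbits(1) for _ in range(10)]  # the N-bit string, stored up front
print(walk_from_bits(pre_written))
print(walk_online(10, random.Random(42)))              # same endpoint, flipped "live"
```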
Why are we flipping coins, instead of storing that N-bit string and reading the bits off one at a time? Why do we need the information in real time?
Well, suppose you only care about that particular N-bit string. Maybe it’s the code for human DNA. How are you supposed to write down the string before humans exist? You would have to run a very expensive simulation.
If you’re training a neural network on offline data, sure, you can seed a pseudo-random number generator and “write the randomness down” early. Training robots in simulation translates pretty well to the real world, so you don’t lose much. Now that I think about it, you might be able to claim the same for VAEs. My issue with VAEs is that they add the wrong noise, but that’s probably due to humans not having found the right algorithm rather than the specific distribution being expensive to find.
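For instance, with the standard reparameterization trick (a sketch, assuming that’s the noise in question), seeding the generator fixes every noise draw before training starts:

```python
import random

def reparameterize(mu: float, sigma: float, rng: random.Random) -> float:
    """Standard VAE reparameterization: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return mu + sigma * rng.gauss(0.0, 1.0)

rng = random.Random(7)                                   # "write the randomness down" early
print([reparameterize(0.0, 1.0, rng) for _ in range(3)])
rng = random.Random(7)                                   # re-seed: the identical noise comes back
print([reparameterize(0.0, 1.0, rng) for _ in range(3)])
```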
This seems like a case of Bayesian inference: we start from the observation that humans exist with the properties they have, and then find the set of strings consistent with that. That is, start from a uniform measure on the strings and then condition on “the string produces humans.”
Which is computationally intractable, of course. The usual Bayesian inference issues. Though Bayesian inference would be hard even if the stochasticity were generated on the fly rather than fixed at the start.
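A toy version of that conditioning as rejection sampling, with a cheap stand-in predicate for “the string produces humans” (any realistic predicate would make the acceptance rate hopeless):

```python
import random

def sample_conditioned(predicate, n_bits: int, rng: random.Random, max_tries: int = 100_000):
    """Uniform measure over N-bit strings, conditioned on the predicate by rejection."""
    for _ in range(max_tries):
        s = tuple(rng.getrandbits(1) for _ in range(n_bits))
        if predicate(s):
            return s
    return None  # acceptance probability too small: the usual intractability

rng = random.Random(0)
# Toy stand-in for "produces humans": at least 14 of the 16 steps are heads.
print(sample_conditioned(lambda s: sum(s) >= 14, n_bits=16, rng=rng))
```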