Yup, it sure does look similar. One tricky point here is that we're trying to fit the f's to the data, so if going that route we'd need to pick some parametric form for f.
Yes. Seems like a pretty strong assumption to me.
Ah. In that case, are you sure you actually need Z to do the model comparisons you want? Do you even really need to work with this specific functional form at all? As opposed to e.g. training a model p(λ∣X) to feed its output into m tiny normalizing flow models which then try to reconstruct the original input data with conditional probability distributions q_i(x_i∣λ)?
To sketch out a little more what I mean, p(λ∣X) could e.g. be constructed as a parametrised function[1] which takes in the actual samples X and returns the mean of a Gaussian, which λ is then sampled from in turn[2]. The q_i(x_i∣λ) would be constructed using normalising flow networks[3], which take in λ as well as uniform distributions over variables z_i that have the same dimensionality as their x_i. Since the networks are efficiently invertible, this gives you explicit representations of the conditional probabilities q_i(x_i∣λ), which you can then fit to the actual data using KL-divergence.
You’d get explicit representations for both P[λ∣X] and P[X∣λ] from this.
[1] Or an ensemble of functions, if you want the mean of λ to be something like ∑_i f_i(x_i) specifically.
[2] Using reparameterization to keep the sampling operation differentiable with respect to the mean.
If the dictionary of possible values of X is small, you can also just use a more conventional ML setup which explicitly outputs probabilities for every possible value of every x_i, of course.
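To make the flow-based version above concrete, here is a minimal sketch, assuming PyTorch. The layer sizes, the single conditional affine layer standing in for each "tiny normalizing flow", and the Gaussian (rather than uniform) base variable are illustrative simplifications, not part of the proposal itself; the encoder mean is built as ∑_i f_i(x_i) to match footnote [1], and the λ sample uses reparameterization as in footnote [2].

```python
import torch
import torch.nn as nn

class LambdaEncoder(nn.Module):
    """p(lambda | X): a Gaussian over lambda whose mean is sum_i f_i(x_i)."""
    def __init__(self, m: int, lambda_dim: int, hidden: int = 32):
        super().__init__()
        # One small network f_i per coordinate x_i (hypothetical sizes).
        self.fs = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, lambda_dim))
            for _ in range(m)
        )
        self.log_std = nn.Parameter(torch.zeros(lambda_dim))

    def forward(self, X: torch.Tensor) -> torch.distributions.Normal:
        # X: [batch, m]; mean of the Gaussian over lambda is sum_i f_i(x_i).
        mean = sum(f(X[:, i : i + 1]) for i, f in enumerate(self.fs))
        return torch.distributions.Normal(mean, self.log_std.exp())

class TinyConditionalFlow(nn.Module):
    """q_i(x_i | lambda): a single conditional affine layer over a standard
    normal base variable z_i, i.e. the smallest possible 'flow'. A fuller
    version would stack coupling/spline layers (and could use a uniform base)."""
    def __init__(self, lambda_dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(lambda_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def log_prob(self, x_i: torch.Tensor, lam: torch.Tensor) -> torch.Tensor:
        shift, log_scale = self.net(lam).chunk(2, dim=-1)
        # x_i = shift + exp(log_scale) * z_i; invert and add the log-Jacobian term.
        z = (x_i - shift) * torch.exp(-log_scale)
        base = torch.distributions.Normal(0.0, 1.0)
        return (base.log_prob(z) - log_scale).squeeze(-1)

def training_step(encoder, flows, X, optimizer):
    """One gradient step on -log q(X | lambda), i.e. the maximum-likelihood / KL fit."""
    posterior = encoder(X)
    lam = posterior.rsample()  # reparameterised sample, keeps the mean differentiable
    log_q = sum(flow.log_prob(X[:, i : i + 1], lam) for i, flow in enumerate(flows))
    loss = -log_q.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training would just repeat `training_step` over minibatches of X, with one optimiser covering both the encoder's and all the flows' parameters.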
That would be pretty reasonable, but it would make the model comparison part even harder. I do need P[X] (and therefore Z) for model comparison; this is the challenge which always comes up for Bayesian model comparison.
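To spell out the quantity at stake: Bayesian comparison of two models M_1 and M_2 runs through each model's evidence, i.e. the normalising constant Z, via the Bayes factor:

$$
Z_k = P[X \mid M_k] = \int P[X \mid \lambda, M_k]\,P[\lambda \mid M_k]\,d\lambda,
\qquad
\frac{P[M_1 \mid X]}{P[M_2 \mid X]} = \frac{P[M_1]}{P[M_2]}\cdot\frac{Z_1}{Z_2}.
$$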
Why does it make Bayesian model comparison harder? Wouldn’t you get explicit predicted probabilities for the data X from any two models you train this way? I guess you do need to sample from the Gaussian in λ a few times for each X and pass the result through the flow models, but that shouldn’t be too expensive.
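For what it's worth, the sampling scheme described here is cheap to write down. A minimal sketch, reusing the hypothetical `LambdaEncoder` / `TinyConditionalFlow` classes from the earlier code block (K = 16 is an arbitrary choice):

```python
import math
import torch

@torch.no_grad()
def log_mean_model_prob(encoder, flows, X: torch.Tensor, K: int = 16) -> torch.Tensor:
    """Log of (1/K) * sum_k prod_i q_i(x_i | lambda_k), with lambda_k ~ p(lambda | X).
    Note this averages the flow likelihood over the encoder's lambda-distribution,
    not over a prior on lambda."""
    posterior = encoder(X)  # Gaussian over lambda for each row of X
    log_qs = []
    for _ in range(K):
        lam = posterior.sample()
        log_q = sum(flow.log_prob(X[:, i : i + 1], lam) for i, flow in enumerate(flows))
        log_qs.append(log_q)  # shape [batch]
    # Stable log of the average of exp(log_q) over the K draws.
    return torch.logsumexp(torch.stack(log_qs, dim=0), dim=0) - math.log(K)
```

Summing the returned per-sample values over a held-out dataset gives one number per trained model that could then be compared.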