Having ~ten data points makes this way more interesting. That’s exactly the kind of problem that I specialize in.
For the log-normal distribution it should be possible to do the P(data|model) integral explicitly. The integral is tractable for the normal distribution—it comes out proportional to a power of the sample variance—so just log-transform the data and use that. If you write the integral for normally distributed data out explicitly and plug it into WolframAlpha or something, it should be able to handle it. That would circumvent needing to sample μ and σ.
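For instance, taking the (improper) reference prior p(μ,σ) ∝ 1/σ, the normal marginal likelihood comes out in closed form, proportional to a negative power of the sample variance. Here's a quick numerical check on made-up log-lives—the prior choice and the constant out front are my own working, so worth re-deriving:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.special import gammaln

# Hypothetical log-transformed lives, n ~ 10
y = np.array([2.1, 2.3, 1.9, 2.0, 2.4, 2.2, 2.1, 1.8, 2.3, 2.0])
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

# Closed form with prior p(mu, sigma) propto 1/sigma:
# p(y) = (2*pi)^(-(n-1)/2) * n^(-1/2) * (1/2) * Gamma((n-1)/2)
#        * (2 / ((n-1)*s2))^((n-1)/2)
log_closed = (-(n - 1) / 2 * np.log(2 * np.pi) - 0.5 * np.log(n)
              - np.log(2) + gammaln((n - 1) / 2)
              + (n - 1) / 2 * np.log(2 / ((n - 1) * s2)))

# Brute-force the double integral over mu and sigma on a fine grid
mu = np.linspace(ybar - 1, ybar + 1, 800)
sig = np.linspace(0.02, 2.0, 800)
M, S = np.meshgrid(mu, sig)
loglik = (-n / 2 * np.log(2 * np.pi * S**2)
          - ((n - 1) * s2 + n * (ybar - M)**2) / (2 * S**2))
integrand = np.exp(loglik) / S          # likelihood times the 1/sigma prior
log_num = np.log(trapezoid(trapezoid(integrand, sig, axis=0), mu))
print(log_closed, log_num)              # should agree closely
```

Note the result depends on the data only through n and the sample variance, which is what makes the per-model evidence so cheap for the normal case.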
I don’t know if there’s a corresponding closed form for Birnbaum-Saunders; I had never even heard of it before this. The problem is still sufficiently low-dimensional that it would definitely be tractable computationally, but it would probably be a fair bit of work to code.
(I thoroughly enjoy/enjoyed this exchange and think I learned some things from it, and wanted to thank you for having this out in the open)
Birnbaum-Saunders is an interesting one. For the purposes of fatigue analysis, the assumptions which bring about the three models are:
Weibull—numerous failure modes (of similar failure speed) racing to see which causes the component to fail first
Log-normal—Damage per cycle is proportional to current damage
Birnbaum-Saunders—Damage per cycle is normally distributed and independent of previous damage
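Incidentally, all three families are available in scipy.stats (Birnbaum-Saunders is there under the name `fatiguelife`), so a quick maximum-likelihood comparison on hypothetical cycles-to-failure data is only a few lines—this is just an MLE sanity check, not the Bayesian comparison we've been discussing:

```python
import numpy as np
from scipy import stats

# Hypothetical cycles-to-failure data
t = np.array([1.2e5, 1.8e5, 2.1e5, 0.9e5, 1.5e5,
              2.5e5, 1.1e5, 1.7e5, 1.4e5, 2.0e5])

# All three families live in scipy.stats; fatiguelife is Birnbaum-Saunders
models = {
    "Weibull": stats.weibull_min,
    "Log-normal": stats.lognorm,
    "Birnbaum-Saunders": stats.fatiguelife,
}
loglik = {}
for name, dist in models.items():
    params = dist.fit(t, floc=0)              # fix the location at zero
    loglik[name] = dist.logpdf(t, *params).sum()
    print(f"{name}: max log-likelihood = {loglik[name]:.1f}")
```

With only ~ten points the three maximised log-likelihoods tend to be close, which is consistent with not having a clear winner yet.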
My engineering gut says that this component is probably somewhere between Log-normal and Birnbaum-Saunders (I suspect the proportionality decays as damage accumulates), which is maybe why I don’t have a clear winner yet.
I think I understand now where my original reasoning was incorrect when I was calculating the expected worst in a million. I was calculating the worst in a million for each model separately and taking a weighted average of the answers, which meant that bad values from the outlier candidate pdfs were massively suppressed.
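A toy illustration of the gap (two made-up normal candidates with 0.95/0.05 weights): the low quantile of the mixture sits much further out than the weighted average of the per-model quantiles, because past the good model's tail the 5% outlier model supplies essentially all the remaining probability:

```python
import numpy as np
from scipy import stats, optimize

# Two made-up candidate models with posterior weights 0.95 / 0.05
w = np.array([0.95, 0.05])
mu = np.array([0.0, -3.0])
sd = np.array([1.0, 1.0])
q = 1e-4          # stand-in for the 1-in-a-million tail probability

# What I was doing: weighted average of each model's q-quantile
wrong = (w * stats.norm.ppf(q, mu, sd)).sum()

# What it should be: the q-quantile of the mixture,
# F(x) = sum_k w_k * Phi((x - mu_k) / sd_k)
mix_cdf = lambda x: (w * stats.norm.cdf(x, mu, sd)).sum()
right = optimize.brentq(lambda x: mix_cdf(x) - q, -20.0, 0.0)
print(wrong, right)   # roughly -3.87 vs -5.88
```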
I’ve done some sampling of the worst in a million by repeatedly drawing two random numbers from 0 to 1. I use the first to select a μ,σ combination according to the posterior probability of each pair, and the second as a p-value. I then put the p-value through the inverse CDF of the selected distribution to get an x.
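In code the two-uniform scheme looks roughly like this (the μ,σ pairs and their posterior weights here are made up for illustration, and I'm assuming a normal model on log-life):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical (mu, sigma) pairs with made-up posterior weights
pairs = np.array([[2.0, 0.15], [2.1, 0.20], [2.2, 0.30], [2.1, 0.50]])
w = np.array([0.3, 0.4, 0.2, 0.1])   # posterior probability of each pair
cdf = np.cumsum(w)                    # cumulative weights over the pairs

n_draws = 100_000
idx = np.searchsorted(cdf, rng.random(n_draws))  # first uniform picks a pair
p = rng.random(n_draws)                          # second uniform is the p-value
x = stats.norm.ppf(p, pairs[idx, 0], pairs[idx, 1])  # inverse CDF -> an x

# e.g. the 1-in-10,000 worst draw (1e-6 would need well over 1e6 draws)
print(np.quantile(x, 1e-4))
```

The searchsorted on the cumulative weights is the pair-selection step.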
Is this an ok sampling method? I’m not sure if I’m missing something or should be using MCMC. I definitely need to read up on this stuff!
The worst in a million is currently dominated by the occasions where the sampled probability distribution is an outlier. In those cases the p-value doesn’t need to be particularly extreme to produce a low x.
I think my initial estimates were based either mainly on uncertainty in p or mainly on uncertainty in μ,σ. The sampling method allows me to account for uncertainty in both, which definitely makes more sense. The model seems to react sensibly when I add potential new data, so I think I can now assess much better how many data points I require.
That sampling method sounds like it should work, assuming it’s all implemented correctly (not sure what method you’re using to sample from the posterior distribution of μ, σ).
Worst case in a million being dominated by parameter uncertainty definitely makes sense, given the small sample size and the rate at which those distributions fall off.
For μ,σ I effectively created a quasi-cumulative distribution with the parameter pairs as the x-axis.
(μ1,σ1), (μ2,σ1), (μ3,σ1), … (μ1,σ2), (μ2,σ2), (μ3,σ2), … (μn,σm)
The random number defines the relevant point on the y-axis. From there I get the corresponding μ,σ pair from the x-axis.
If this method works I’ll probably have to code the whole thing instead of using a spreadsheet as I don’t have nearly enough μ,σ values to get a good answer currently.
At what point is the data used?
I use it to determine the relative probabilities of each μ,σ pair which in turn create the pseudo cdf.
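Roughly, in code (made-up log-lives, a hypothetical grid of pairs, a flat prior over the grid, normal model on the logs):

```python
import numpy as np
from scipy import stats

# Hypothetical log-transformed lives
y = np.array([2.1, 2.3, 1.9, 2.0, 2.4, 2.2, 2.1, 1.8, 2.3, 2.0])

# Grid of candidate (mu, sigma) pairs -- the "x-axis"
M, S = np.meshgrid(np.linspace(1.5, 2.7, 60), np.linspace(0.05, 0.8, 60))
pairs = np.column_stack([M.ravel(), S.ravel()])

# Relative posterior probability of each pair (flat prior over the grid)
loglik = stats.norm.logpdf(y[:, None], pairs[:, 0], pairs[:, 1]).sum(axis=0)
w = np.exp(loglik - loglik.max())
w /= w.sum()

cdf = np.cumsum(w)               # the pseudo cdf over the pairs
print(pairs[np.argmax(w)])       # most probable pair on the grid
```

A uniform draw located in `cdf` (e.g. with np.searchsorted) then picks out a pair.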
Ok, that sounds right.