epistemic status: Going out on a limb and claiming to have solved an open problem in decision theory[1] by making some strange moves. Trying to leverage Cunningham’s law. Hastily written.
p(the following is a solution to Pascal’s mugging in the relevant sense)≈25%[2].
Okay, setting (also here in more detail): You have a Solomonoff inductor with some universal semimeasure as a prior. The issue is that the utility of programs can grow faster than your universal semimeasure can penalize them, e.g. under a complexity prior there are busy-beaver-like programs p that produce BB(|p|) amounts of utility while only being penalized by $2^{-|p|}$. The more general results are de Blanc 2007, de Blanc 2009 (LW discussion on the papers from 2007). We get this kind of divergence of expected utility on the prior if
the prior is bounded from below by a computable function and
the utility function is computable and unbounded (a toy numerical sketch of the resulting divergence follows below).
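To make the divergence concrete, here is a toy numerical sketch in Python. BB itself isn't computable, so a tower of exponentials stands in for the fast-growing utility; the point is only that a geometric penalty loses to anything that grows fast enough.

```python
from fractions import Fraction

def tower(n: int) -> int:
    """2^2^...^2 with n twos: a crude computable stand-in for a fast-growing utility."""
    result = 1
    for _ in range(n):
        result = 2 ** result
    return result

# Contribution of a single length-n "mugging" hypothesis to expected utility
# on the prior: complexity weight 2^-n times its claimed utility tower(n).
partial_sum = Fraction(0)
for n in range(1, 5):
    contribution = Fraction(tower(n), 2 ** n)
    partial_sum += contribution
    print(f"n={n}: contribution={float(contribution)}, partial sum={float(partial_sum)}")
# Contributions come out as 1, 1, 2, 4096, ...; the penalty shrinks geometrically
# while the utility outgrows it, so the expected utility diverges on the prior.
```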
The next line of attack is to use the speed prior, $2^{-|p| - \log t(p)}$, as the prior. That prior is not bounded from below by a computable function (for some programs of length n it falls below $1/\mathrm{BB}(n)$), so we escape through one of the horns of de Blanc's theorem. (I don't think having a computable lower bound is that important, because K-complexity was never computable in the first place.)
But there's an issue: what if our hypotheses output strings that are short, but that our utility function evaluates as high-value anyway? That is, the utility function takes in some short string of length n and outputs BB(n) as its utility. This happens if the utility function is itself a program with some computational power (in the most extreme case, Turing-complete), and our hypotheses "parasitize" on that computational power to mount a Pascal's mugging. So we also have to count the computation of our utility function as part of what the prior penalizes. That is,
$$2^{-|p| - t(p) - t_u(\mathrm{eval}(p))}$$
where $t_u(\mathrm{eval}(p))$ is the time it takes to run the utility function on the output of p. I'll call this the "reflective speed prior". Note that unless you have an insane, Turing-complete utility function, the speed penalty for evaluating the output of p should be fairly low most of the time.
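As a minimal sketch of what computing this weight would look like (the step-counted runners `run_counted` and `utility_counted` are hypothetical stand-ins, not an existing API, and a real version would need a halting cutoff):

```python
from typing import Callable, Tuple

def reflective_speed_weight(
    program_bits: str,
    run_counted: Callable[[str], Tuple[str, int]],      # hypothetical: returns (output, steps)
    utility_counted: Callable[[str], Tuple[int, int]],  # hypothetical: returns (utility, steps)
) -> float:
    output, t_p = run_counted(program_bits)    # t(p): steps to run the hypothesis
    _value, t_u = utility_counted(output)      # t_u(eval(p)): steps to evaluate its output
    # Penalize length, runtime, and utility-evaluation time all in the exponent:
    # 2^-(|p| + t(p) + t_u(eval(p))). In practice you'd work in log-space,
    # since this underflows to 0.0 very quickly.
    return 2.0 ** -(len(program_bits) + t_p + t_u)
```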
Pascal’s mugging can be thought of in two parts:
1. My expected utility diverges on priors; that is, without having observed any information or done any Bayesian updating, my expected utility can get arbitrarily big. I think this one is a problem.
2. My expected utility can diverge after updating on adversarially selected information. I think this case should be handled by improving your epistemology.
I claim that the reflective speed prior solves 1., but not 2. Furthermore, and this is the important thing, if you use the reflective speed prior, the expected utility is bounded on priors, but you can have arbitrarily high maximal expected utilities after performing Bayesian updating. So you get all the good aspects of having unbounded utility without having to worry about actually getting mugged (well, unless you have something controlling the evidence you observe, which is its own issue).
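Rough sketch of why the expected utility should be bounded on the prior, assuming the utility function has to write its value out in binary (so $U(\mathrm{eval}(p)) \le 2^{t_u(\mathrm{eval}(p))}$, the same observation as the 3↑↑↑3 point further down) and a prefix-free machine (so Kraft's inequality applies):

```latex
% U(eval(p)) <= 2^{t_u(eval(p))}, since writing the value takes at least log2 of it in steps:
\begin{align*}
\mathbb{E}_{\text{prior}}[U]
  &= \sum_p 2^{-|p| - t(p) - t_u(\mathrm{eval}(p))}\, U(\mathrm{eval}(p)) \\
  &\le \sum_p 2^{-|p| - t(p) - t_u(\mathrm{eval}(p))}\, 2^{t_u(\mathrm{eval}(p))}
   = \sum_p 2^{-|p| - t(p)}
   \le \sum_p 2^{-|p|}
   \le 1.
\end{align*}
% The last inequality is Kraft's inequality for a prefix-free machine. Updating can
% still reweight hypotheses and push the (finite) expectation arbitrarily high.
```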
(Next steps: Read the two de Blanc papers carefully, try to suss out a proof, and write up the argument in more detail. Think/argue about what it means to update your prior in this strange way, and specifically about penalizing hypotheses by how long it takes your utility function to evaluate them. Figure out which of these principles are violated. (On an initial read: definitely Anti-Timidity.) Changing one's prior in a "superupdate" has been discussed here and here.)
Edit: Changed from penalizing the logarithm of the runtime and the utility-runtime to penalizing them linearly, after feedback from GPT-5.
I like to live dangerously.
Updated down from 40% after getting some feedback from GPT-5 on what the costs of this approach are.
Isn’t there already a kinda reasonable solution via something like UDASSA? See e.g. here (and this response to Joe’s objections here).
My understanding is that UDASSA doesn't give you unbounded utility, by virtue of directly assigning $U(\mathrm{eval}(p)) \propto 2^{-|p|}$, and the sum of utilities is proportional to $\sum_{i=0}^{\infty} 2^{-i} = 2$. The whole dance I did was in order to be able to have unbounded utilities. (Maybe you don't care about unbounded utilities, in which case UDASSA seems like a fine choice.)
(I think that the other horn of de Blanc’s proof is satisfied by UDASSA, unless the proportion of non-halting programs bucketed by simplicity declines faster than any computable function. Do we know this? “Claude!…”)
Edit: Claude made up plausible nonsense, but GPT-5, upon request, got it right: the proportion of halting programs declines more slowly than some computable functions.
Edit 2: Upon some further searching (and soul-searching) I think UDASSA is currently underspecified wrt whether its utility is bounded or unbounded. For example, the canonical explanation doesn't mention utility at all, and none of the other posts about it mention how exactly utility is defined.
The "UDASSA/UDT-like solution" is basically to assign some sort of bounded utility function to the output of various Turing machines weighted by a universal prior, like here, although Wei Dai doesn't specify in that post that the preference function has to be bounded, and he allows preferences over entire trajectories (but I think you should be able to do away with that by having another Turing machine run the first one and evaluate any particular property of its trajectory).
"Bounded utility function over Turing machine outputs weighted by simplicity prior" should recover your thing as a special case, actually, at least in the sense of having identical expected values. You could have a program which outputs 1 utility with probability 2^-[(log output of your utility Turing machine) - (discount factor of your utility Turing machine)]. That this is apparently also the same as Eliezer's solution suggests there might be convergence on a unique sensible way to do EU maximization in a Turing-machine-theoretic mathematical multiverse.
It’s a bit of a travesty there’s no canonical formal write-up of UDASSA, given all the talk about it. Ugh, TODO for working on this I guess.
Aren’t there programs that run fast and also return a number that grows much faster than |p|? Like up arrow notation. Why don’t these grow faster than your speed prior penalizes them?
I think the upper bound here is set by a program that "walks" along the tape as far as possible, setting each cell to 1, and then sets a last bit before halting (thus creating the binary string $11^n$, i.e. $n+1$ ones, where $n \le \mathrm{BB}(|p|)$[1]). If we interpret that number as a utility, the utility is exponential in the number of steps taken, which is why we need to penalize by $2^{-\mathrm{steps}(p)}$ instead of just $1/\mathrm{steps}(p)$[2]. If you want to write $3\uparrow\uparrow\uparrow 3$ on the tape you have to take at least $\log_2(3\uparrow\uparrow\uparrow 3)$ steps on a binary tape (and $\log_n(3\uparrow\uparrow\uparrow 3)$ on an n-ary tape).
Technically the upper bound is Σ(|p|), the score function.
Thanks to GPT-5 for this point.
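A toy check of that comparison, using the fact that a program running for `steps` steps writes a value of at most about $2^{\text{steps}}$:

```python
# A program that takes `steps` steps can write at most about that many ones,
# i.e. a tape value of roughly 2^steps - 1.
for steps in (4, 8, 16, 32):
    value = 2 ** steps - 1
    print(f"steps={steps}: "
          f"1/steps penalty -> {value / steps:.4g}, "        # still blows up with steps
          f"2^-steps penalty -> {value / 2 ** steps:.10f}")  # stays below 1
```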
Makes sense, but in that case, why penalize by time? Why not just directly penalize by utility? Like the leverage prior.
Also, why not allow floating-point representations of utility to be output, rather than just binary integers?
Huh. I find the post confusingly presented, but if I understand correctly, 15 logical inductor points to Yudkowsky₂₀₁₃—I think I invented the same concept from second principles.
Let me summarize to check my understanding: my speed prior on both the hypotheses and the utility function is trying to emulate just discounting utility directly (because in the case of binary tapes and integer outputs, penalizing both exponentially in runtime gets you exactly an upper bound on the utility), and a cleaner way is to set the prior to $2^{-|p|} \cdot \frac{1}{U(\mathrm{eval}(p))}$. That avoids the "how do we encode numbers" question that naturally raises itself.
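Spelled out, if I'm reading it right (assuming $U(\mathrm{eval}(p)) > 0$ so the discount is well-defined):

```latex
% Each program's contribution to expected utility on the prior is exactly its
% complexity weight, because the discount cancels the utility:
\[
\sum_p 2^{-|p|}\,\frac{1}{U(\mathrm{eval}(p))}\,U(\mathrm{eval}(p))
  \;=\; \sum_p 2^{-|p|} \;\le\; 1,
\]
% bounded for a prefix-free machine, mirroring the bound you get from the
% exponential-in-runtime penalty.
```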
Does that sound right?
(The fact that I reinvented this looks like a good thing, since that indicates it’s a natural way out of the dilemma.)
Can’t give a confident yes because I’m pretty confused about this topic, and I’m pretty unhappy currently with the way the leverage prior mixes up action and epistemics. The issue about discounting theories of physics if they imply high leverage seems really bad? I don’t understand whether the UDASSA thing fixes this. But yes.
That avoids the "how do we encode numbers" question that naturally raises itself.

I'm not sure how natural the encoding question is; there's probably an AIT answer to this kind of question that I don't know.