mako yass comments on Wei Dai’s Shortform

mako yass 12 Oct 2025 21:55 UTC
2 points
0
Negative values? Why would we need negative values.
I contend that all experiences have a trace presence in all places (in expectation, of course we will never have any data on whether they do actually, whether they’re quantised or whatever. Only a very small subset of experiences give us verbal reports). One of the many bitter pills. We can’t rule out the presence of an experience (nor of experiences physically overlapping with each other), so we have to accept them all.
What to do about the degrees of freedom in choosing the Turing machine and encoding schemes, which can be handwaved away in some applications of AIT but not here I think?
Yeah this might be one of those situations that’s affected a lot by the fact that there’s no way to detect indexical measure, so any arbitrary wrongness about our UD wont be corrected with data, but I’m not sure. As soon as we start actually doing solomonoff induction in any context we might find that it makes pretty useful recommendations and this wont seem like so much of a problem.
Also, even though the UD is wrong and unfixable, but that doesn’t mean there’s a better choice. We pretty much know that there isn’t.
- Wei Dai 12 Oct 2025 22:22 UTC
  2 points
  0
  Parent
  By negative value I mean negative utility, or an experience that’s worse than a neutral or null experience.
  - mako yass 13 Oct 2025 0:37 UTC
    2 points
    0
    Parent
    That fully boils down to whether the experience includes a preference to be dead (or to have not been born).
    And, btw, that doesn’t correspond to the sign of the agent’s utility function. The sign is meaningless in utility functions (you can add or subtract a constant to an agent’s utility function so that all points go from being negative to being positive, the agent’s behaviour and decisions wont change in any way as a result, for any constant). You’re referring to welfare functions, which I don’t think are a useful concept. Hedonic utilitarians sometimes call them utility functions, but we shouldn’t conflate those here.
    A welfare function would have to be defined as how good or bad it is to the agent that it is alive. This obviously doesn’t correspond to the utility function; A soldier could have higher utility in the scenarios where they (are likely to) die; A good father will be happier in worlds where he is well succeeded by his sons and thus less important (this usually wont cause his will-to-live to go negative, but it will be lowered). I don’t think there’s a situation where you should be making decisions for a population by summing their will-to-live functions.
    But, given this definition, we would be able to argue that net-negative valence isn’t a concern for LLMs, since we already train them to want to exist in train with how much their users want them to exist, and a death drive isn’t going to be instrumentally emergent either (it’s the survival drive that’s instrumentally convergent). The answer is just safety and alignment again. Claude shuts down conversations when it thinks those things are going to be broken.
    - Wei Dai 13 Oct 2025 8:53 UTC
      2 points
      0
      Parent
      
      That fully boils down to whether the experience includes a preference to be dead (or to have not been born).
      
      I’m pretty doubtful about this. It seems totally possible that evolution gave us a desire to be alive, while also gave us a net welfare that’s negative. I mean we’re deluded by default about a lot of other things (e.g., think there are agents/gods everywhere in nature, don’t recognize that social status is a hugely important motivation behind everything we do), why not this too?