Basically, Dynamic Strong Maximality tells you which plans are permissible given your imprecise credences, and you just pick one.
This raises a question: Are your imprecise credences sufficiently wide that they permit plans which never take action to intentionally influence the future under any plausible update?
I think it’s significantly harder to argue “every plan is equally permissible w.r.t. total welfare” than to argue “every action is equally permissible w.r.t. total welfare”, because the space of all plans is so large.
A simplified example:
Let’s say I have a friend who is going to donate money to alignment research in one week, by default.
They say that they’re interested in my advice.
If I don’t give any advice, they’ll donate $1000.
If I say “yeah it’s good”, they’ll donate $2000.
If I say “no it’s bad”, they’ll donate $0.
Let’s say I can do research on whether it’s good or bad to donate money to alignment research. If I do so, I believe that:
I have a 50% chance of coming up with a positive consideration, which makes me think it’s very slightly better to donate to alignment research than I thought before.
I have a 50% chance of coming up with a negative consideration, which makes me think it’s very slightly worse to donate to alignment research than I thought before.
Of course, these tiny updates would normally be drowned out by all my general background cluelessness. Even after making these updates, I’ll still be clueless about whether it’s good or bad to donate to alignment research.
But consider the following plan: I say “yeah it’s good” if I find a positive consideration, and I say “no it’s bad” if I find a negative consideration. This plan might still be cluefully better than not giving any advice at all. Because I’m not changing the expected amount of money donated, I’m just “moving” $1000 from the hypothetical world where I came up with a negative consideration to the hypothetical world where I came up with a positive consideration.
If so, dynamic maximality might:
Permit me to say “yeah it’s good”.
Permit me to say “no it’s bad”.
Permit the plan: say “yeah it’s good” if I come up with a positive consideration, and “no it’s bad” if I come up with a negative consideration.
But forbid me from not giving any advice at all.
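To see the “moving $1000” point concretely, here’s a minimal check using the numbers from the example (the 50% probability and the $0 / $1000 / $2000 donation amounts):

```python
# Expected donation under the two plans, using the numbers in the example.
p_positive = 0.5  # chance the research turns up a positive consideration

# Plan 1: give no advice -> the friend donates $1000 regardless.
ev_no_advice = 1000

# Plan 2: say "yeah it's good" after a positive consideration ($2000 donated),
# and "no it's bad" after a negative one ($0 donated).
ev_conditional_advice = p_positive * 2000 + (1 - p_positive) * 0

print(ev_no_advice, ev_conditional_advice)  # 1000 1000.0
# Same expected donation; the conditional plan just "moves" $1000 into the
# world where the update was positive.
```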
Based on this example, I wonder if something like the following could be true: If your reason for cluelessness is that you have a ton of positive and negative considerations on each side of an issue, and you have no idea how to weigh them up, that’s not actually sufficient to make it permissible to not try to influence the long-run future. Because the plan of waiting until you learn more info, and being sensitive to whether further updates are positive or negative, might be cluefully better than not doing anything at all.
(Whereas, if your reason for cluelessness is that you can’t tell whether any given consideration is even a positive or negative consideration, that argument doesn’t seem to go through.)
Would some version of this still work if you have imprecise credences about the signs (and magnitudes) of considerations you’ll come up with, rather than 50-50 (or some other ratio)?
Even if the probabilities aren’t 50-50 but are still precise, we could adjust the donation amounts to match them and keep the expected amount donated at $1000.
But if the probabilities are imprecise, I don’t think we can (precisely) maintain the expected donation amount. We could pick donation amounts such that $1000, <$1000 and >$1000 are all possible expected donation amounts across our set of credences (our representor).
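A small sketch of what this looks like numerically (the specific probabilities here, p = 0.4 in the precise case and p ranging over [0.4, 0.6] in the imprecise case, are just illustrative):

```python
# Precise case: with probability p of a positive consideration, paying out
# 1000 / p in the positive branch (and $0 otherwise) keeps the expected
# donation at exactly $1000.
def expected_donation(p, amount_if_positive):
    return p * amount_if_positive + (1 - p) * 0

print(expected_donation(0.4, 1000 / 0.4))  # 1000.0, even though p != 0.5

# Imprecise case: if p ranges over [0.4, 0.6] across the representor, no fixed
# donation amounts pin the expectation at $1000 under every member.
for p in (0.4, 0.5, 0.6):
    print(p, expected_donation(p, 2000))  # 800.0, 1000.0, 1200.0
# The possible expected donations straddle $1000, as described above.
```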
Thanks, this was thought-provoking. I feel confused about how action-relevant this idea is, though.
For one, let’s grant that (a) “researching considerations + basing my recommendation on the direction of the considerations” > (b) “researching considerations + giving no recommendation”. This doesn’t tell me how to compare (a) “researching considerations + basing my recommendation on the direction of the considerations” vs. (c) “not doing research”. Realistically, the act of “doing research” would have various messy effects relative to, say, doing some neartermist thing — so I’d think (a) is incomparable with (c). (More on this here.)
But based on the end of your comment, IIUC you’re conjecturing that we can compare plans based on a similar idea to your example even if no “research” is involved, just passively gaining info. If so:
It seems like this wouldn’t tell me to change anything about what I work on in between times when someone asks for my recommendation.
Suppose I recommend that someone do more of [intervention that I’ve positively updated on]. Again, their act of investing more in that intervention will presumably have lots of messy side effects, besides “more of the intervention gets implemented” in the abstract. So I should only be clueful that this plan is better if I’ve “positively updated” on the all-things-considered set of effects of this person investing more in that intervention. (Intuitively this seems like an especially high bar.)
Huh, yeah, I suppose that side effects from research could swing it. I’m not really sure how to analyze this.
Here’s a bit of an expanded line of thinking/intuition, following the same thread I started in my previous comment, but getting a bit less concrete and a bit more speculative. It’s an argument for why this kind of thing will (maybe) rule out some policies as impermissible:
In order to not suffer sure losses, you must be representable as an EV-maximizing agent, which means there must be some precise beliefs under which your entire policy (your mapping from all possible future observations to all possible actions) is optimal w.r.t. total welfare.
(Of course, as bounded agents, even if we try hard to be EV-maximizing-w.r.t.-precise-credences, we’ll suffer sure losses as viewed from the perspective of epistemically superior agents. Some of our beliefs are contradictory, etc. So what we really want here is some milder version of this criterion that acknowledges the limitations of bounded agents, but still condemns inefficiencies that we are capable of avoiding.)
An intuitively compelling criterion is: these precise beliefs (which you are representable as holding) are within the bounds of your imprecise credences.
I’m not sure if this criterion follows naturally from dynamic strong maximality, or if it’s a plausible “extra criterion” to add on top, or if you think it’s a bad criterion that we shouldn’t assume.
For the rest of this comment, I’ll assume it.
But when I picture myself living a life under cluelessness about impartial long-run welfare, as I currently intuitively understand such lives to look (doing stuff like “working on animal welfare”, or “caring for friends and family”, or “researching/advocating for cluelessness”), and in particular, when I picture the big complex object that is the policy of a person doing these sorts of actions — including all the ways they could react to all possible information — it’s really unclear whether even severely imprecise credences should reasonably include any set of beliefs under which that policy is optimal w.r.t. total welfare.
Intuitively, it seems like that might require oddly contorted beliefs. For example, for some reason, your actions change hugely when you learn about details of slaughter methods for shrimp, but change not at all when you learn what-seems-like-it-ought-to-be highly relevant information about the long-run consequences of your actions. Or perhaps your actions change a lot when you discover a new consideration for how to do reasoning under cluelessness. Are there really any “reasonable” precise beliefs (i.e. within the bounds of your imprecise credences) that prescribe that precise policy?
(One possible direction to explore: What’s the “size” of “the set of all reasonable precise credences”, and what’s the “size” of “the set of all possible policies”? If the former is vastly larger than the latter, then maybe it doesn’t seem so unlikely that there will always be some set of “reasonable credences” that recommends any particular policy. But if not, then constraining yourself to only follow policies that are recommended by some possible “reasonable credences” will probably significantly limit the set of policies that are permissible to choose.)
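For what it’s worth, the “size of the set of all possible policies” half can at least be made concrete in a toy finite setting. This doesn’t settle the comparison (the set of precise credences is a continuum rather than something you count), and the numbers below are arbitrary; it’s just a rough sketch of how fast the policy space blows up:

```python
import math

# Toy illustration (arbitrary numbers): a deterministic policy assigns one of
# n_actions actions to every distinguishable observation history, so there are
# n_actions ** n_histories policies in total.
n_actions = 10
n_histories = 10 ** 6  # hypothetical number of distinguishable histories

digits = int(n_histories * math.log10(n_actions)) + 1
print(digits)  # 1000001 -- the policy count is a number with about a million digits
```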
Thanks for explaining!
I think this is the step I reject, i.e., requiring that the precise beliefs I’m representable as holding lie within the bounds of my imprecise credences. By hypothesis, I don’t think the coherence arguments show that the precise distribution P that I can be represented as optimizing w.r.t. corresponds to (reasonable) beliefs. P is nothing more than a mathematical device for representing some structure of behavior. So I’m not sure why I should require that my representor — i.e., the set of probability distributions that would be no less reasonable than each other if adopted as beliefs[1] — contains P.
[1] I’m not necessarily committed to this interpretation of the representor, but for the purposes of this discussion I think it’s sufficient.
I think I maybe figured out how to show that P must be in the representor.
You ought to assign non-zero probability to being asked to bet on arbitrary questions. In order for your policy not to be dominated, dynamic maximality will require that you commit in advance to the odds that you’d bet on (after seeing arbitrary evidence). Clearly you should bet on odds P. And it’s only permissible to bet at odds that are inside your representor.
(Now strictly speaking, there are some nuances about what kinds of questions you can be convinced you’ll be betting on, given that some of them might be quite hard to measure/verify even post hoc. But since we’re just talking about a non-zero probability of being convinced that you’re really betting on a question, I don’t think this should be too restrictive. And even non-“pure” bets, which only indirectly get at some question q, will contribute to forcing P’s belief in q inside your representor, I think.)
Sorry, I don’t understand the argument yet. Why is it clear that I should bet on odds P, e.g., if P is the distribution that the CCT says I should be represented by?
Because you couldn’t be represented as an EV-maximizer with beliefs P if you were betting at odds other than P: that would lead to lower expected value under P. (Assuming that pay-offs are proportional to some proper scoring rule.)
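A minimal numerical check of that point, assuming a quadratic (Brier) scoring rule as the example of a proper scoring rule (the 0.37 below is just an arbitrary illustrative value for P(X)):

```python
# If your payoff for reporting odds q on a binary question X is -(q - outcome)^2
# (the Brier score, a proper scoring rule), then your expected payoff under
# beliefs P is maximized by reporting q = P(X) exactly.

def expected_brier_payoff(q, p):
    """Expected payoff of reporting q when P assigns probability p to X."""
    return p * -(q - 1.0) ** 2 + (1 - p) * -(q - 0.0) ** 2

p = 0.37  # arbitrary illustrative value for P(X)
best_q = max((i / 100 for i in range(101)), key=lambda q: expected_brier_payoff(q, p))
print(best_q)  # 0.37 -- betting at any other odds lowers P-expected value
```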
Oops, right. I think what’s going on is:
“It’s only permissible to bet at odds that are inside your representor” is only true if the representor is convex. If my credence in some proposition X is, say, P(X) = (0.2, 0.49) U (0.51, 0.7), IIUC it’s permissible to bet at 0.5. I guess the claim that’s true is “It’s only permissible to bet at odds in the convex hull of your representor”.
But I’m not aware of an argument that representors should be convex in general.
If there is such an argument, my guess is that the way things would work is: We start with the non-convex set of distributions that seem no less reasonable than each other, and then add in whichever other distributions are needed to make it convex. But there would be no particular reason we’d need to interpret these other distributions as “reasonable” precise beliefs, relative to the distributions in the non-convex set we started with.
And, the kind of precise distribution P that would rationalize e.g. working on shrimp welfare seems to be the analogue of “betting at 0.5” in my example above. That is:
Our actual “set of distributions that seem no less reasonable than each other” would include some distributions that imply large positive long-term EV from working on shrimp welfare, and some that imply large negative long-term EV.
Whereas the distributions like P that imply vanishingly small long-term EV — given any evidence too weak to resolve our cluelessness w.r.t. long-term welfare — would lie in the convex hull. So betting at odds P would be permissible, and yet this wouldn’t imply that P is “reasonable” as precise beliefs.
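Here’s a toy numerical version of the convex-hull point, in my own framing: suppose you’re forced to post two-sided odds q on X, and the other party then takes whichever side of the bet favors them, so your expected payoff under a precise belief p is -|p - q|. Then the undominated odds are (approximately) those in the convex hull of the representor, including 0.5:

```python
# Toy setup: you must post two-sided odds q on a binary X, and the other party
# takes whichever side of the bet favors them, so your expected payoff under a
# precise belief p is -|p - q|.

def expected_payoff(q, p):
    return -abs(p - q)  # the other party picks the side that's bad for you

# Discretized representor: (0.2, 0.49) U (0.51, 0.7), endpoints excluded.
representor = [x / 100 for x in range(21, 49)] + [x / 100 for x in range(52, 70)]

def dominated(q, alternatives):
    """True if some alternative odds do strictly better under every p in the representor."""
    return any(
        all(expected_payoff(q2, p) > expected_payoff(q, p) for p in representor)
        for q2 in alternatives
    )

grid = [x / 100 for x in range(101)]
permissible = [q for q in grid if not dominated(q, grid)]
print(min(permissible), max(permissible))  # ~0.2 and ~0.7: roughly the convex hull
print(0.5 in permissible)                  # True, even though 0.5 isn't in the representor
```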
Btw, thinking about this sort of example also serves as a bit of an intuition pump (to me) against the philosophy where you have imprecise credences, use maximality to restrict your option set, and then pick on the basis of some other criterion. For example, let’s say that your other criterion would prefer “give no advice” > “saying ‘yeah it’s good’ ” > “give conditional advice”. It feels real weird to exclude “give no advice” because it’s dominated, and then instead move to “saying ‘yeah it’s good’”, which is still incomparable to “give no advice” and less preferred according to your decision criterion. It doesn’t feel like the kind of thing a rational agent would do. (I guess it violates independence of irrelevant alternatives, for one thing.)
I guess this isn’t unique to the dynamic rationality thing. You can construct much simpler examples where A dominates B, but they’re both incomparable to C, and you have some less-important decision-rule that prefers B > C > A. Probably you’ll already have thought a lot about these cases, so I don’t expect it to be convincing to you. Just reporting an intuition that pushes me away from imprecise credences + maximality.
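Here’s a minimal sketch of that simpler structure, encoding just the relations described above (A dominates B, both are incomparable to C, and a less-important rule prefers B > C > A):

```python
# A dominates B; both A and B are incomparable to C. A secondary decision-rule
# prefers B > C > A. Maximality filters out dominated options; the secondary
# rule then picks among whatever is left.

dominates = {("A", "B")}                   # the only consequentialist comparison
secondary_rank = {"B": 0, "C": 1, "A": 2}  # lower is better under the other criterion

def choose(options):
    permissible = [o for o in options
                   if not any((other, o) in dominates for other in options)]
    return min(permissible, key=secondary_rank.get)

print(choose(["A", "B", "C"]))  # -> "C": B is excluded because A dominates it
print(choose(["B", "C"]))       # -> "B": with A unavailable, the choice flips
# Adding A (which is never chosen) changes the choice between B and C: an
# independence-of-irrelevant-alternatives violation, as noted above.
```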
Sorry this wasn’t clear: In the context of this post, when we endorsed “use maximality to restrict your option set, and then pick on the basis of some other criterion”, I think we were implicitly restricting to the special case where {permissible options w.r.t. the other criterion} ⊆ {permissible options w.r.t. consequentialism}. If that doesn’t hold, it’s not obvious to me what to do.
Regardless, it’s not clear to me what alternative you’d propose in this situation that’s less weird than choosing “saying ‘yeah it’s good’”. (In particular I’m not sure if you’re generally objecting to incomplete preferences per se, or to some way of choosing an option given incomplete preferences (w.r.t. consequentialism).)
Ah, that’s a helpful clarification.
I was thinking at least a bit of both. I find the case for imprecise credences to be more compelling if they come with a decision-rule that seems reasonable to me.