Vivek S comments on Winning isn’t enough

Vivek S 18 Mar 2026 2:35 UTC
1 point
0
Thanks for the reply! I think you’re right to push back on Lukas’ point about incomparability in practice. But I (obviously) think there’s still a question to be asked about imprecise intervals in theory.
My wording was a bit confusing, but I meant to say that (a) is incomparable to (b), and that (a) is incomparable to (c) from one standpoint. In my formulation of the problem, (b) is exactly equal to (c) by simple cluelessness (unlikely in practice but plausible in theory).
I thought of two ways you can try to hold on to complex cluelessness here and why they both seem to struggle. How would you respond?
- Would you bite the bullet here and say that (a) and (c) (or, equivalently, (a) and (b)) are incomparable?
  - If so, we’re clueless about whether we should redirect alignment money when you have a negative update to worlds with a positive update, which I find very unintuitive.
- If you try bracketing away this cluelessness, then I’m also skeptical.
  - It seems arbitrary whether to bracket before the update on alignment research or after the update, since there’s complex cluelessness in both cases.
- Anthony DiGiovanni 18 Mar 2026 7:19 UTC
  3 points
  1
  Parent
  Thanks for clarifying! Back to your first comment then:
  “Worlds a) and b) and worlds a) and c) are incomparable from one standpoint because you are still radically clueless about alignment research being good”
  I’m still not sure I understand, sorry if I’m missing something basic. Let:
  (a*) = “recommend donating to alignment research [without doing research about the sign of alignment beforehand]”
  (b*) = “recommend not donating [without doing research about the sign of alignment beforehand]”
  As you say, Lukas’s argument implies (a) > (b). (Independently of whether (b) ~ (c).) This holds even if (a*) is incomparable with (b*); that’s precisely Lukas’s point. I don’t see how “you are still radically clueless about alignment research being good”, i.e., that (a*) is incomparable with (b*), tells me that (a) is incomparable with (b) (or that (a) is incomparable with (c)).
  - Vivek S 18 Mar 2026 9:36 UTC
    1 point
    0
    Parent
    Oops yeah, I believe you’re right. I got confused and thought we had specified that (a) is incomparable with (b), but in reality we had only specified that alignment research after a positive update is incomparable with (b*).