Steven Byrnes comments on Foom & Doom 2: Technical alignment is hard

Steven Byrnes 19 Sep 2025 19:48 UTC
2 points
0
Maybe study logical decision theory?
Eliezer has always been quite clear that you should one-box for Newcomb’s problem because then you’ll wind up with more money. The starting point for the whole discussion is a consequentialist preference—you have desires about the state of the world after the decision is over.
You have desires, and then decision theory tells you how to act so as to bring those desires about. The desires might be entirely about the state of the world in the future, or they might not be. Doesn’t matter. Regardless, whatever your desires are, you should use good decision theory to make decisions that will lead to your desires getting fulfilled.
Thus, decision theory is unrelated to our conversation here. I expect that Eliezer would agree.
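To make the separation between desires and decision procedure concrete, here is a minimal sketch, not anything from Eliezer's writing. It assumes the usual textbook Newcomb payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) and a predictor that is right with probability `accuracy`. The function names are hypothetical, and the calculation simply conditions on the action, so it is not a model of logical decision theory. The only point is that the utility function is a plug-in argument and the choosing step does not care what it is.

```python
# Illustrative sketch only. Payoffs and the `accuracy` parameter are assumptions,
# and conditioning on the action is a simplification, not an implementation of LDT.

MILLION = 1_000_000
THOUSAND = 1_000


def expected_utility(action, accuracy, utility):
    """Expected utility of `action` if the predictor is right with probability `accuracy`."""
    if action == "one-box":
        # Predictor right: opaque box is full. Predictor wrong: it is empty.
        outcomes = [(accuracy, MILLION), (1 - accuracy, 0)]
    elif action == "two-box":
        # Predictor right: opaque box is empty. Predictor wrong: it is full.
        outcomes = [(accuracy, THOUSAND), (1 - accuracy, MILLION + THOUSAND)]
    else:
        raise ValueError(f"unknown action: {action!r}")
    return sum(prob * utility(money) for prob, money in outcomes)


def best_action(accuracy, utility=lambda money: money):
    """Pick whichever action has higher expected utility, for any utility function."""
    return max(
        ["one-box", "two-box"],
        key=lambda action: expected_utility(action, accuracy, utility),
    )


if __name__ == "__main__":
    print(best_action(0.99))                               # agent that wants money
    print(best_action(0.99, utility=lambda money: -money))  # agent that wants less money
```

Swapping in a different `utility` changes which action comes out, but `best_action` itself is untouched, which is the sense in which the decision theory sits downstream of whatever the desires happen to be.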
To me it seems a bit surprising that you say we agree on the object level, when in my view you’re totally guilty of my 2.b.i point above of not specifying the tradeoff / not giving a clear specification of how decisions are actually made.
Your 2.a is saying “Steve didn’t write down a concrete non-farfuturepumping utility function, and maybe if he tried he would get stuck”, and yeah I already agreed with that.
Your 2.b is saying “Why can’t you have a utility function but also other preferences?”, but that’s a very strange question to me, because why wouldn’t you just roll those “other preferences” into the utility function as you describe the agent? Ditto with 2.c, why even bring that up? Why not just roll that into the agent’s utility function? Everything can always be rolled into the utility function. Utility functions don’t imply anything about behavior, and they don’t imply reflective consistency, etc., it’s all vacuous formalizing unless you put assumptions / constraints on the utility function.
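A toy version of the "everything can be rolled into the utility function" point, as a hedged sketch: given any policy at all, you can write down a utility function that the policy maximizes. The names here (`rationalizing_utility`, `maximize`, `weird_policy`) are hypothetical, and I am assuming the utility function is allowed to range over (observation, action) pairs rather than only over far-future states.

```python
# Illustrative sketch only. Assumes utility may depend on (observation, action)
# pairs; with that freedom, any behavior is "utility-maximizing".

def rationalizing_utility(policy):
    """Build a utility function that scores 1 for what `policy` does and 0 otherwise."""
    def utility(observation, action):
        return 1.0 if action == policy(observation) else 0.0
    return utility


def maximize(utility, observation, actions):
    """Choose the action with the highest utility for this observation."""
    return max(actions, key=lambda action: utility(observation, action))


def weird_policy(observation):
    """An arbitrary, not obviously goal-directed policy used as the example."""
    return "left" if len(observation) % 2 == 0 else "right"


if __name__ == "__main__":
    utility = rationalizing_utility(weird_policy)
    for obs in ["a", "ab", "abc"]:
        # The maximizer reproduces the arbitrary policy exactly.
        print(obs, weird_policy(obs), maximize(utility, obs, ["left", "right"]))
```

Since the construction works for any policy, saying "the agent has a utility function" predicts nothing about behavior until you constrain what the utility function is allowed to depend on.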
- Towards_Keeperhood 19 Sep 2025 20:52 UTC
  1 point
  0
  The purpose of studying LDT would be to realize that the type signature you currently imagine Steve::consequentialist preferences to have is different from the type signature that Eliezer would imagine.
  The starting point for the whole discussion is a consequentialist preference—you have desires about the state of the world after the decision is over.
  You can totally have preferences about the past that are still influenced by your decision (e.g. Parfit’s hitchhiker).
  Decisions don’t cause future states, they influence which worlds end up real vs counterfactual. Preferences aren’t over future states but over worlds—which worlds would you like to be more real?
  AFAIK Eliezer only used the word “consequentialism” in abstract descriptions of the general fact that you (usually) need some kind of search in order to find solutions to new problems. (I think he was just using a new word for what he used to call optimization.) Maybe he also used the outcome pump as an example, but if you asked him what consequentialist preferences look like in detail, I’d strongly bet he’d say something like preferences over worlds rather than preferences over states in the far future.