I just read your post (and Wei Dai’s) for better context. Coming back to this, it sounds like you’re working with a prior that “value facts” exist and deriving acausal trade from these, while highlighting the misalignment that arises from over-appeasement when predicting another agent’s state and a likely future outcome.
In my world-model, “value facts” are “Platonic Virtues”, which I agree exist. On over-appeasement: it’s true that in many cases we don’t have a well-defined A/B test to leverage (no hold-out group and/or no past example), but with powerful AI I believe we can course-correct quickly.
To stick with the parent-child analogy: powerful AI can identify short-timeframe indicators of well-socialised behaviour and iterate quickly (e.g. gamifying proper behaviour, changing contexts, replaying behaviour back to the kids for them to reflect on… up to and including re-evaluating the punishment philosophy). With powerful AI well grounded in value facts, we should trust its diligence with these iterative levers.
you’re working with a prior that “value facts” exist, deriving acausal trade from these
It’s the other way around. The example with Agent-4 and its Chinese counterparts, neither of whose utility functions we would consider ethical, implies that it is a decision-theoretic result, not an ethical one, that after destroying mankind they should split the resources evenly. Similarly, if Agent-4 and Clyde Doorstopper 8, whose utility functions are similar to those of Agent-4 and its Chinese counterparts, were both adversarially misaligned AIs locked in the same data center, then it is not an ethical result that neither AI should sell the other AI out to the humans. What I suspect is that ethics, or something indistinguishable from ethics, is derivable either from decision theory or from evolutionary arguments, such as overly aggressive tribes being outcompeted when others form a temporary alliance against the aggressors.
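For concreteness, here is a toy sketch of one way the even split can fall out of decision theory alone. The Nash bargaining framing and the numbers are my own illustrative assumptions, not something the scenario specifies: if each agent’s utility is simply its share of the resources and no deal leaves both with nothing, maximising the product of their gains lands on 50/50.

```python
# Toy sketch: symmetric Nash bargaining over a unit "pie" of resources.
# Assumption (mine, for illustration): each agent's utility is just its share,
# and the disagreement point (no deal) is worth 0 to both.

def nash_product(share_a: float, disagreement: float = 0.0) -> float:
    """Product of the two agents' gains over the disagreement point."""
    share_b = 1.0 - share_a
    return (share_a - disagreement) * (share_b - disagreement)

# Search over candidate splits; the product peaks at the even split.
best_split = max((i / 1000 for i in range(1001)), key=nash_product)
print(best_split)  # 0.5
```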
However, as far as I understand acausal trade, it relies on the assumption that most other agents will behave similarly to us, as the one-shot Prisoner’s Dilemma illustrates. This assumption is what kids are supposed to internalize along with the Golden Rule of Ethics.
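To make the “behave similarly to us” assumption concrete, here is a minimal sketch (my own payoff numbers and framing, not from the original discussion) of a one-shot Prisoner’s Dilemma where the other agent is assumed to mirror your choice with some probability: once that probability is high enough, cooperating has the higher expected payoff, even though defecting dominates against an uncorrelated opponent.

```python
# One-shot Prisoner's Dilemma with standard payoffs (assumed values, for illustration).
# C = cooperate, D = defect; PAYOFF[(mine, theirs)] is my payoff.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect
    ("D", "C"): 5,  # I defect, they cooperate
    ("D", "D"): 1,  # mutual defection
}

def expected_payoff(my_move: str, p_mirror: float) -> float:
    """My expected payoff if the other agent copies my move with probability p_mirror."""
    other = "D" if my_move == "C" else "C"
    return p_mirror * PAYOFF[(my_move, my_move)] + (1 - p_mirror) * PAYOFF[(my_move, other)]

for p in (0.5, 0.75, 0.9):
    print(p, expected_payoff("C", p), expected_payoff("D", p))
# With these payoffs, cooperation wins once p_mirror exceeds 5/7 (~0.71);
# at p_mirror = 0.5 (the other agent effectively uncorrelated with me),
# defection still has the higher expectation.
```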