I suspect that this type of world modeling, i.e. modeling others’ preferences as resembling one’s own unless otherwise proven,[1] is the way to integrate acausal trade into decision theory and to obtain ethics-like results, as I described in my post.
However, it also has disadvantages, such as LLMs being author simulators. In addition, I have encountered claims that permissive[2] parents, who precommit to never punishing their kids no matter what, cause those kids to fail to learn even the basics of proper behaviour, let alone ethics or modeling others’ needs. Even though humans do have mirror neurons, these still have to be trained on actual rewards, or at least on actual preferences, rather than on those of sycophantic AI companions.
[1] Or simulated in cases like thought experiments or writing creative fiction.
[2] The authors of such claims sometimes call this parenting style gentle parenting.
I just read your post (and Wei Dai’s) for better context. Coming back, it sounds like you’re working with a prior that “value facts” exist, deriving acausal trade from these, but highlighting misalignment arising from over-appeasement when predicting another’s state and a likely future outcome.
In my world-model, “value facts” are “Platonic Virtues”, which I agree exist. On over-appeasement, it’s true that in many cases we don’t have a well-defined A/B test to leverage (no hold-out group and/or no past example), but with powerful AI I believe we can course-correct quickly.
To stick with the parent-child analogy: powerful AI can determine short-timeframe indicators of well-socialised behaviour and iterate quickly (e.g. gamifying proper behaviour, changing contexts, replaying behaviour back to the kids for them to reflect on… up to and including re-evaluating the punishment philosophy). With powerful AI well grounded in value facts, we should trust its diligence with these iterative levers.
> you’re working with a prior that “value facts” exist, deriving acausal trade from these

It’s the other way around. The example with Agent-4 and its Chinese counterparts, neither of whose utility functions we consider ethical, implies that it is a decision-theoretic result, not an ethical one, that after destroying mankind they should split the resources evenly. Similarly, if Agent-4 and Clyde Doorstopper 8, which have utility functions similar to those of Agent-4 and its Chinese counterparts, were both adversarially misaligned AIs locked in the same data center, then it is not an ethical result that neither AI should sell the other AI to the humans. What I suspect is that ethics, or something indistinguishable from ethics, is derivable either from decision theory or from evolutionary arguments, such as overly aggressive tribes being outcompeted when others form a temporary alliance against the aggressors.
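The even-split claim isn’t derived formally here, but one toy way to cash it out is below; the Nash-bargaining criterion, the unit resource pool, and the identical linear utilities are my illustrative assumptions, not anything the Agent-4 scenario specifies:

```python
# Toy sketch (my framing, not from the original comment): one standard
# decision-theoretic route to "split the resources evenly" for two symmetric
# agents is Nash bargaining -- choose the split that maximizes the product of
# the agents' gains over their disagreement outcome (here, mutual destruction,
# normalized to utility 0 for both).

def nash_product(share_a: float) -> float:
    """Product of both agents' gains when agent A gets `share_a` of a unit pool.

    Assumes identical linear utilities and a disagreement point of 0 for both,
    i.e. the symmetric case discussed above.
    """
    gain_a = share_a - 0.0
    gain_b = (1.0 - share_a) - 0.0
    return gain_a * gain_b

# Grid-search candidate splits of the unit resource pool.
splits = [i / 100 for i in range(101)]
best = max(splits, key=nash_product)
print(f"Split maximizing the Nash product: {best:.2f} / {1.0 - best:.2f}")
# -> Split maximizing the Nash product: 0.50 / 0.50
```

The 50/50 answer falls out of symmetry plus a standard bargaining criterion, with no premise about fairness, which is the sense in which the result is decision-theoretic rather than ethical.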
However, as far as I understand acausal trade, it relies on the assumption that most other agents will behave similarly to us, as the One-Shot Prisoner’s Dilemma implies. This assumption is what kids are supposed to internalize along with the Golden Rule of Ethics.
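To make that dependence concrete, here is a minimal sketch of the one-shot Prisoner’s Dilemma under the assumption that the other agent decides roughly the way I do; the payoff numbers and the single parameter p (how likely the other agent is to mirror my choice) are illustrative choices of mine, not taken from either post:

```python
# Minimal sketch (illustrative payoffs): in a one-shot Prisoner's Dilemma,
# if I expect the other agent to run roughly the same decision procedure as
# me, my choice and theirs are correlated. Below, p_mirror is the assumed
# probability that the other agent ends up making the same move I do.

# Standard PD payoffs for "me": (my_move, their_move) -> my payoff
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect ("sucker" payoff)
    ("D", "C"): 5,  # I defect, they cooperate (temptation)
    ("D", "D"): 1,  # mutual defection
}

def expected_payoff(my_move: str, p_mirror: float) -> float:
    """My expected payoff if the other agent copies my move with probability
    p_mirror and plays the opposite move otherwise."""
    other_same = my_move
    other_diff = "D" if my_move == "C" else "C"
    return (p_mirror * PAYOFF[(my_move, other_same)]
            + (1 - p_mirror) * PAYOFF[(my_move, other_diff)])

for p in (0.0, 0.5, 0.9, 1.0):
    ev_c = expected_payoff("C", p)
    ev_d = expected_payoff("D", p)
    better = "cooperate" if ev_c > ev_d else "defect"
    print(f"p(mirror)={p:.1f}: EV(C)={ev_c:.1f}, EV(D)={ev_d:.1f} -> {better}")
```

With these particular payoffs, cooperating has the higher expected value exactly when p > 5/7, i.e. once I am confident enough that the other agent really does behave similarly to us; below that threshold the usual defection result comes back.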