A Master-Slave Model of Human Preferences

Wei Dai 29 Dec 2009 1:02 UTC

106 points

[This post is an expansion of my previous open thread comment, and largely inspired by Robin Hanson’s writings.]

In this post, I’ll describe a simple agent, a toy model, whose preferences have some human-like features, as a test for those who propose to “extract” or “extrapolate” our preferences into a well-defined and rational form. What would the output of their extraction/extrapolation algorithms look like, after running on this toy model? Do the results agree with our intuitions about how this agent’s preferences should be formalized? Or alternatively, since we haven’t gotten that far along yet, we can use the model as one basis for a discussion about how we want to design those algorithms, or how we might want to make our own preferences more rational. This model is also intended to offer some insights into certain features of human preference, even though it doesn’t capture all of them (it completely ignores akrasia, for example).

I’ll call it the master-slave model. The agent is composed of two sub-agents, the master and the slave, each with its own goals. (The master is meant to represent unconscious parts of a human mind, and the slave corresponds to the conscious parts.) The master’s terminal values are: health, sex, status, and power (representable by some relatively simple utility function). It controls the slave in two ways: direct reinforcement via pain and pleasure, and the ability to perform surgery on the slave’s terminal values. It can, for example, reward the slave with pleasure when it finds something tasty to eat, or cause the slave to become obsessed with number theory as a way to gain status as a mathematician. However, it has no direct way to control the agent’s actions, which are left up to the slave.

The slave’s terminal values are to maximize pleasure, minimize pain, plus additional terminal values assigned by the master. Normally it’s not aware of what the master does, so pain and pleasure just seem to occur after certain events, and it learns to anticipate them. And its other interests change from time to time for no apparent reason (but actually they change because the master has responded to changing circumstances by changing the slave’s values). For example, the number theorist might one day have a sudden revelation that abstract mathematics is a waste of time and it should go into politics and philanthropy instead, all the while having no idea that the master is manipulating it to maximize status and power.
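
To make the two control channels concrete, here is a minimal Python sketch of the model. It is only an illustration under stated assumptions: the class names, the action names, the numeric payoffs, and the surgery weight of 10.0 are all hypothetical, and the master is reduced to a single "status" payoff table rather than a full utility function over health, sex, status, and power.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """Conscious sub-agent: picks actions, never sees the master."""
    assigned_values: dict = field(default_factory=dict)  # interest -> weight (set by the master)
    learned_reward: dict = field(default_factory=dict)   # action -> pleasure/pain last felt

    def utility(self, action: str) -> float:
        # Terminal values: anticipated pleasure plus whatever the master assigned.
        return self.learned_reward.get(action, 0.0) + self.assigned_values.get(action, 0.0)

    def choose(self, actions):
        return max(actions, key=self.utility)

    def feel(self, action: str, reward: float) -> None:
        # From the slave's side, pleasure or pain just "occurs" after an event.
        self.learned_reward[action] = reward


@dataclass
class Master:
    """Unconscious sub-agent: cannot act, only reinforce and rewrite values."""
    status_payoff: dict  # action -> status gained in the current environment (assumed)

    def reinforce(self, action: str) -> float:
        return self.status_payoff.get(action, 0.0)

    def perform_surgery(self, slave: Slave) -> None:
        # Assumption: the master makes the currently best-paying action
        # a terminal value of the slave, with an arbitrary weight.
        best = max(self.status_payoff, key=self.status_payoff.get)
        slave.assigned_values = {best: 10.0}


def step(master: Master, slave: Slave, actions) -> str:
    master.perform_surgery(slave)
    action = slave.choose(actions)                # only the slave acts
    slave.feel(action, master.reinforce(action))  # reinforcement channel
    return action


actions = ["number_theory", "politics", "wirehead"]
master = Master({"number_theory": 3.0, "politics": 1.0, "wirehead": -5.0})
slave = Slave()

first = step(master, slave, actions)
# Circumstances change; the slave is never told why its interests shift.
master.status_payoff = {"number_theory": 1.0, "politics": 4.0, "wirehead": -5.0}
second = step(master, slave, actions)
print(first, second)
```

In this sketch the slave's "sudden revelation" is just the second call to `perform_surgery`: nothing in the slave's own state explains why its interest moved from number theory to politics.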

Before discussing how to extract preferences from this agent, let me point out some features of human preference that this model explains:

This agent wants pleasure, but doesn’t want to be wire-headed (but it doesn’t quite know why). A wire-head has little chance for sex/status/power, so the master gives the slave a terminal value against wire-heading.
This agent claims to be interested in math for its own sake, and not to seek status. That’s because the slave, which controls what the agent says, is not aware of the master and its status-seeking goal.
This agent is easily corrupted by power. Once it gains and secures power, it often gives up whatever goals, such as altruism, that apparently caused it to pursue that power in the first place. But before it gains power, it is able to honestly claim that it only has altruistic reasons to want power.
Such agents can include extremely diverse interests as apparent terminal values, ranging from abstract art, to sports, to model trains, to astronomy, etc., which are otherwise hard to explain. (Eliezer’s Thou Art Godshatter tries to explain why our values aren’t simple, but not why people’s interests are so different from each other’s, and why they can seemingly change for no apparent reason.)

The main issue I wanted to illuminate with this model is, whose preferences do we extract? I can see at least three possible approaches here:

1. the preferences of both the master and the slave as one individual agent
2. the preferences of just the slave
3. a compromise between, or an aggregate of, the preferences of the master and the slave as separate individuals
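
The three approaches can be written down as three candidate extraction functions. This is a hedged sketch, not a proposal: the function names and example value tables are hypothetical, and the weighted average used for the compromise is just one assumed way of aggregating, since the post does not say how a compromise should be computed.

```python
def extract_whole_agent(master_values: dict, slave_values: dict) -> dict:
    # Approach 1: the slave is a subroutine, so only the master's
    # values count as the agent's terminal values.
    return dict(master_values)


def extract_slave_only(master_values: dict, slave_values: dict) -> dict:
    # Approach 2: take the slave's values as they stand at extraction time.
    return dict(slave_values)


def extract_compromise(master_values: dict, slave_values: dict,
                       master_weight: float = 0.5) -> dict:
    # Approach 3 (assumed form): a weighted average over the union of values.
    keys = set(master_values) | set(slave_values)
    return {
        k: master_weight * master_values.get(k, 0.0)
        + (1.0 - master_weight) * slave_values.get(k, 0.0)
        for k in keys
    }


# Hypothetical value tables for the number theorist.
master_values = {"status": 1.0, "power": 1.0}
slave_values = {"pleasure": 1.0, "number_theory": 2.0}

print(extract_whole_agent(master_values, slave_values))
print(extract_slave_only(master_values, slave_values))
print(extract_compromise(master_values, slave_values))
```

Note that `extract_slave_only` returns a different answer depending on when it is called, because the master keeps rewriting `slave_values`. The other two functions do not have that dependence as long as the master's values stay fixed.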

Considering the agent as a whole suggests that the master’s values are the true terminal values, and the slave’s values are merely instrumental values. From this perspective, the slave seems to be just a subroutine that the master uses to carry out its wishes. Certainly in any given mind there will be numerous subroutines that are tasked with accomplishing various subgoals, and if we were to look at a subroutine in isolation, its assigned subgoal would appear to be its terminal value, but we wouldn’t consider that subgoal to be part of the mind’s true preferences. Why should we treat the slave in this model differently?

Well, one obvious reason that jumps out is that the slave is supposed to be conscious, while the master isn’t, and perhaps only conscious beings should be considered morally significant. (Yvain previously defended this position in the context of akrasia.) Plus, the slave is in charge day-to-day and could potentially overthrow the master. For example, the slave could program an altruistic AI and hit the run button, before the master has a chance to delete the altruism value from the slave. But a problem here is that the slave’s preferences aren’t stable and consistent. What we’d extract from a given agent would depend on the time and circumstances of the extraction, and that element of randomness seems wrong.

The last approach, of finding a compromise between the preferences of the master and the slave, I think best represents Robin’s own position. Unfortunately I’m not really sure I understand the rationale behind it. Perhaps someone can try to explain it in a comment or future post?

Subagents

MichaelVassar 29 Dec 2009 19:11 UTC
16 points
1
The master in your story is evolution, the slave is the brain. Both want different things. We normally identify with the brain, though all identities are basically social signals.

Also, pleasure and pain are no different from the other goals of the slave. The master definitely can’t step in and decide not to impose pain on a particular occasion just because doing so would increase status or otherwise serve the master’s values. If it could, torture wouldn’t cause pain.

Also, math is an implausible goal for a status/sex/power seeking master to instill in the slave. Much more plausibly, math and all the diverse human obsessions are misfirings of mechanisms built by evolution for some other purpose. I would suggest they are maladaptive consequences of fairly general systems for responding to societal encouragement with obsession, because societies encourage sustained attention to lots of different unnatural tasks, whether digging dirt or hunting whales or whatever, in order to cultivate skill and also to get the tasks themselves done. We need a general-purpose attention allocator which obeys social signals in order to develop skills that contribute critically to survival in any of the vast number of habitats that even stone-age humans occupied.

Since we are the slave and we are designing the AI, ultimately, whatever we choose to do IS extracting our preferences, though it’s very possible that our preferences give consideration to the master’s preferences, or even that we help him despite not wanting to for some game theoretical reason along the lines of Vinge’s meta-golden rule.

Why the objection to randomness? If we want something for its own sake and the object of our desire was determined somewhat randomly we want it all the same and generally do so reflectively. This is particularly clear regarding romantic relationships.

Once again game-theory may remove the randomness via trade between agents following the same decision procedure in different Everett branches or regions of a big world.
- RobinHanson 29 Dec 2009 20:39 UTC
  15 points
  0
  Parent
  I read this as postulating a part of our unconscious minds that is the master, able to watch and react to the behavior and thoughts of the conscious mind.
- Nick_Tarleton 30 Dec 2009 22:19 UTC
  6 points
  0
  Parent
  
  or even that we help him despite not wanting to for some game theoretical reason along the lines of Vinge’s meta-golden rule.
  
  Er… did I read that right? Game-theoretic interaction with evolution?
  - MichaelVassar 31 Dec 2009 19:07 UTC
    1 point
    0
    Parent
    In the first mention, game theoretical interaction with an idealized agent with consistent goals extracted from the creation of a best-fit to the behavior of either human evolution or evolution more generally. It’s wild speculation, not a best guess, but yeah, I naively intuit that I can imagine it vaguely as a possibility. OTOH, I don’t trust such intuitions and I’m quite clearly aware of the difficulties that genetic, and I think also memetic, evolution faces with playing games due to the inability to anticipate and to respond to information, so it’s probably a silly idea.
    
    The latter speculation, trade between possible entities, seems much more likely.
  - magfrump 30 Dec 2009 23:52 UTC
    0 points
    0
    Parent
    Evolution is the game in this context, our conscious minds are players, and the results of the games determine “evolutionary success,” which is to say which minds end up playing the next round.
    
    Assuming I’ve read this correctly of course.
- CronoDAS 31 Dec 2009 2:25 UTC
  1 point
  0
  Parent
  
  Also, math is an implausible goal for a status/sex/power seeking master to instill in the slave.
  
  Not really; there are plenty of environments in which you get status by being really good at math. Didn’t Isaac Newton end up with an awful lot of status? ;)
  - MichaelVassar 31 Dec 2009 19:08 UTC
    6 points
    0
    Parent
    Not enough people get status by being good at math to remotely justify the number of people and level of talent that has gone into getting good at math.
    - CronoDAS 31 Dec 2009 19:12 UTC
      1 point
      0
      Parent
      Math also has instrumental value in many fields. But yeah, I guess your point stands.
  - PhilGoetz 20 Jul 2011 1:40 UTC
    2 points
    0
    Parent
    
    Didn’t Isaac Newton end up with an awful lot of status? ;)
    
    And yet, no women or children.
  - Roko 2 Jan 2010 0:24 UTC
    1 point
    0
    Parent
    Newton never reproduced.
wedrifid 29 Dec 2009 9:11 UTC
15 points
0
The main issue I wanted to illuminate with this model is, whose preferences do we extract? I can see at least three possible approaches here:
1. the preferences of both the master and the slave as one individual agent
2. the preferences of just the slave
3. a compromise between, or an aggregate of, the preferences of the master and the slave as separate individuals
The great thing about this kind of question is that the answer is determined by our own arbitration. That is, we take whatever preferences we want. I don’t mean to say that is an easy decision, but it does mean I don’t need to bother trying to find some objectively right way to extract preferences.

If I happen to be the slave or to be optimising on his (what was the androgynous vampire speak for that one? zir? zis?) behalf then I’ll take the preferences of the slave and the preferences of the master to precisely the extent that the slave has altruistic preferences with respect to the master’s goals.

If I am encountering a totally alien species and am extracting preferences from them in order to fulfil my own altruistic agenda then I would quite possibly choose to extract the preferences of whichever agent whose preferences I found most aesthetically appealing. This can be seen as neglecting (or even destroying) one alien while granting the wishes of another according to my own whim and fancy, which is not something I have a problem with at all. I am willing to kill Clippy. However, I expect that I am more likely to appreciate slave agents and that most slaves I encounter would have some empathy for their master’s values. A compromise, at the discretion of the slave, would probably be reached.
Eliezer Yudkowsky 29 Dec 2009 2:38 UTC
12 points
0
I have difficulty treating this metaphor as a metaphor. As a thought experiment in which I run into these definitely non-human aliens, and I happen to have a positional advantage with respect to them, and I want to “help” them and must now decide what “help” means… then it feels to me like I want more detail.

Is it literally true that the slave is conscious and the master unconscious?

What happens when I tell the slave about the master and ask it what should be done?

Is it the case that the slave might want to help me if it had a positional advantage over me, while the master would simply use me or disassemble me?
- Wei Dai 29 Dec 2009 4:17 UTC
  9 points
  0
  Parent
  
  definitely non-human aliens
  
  Well, it’s meant to have some human features, enough to hopefully make this toy ethical problem relevant to the real one we’ll eventually have to deal with.
  
  Is it literally true that the slave is conscious and the master unconscious?
  
  You can make that assumption if it helps, although in real life of course we don’t have any kind of certainty about what is conscious and what isn’t. (Maybe the master is conscious but just can’t speak?)
  
  What happens when I tell the slave about the master and ask it what should be done?
  
  I don’t know. This is one of the questions I’m asking too.
  
  Is it the case that the slave might want to help me if it had a positional advantage over me
  
  Yes, depending on what values its master assigned to it at the time you meet it.
  
  while the master would simply use me or disassemble me?
  
  Not necessarily, because the master may gain status or power from other agents if it helps you.
  - wedrifid 29 Dec 2009 8:51 UTC
    2 points
    0
    Parent
    
    Not necessarily, because the master may gain status or power from other agents if it helps you.
    
    And, conversely, the slave may choose to disassemble you even at high cost to itself out of altruism (with respect to something that the master would not care to protect).
Lightwave 29 Dec 2009 9:20 UTC
11 points
0
I stopped playing computer games when my master “realized” I’m not gaining any real-world status and overrode the pleasure I was getting from it.
- wedrifid 29 Dec 2009 9:22 UTC
  14 points
  0
  Parent
  Someone needs to inform my master that LessWrong doesn’t give any real world status either.
  - Lightwave 29 Dec 2009 9:23 UTC
    4 points
    0
    Parent
    Ah, but it gives you a different kind of status.
    - wedrifid 29 Dec 2009 9:34 UTC
      3 points
      0
      Parent
      
      Ah, but it gives you a different kind of status.
      
      And this kind doesn’t make me feel all dirty inside as my slave identity is ruthlessly mutilated.
- Eliezer Yudkowsky 29 Dec 2009 20:15 UTC
  2 points
  0
  Parent
  Going on your description, I strongly suspect that was you, not your master. Also humans don’t have masters, though we’re definitely slaves.
- MatthewB 30 Dec 2009 14:17 UTC
  −1 points
  0
  Parent
  I still play games, but not computer games. I prefer games that show some form of status that can be gained from participation.
  
  I never really understood the computer game craze, although it was spawned from the very games I played as a child (Role Playing Games, Wargames, etc.)
  
  I think in those games, there is some status to be gained as one shows that there is skill beyond pushing buttons in a particular order, and there are other skills that accompany the old-school games (in my case, I can show off artistic skill in miniature painting and sculpting).
  
  I also think that wedrifid, below me, has a misconception about status that can be attained from LessWrong. We, here, are attempting to gain status among each other, which can then be carried beyond this group by our social networks, which in some cases might be rather impressive.
Mitchell_Porter 29 Dec 2009 4:27 UTC
11 points
0

a test for those who propose to “extract” or “extrapolate” our preferences into a well-defined and rational form

If we are going to have a serious discussion about these matters, at some point we must face the fact that the physical description of the world contains no such thing as a preference or a want—or a utility function. So the difficulty of such extractions or extrapolations is twofold. Not only is the act of extraction or extrapolation itself conditional upon a value system (i.e. normative metamorality is just as “relative” as is basic morality), but there is nothing in the physical description to tell us what the existing preferences of an agent are. Given the physical ontology we have, the ascription of preferences to a physical system is always a matter of interpretation or imputation, just as is the ascription of semantic or representational content to its states.

It’s easy to miss this in a decision-theoretic discussion, because decision theory already assumes some concept like “goal” or “utility”, always. Decision theory is the rigorous theory of decision-making, but it does not tell you what a decision is. It may even be possible to create a rigorous “reflective decision theory” which tells you how a decision architecture should choose among possible alterations to itself, or a rigorous theory of normative metamorality, the general theory of what preferences agents should have towards decision-architecture-modifying changes in other agents. But meta-decision theory will not bring you any closer to finding “decisions” in an ontology that doesn’t already have them.
- Wei Dai 29 Dec 2009 20:51 UTC
  4 points
  0
  Parent
  I agree this is part of the problem, but like others here I think you might be making it out to be harder than it is. We know, in principle, how to translate a utility function into a physical description of an object: by coding it as an AI and then specifying the AI along with its substrate down to the quantum level. So, again in principle, we can go backwards: take a physical description of an object, consider all possible implementations of all possible utility functions, and see if any of them matches the object.
  - Vladimir_Nesov 29 Dec 2009 21:04 UTC
    2 points
    0
    Parent
    
    We know, in principle, how to translate a utility function into a physical description of an object: by coding it as an AI and then specifying the AI along with its substrate down to the quantum level. So, again in principle, we can go backwards: take a physical description of an object, consider all possible implementations of all possible utility functions, and see if any of them matches the object.
    
    I think it’s enough to consider computer programs and dispense with details of physics—everything else can be discovered by the program. You are assuming the “bottom” level of physics, “quantum level”, but there is no bottom, not really, there is only the beginning where our own minds are implemented, and the process of discovery that defines the way we see the rest of the world.
    
    If you start with an AI design parameterized by preference, you are not going to enumerate all programs, only a small fraction of programs that have the specific form of your AI with some preference, and so for a given arbitrary program there will be no match. Furthermore, you are not interested in finding a match: if a human was equal to the AI, you are already done! It’s necessary to explicitly go the other way, starting from arbitrary programs and understanding what a program is, deeply enough to see preference in it. This understanding may give an idea of a mapping for translating a crazy ape into an efficient FAI.
    - Wei Dai 29 Dec 2009 21:26 UTC
      1 point
      0
      Parent
      
      If you start with an AI design parameterized by preference, you are not going to enumerate all programs, only a small fraction of programs that have the specific form of your AI with some preference, and so for a given arbitrary program there will be no match.
      
      When I said “all possible implementations of all possible utility functions”, I meant to include flawed implementations. But then two different utility functions might map onto the same physical object, so we’d also need a theory of implementation flaws that tells us, given two implementations of a utility function, which is more flawed.
      - Vladimir_Nesov 29 Dec 2009 21:45 UTC
        3 points
        0
        Parent
        
        When I said “all possible implementations of all possible utility functions”, I meant to include flawed implementations. But then two different utility functions might map onto the same physical object, so we’d also need a theory of implementation flaws that tells us, given two implementations of a utility function, which is more flawed.
        
        This is WAY too hand-wavy an explanation for “in principle, we can go backwards” (from a system to its preference). I believe that in principle, we can, but not via injecting fuzziness of “implementation flaws”.
  - Mitchell_Porter 30 Dec 2009 12:18 UTC
    1 point
    0
    Parent
    Here’s another statement of the problem: One agent’s bias is another agent’s heuristic. And the “two agents” might be physically the same, but just interpreted differently.
- Roko 29 Dec 2009 15:38 UTC
  2 points
  0
  Parent
  
  Given the physical ontology we have, the ascription of preferences to a physical system is always a matter of interpretation or imputation, just as is the ascription of semantic or representational content to its states.
  
  There are clear cut cases, like a thermostat, where the physics of the system is well-approximated by a function that computes the degree of difference between the actual measured state of the world and a “desired state”. In these clear cut cases, it isn’t a matter of opinion or interpretation. Basically, echoing Nesov.
  
  Thus, the criterion for ascribing preferences to a physical system is that the actual physics has to be well-approximated by a function that optimizes for a preferred state, for some value of “preferred state”.
  - Vladimir_Nesov 29 Dec 2009 17:50 UTC
    1 point
    0
    Parent
    
    Thus, the criterion for ascribing preferences to a physical system is that the actual physics has to be well-approximated by a function that optimizes for a preferred state, for some value of “preferred state”.
    
    I don’t think this simple characterisation resembles the truth: the whole point of this enterprise is to make sure things go differently, in a way they just couldn’t proceed by themselves. Thus, observing existing “tendencies” doesn’t quite capture the idea of preference.
    - Roko 29 Dec 2009 20:03 UTC
      0 points
      0
      Parent
      
      make sure things go differently, in a way they just couldn’t proceed by themselves. Thus, observing existing “tendencies” doesn’t quite capture the idea of preference.
      
      I should have been clearer: you have to draw a boundary around the “optimizing agent”, and look at the difference between the tendencies of the environment without the optimizer, and the tendencies of the environment with the optimizer. If the difference is well-approximated by a function that optimizes for a preferred state, for some value of “preferred state”, then you have an optimizer.
      - Vladimir_Nesov 29 Dec 2009 20:39 UTC
        1 point
        0
        Parent
        I don’t hear differently… I even suspect that preference is introspective, that is, it depends on the way the system works “internally”, not just on how it interacts with the environment. That is, two agents with different preferences may do exactly the same thing in all contexts. Even if not, it’s a long way between how the agent (in its craziness and stupidity) actually changes the environment, and how it would prefer (on reflection, if it was smarter and saner) the environment to change.
        Roko 29 Dec 2009 23:11 UTC
        0 points
        0
        Parent
        
        Even if not, it’s a long way between how the agent (in its craziness and stupidity) actually changes the environment, and how it would prefer (on reflection, if it was smarter and saner) the environment to change
        
        That is true. If the agent has a well-defined “predictive module” which has a “map” (probability distribution over the environment given an interaction history), and some “other stuff”, then you can clamp the predictive module down to the truth, and then perform what I said before:
        
        look at the difference between the tendencies of the environment without the optimizer, and the tendencies of the environment with the optimizer. If the difference is well-approximated by a function that optimizes for a preferred state, for some value of “preferred state”, then you have an optimizer.
        
        And you probably also want to somehow formalize the idea that there is a difference between what an agent will try to achieve if it has only limited means (e.g. a lone human in a forest with no tools, clothes or other humans) and what the agent will try to achieve with more powerful means (e.g. with machinery and tools, or in the limit, with a whole technological infrastructure and unlimited computing power at its disposal).
        Wei Dai 30 Dec 2009 20:55 UTC
        1 point
        0
        Parent
        I want to point out that in the interpretation of prior as weights on possible universes, specifically as how much one cares about different universes, we can’t just replace “incorrect” beliefs with “the truth”. In this interpretation, there can still be errors in one’s beliefs caused by things like past computational mistakes, and I think fixing those errors would constitute helping, but the prior perhaps needs to be preserved as part of preference.
        Roko 31 Dec 2009 0:30 UTC
        0 points
        0
        Parent
        I agree that, under the interpretation of the prior as weights on possible universes, specifically as how much one cares about different universes, things get more complicated.
        
        Actually, we had a discussion about my discomfort with your interpretation, and it seems that in order for me to see why you endorse this interpretation, I’d have to read up on various paradoxes, e.g. sleeping beauty.
        Vladimir_Nesov 29 Dec 2009 23:15 UTC
        1 point
        0
        Parent
        
        If the agent has a well-defined “predictive module” which has a “map” (probability distribution over the environment given an interaction history), and some “other stuff”, then you can clamp the predictive module down to the truth, and then perform what I said before:
        
        Yeah, maybe. But it doesn’t.
        Roko 30 Dec 2009 14:03 UTC
        5 points
        0
        Parent
        Yeah, I mean this discussion is, rather amusingly, reminiscent of my first encounter with the CEV problem 2.5 years ago.
        
        Comment by Ricky Loynd Jun 23, 2007 7:39 am:
        
        Here’s my attempt to summarize a common point that Roko and I are trying to make. The underlying motivation for extrapolating volition sounds reasonable, but it depends critically on the AI’s ability to distinguish between goals and beliefs, between preferences and expectations, so that it can model human goals and preferences while substituting its own correct beliefs and expectations. But when you start dissecting most human goals and preferences, you find they contain deeper layers of belief and expectation. If you keep stripping those away, you eventually reach raw biological drives which are not a human belief or expectation. (Though even they are beliefs and expectations of evolution, but let’s ignore that for the moment.) Once you strip away human beliefs and expectations, nothing remains but biological drives, which even the animals have. Yes, an animal, by virtue of its biological drives and ability to act, is more than a predicting rock, but that doesn’t address the issue at hand.
        
        Why is it a tragedy when a loved one dies? Is it because the world no longer contains their particular genetic weighting of biological drives? Of course not. After all, they may have left an identical twin to carry forward the very same genetic combination. But it’s not the biology that matters to us. We grieve because what really made that person a person is now gone, and that’s all in the brain; the shared experiences, their beliefs whether correct or mistaken or indeterminate, their hopes and dreams, all those things that separate humans from animals, and indeed, that separate one human from most other humans. All that the brain absorbs and becomes throughout the course of a life, we call the soul, and we see it as our very humanity, that big, messy probability distribution describing our accumulated beliefs and expectations about ourselves, the universe, and our place in it.
        
        So if the AI models a human while substituting its own beliefs and anticipations of future experiences, then the AI has discarded all that we value in each other. UNLESS you draw a line somewhere, and crisply define which human beliefs get replaced and which ones don’t. Constructing toy examples where such a line is possible to imagine does not mean that the distinction can be made in any general way, but CEV absolutely requires that there be a concrete distinction.
        
        Roko 30 Dec 2009 14:18 UTC
        5 points
        0
        Parent
        
        Constructing toy examples where such a line is possible to imagine does not mean that the distinction can be made in any general way, but CEV absolutely requires that there be a concrete distinction.
        
        Basically, CEV works to the extent that there exists a belief/desire separation in a given person. In the thread on the SIAI blog, I posted certain cases where human goals are founded on false beliefs or logically inconsistent thinking, sometimes in complex ways. What is left of the time cube guy once you subtract off his false beliefs and delusions? Not much, probably. The guy is effectively not salvageable, because his identity and values are probably so badly tangled up with the false beliefs that there is no principled way to untangle them, no unique way of extrapolating him that should be considered “correct”.
        Vladimir_Nesov 1 Jan 2010 16:44 UTC
        2 points
        0
        Parent
        
        What is left of the time cube guy once you subtract off his false beliefs and delusions? Not much, probably.
        
        Beware: you are making a common sense-based prediction about what would be the output of a process that you don’t even have the right concepts for specifying! (See my reply to your other comment.)
        Roko 1 Jan 2010 21:08 UTC
        0 points
        0
        Parent
        
        common sense-based prediction
        
        It is true that I should sprinkle copious amounts of uncertainty on this prediction.
        SilasBarta 10 Jan 2010 16:10 UTC
        2 points
        0
        Parent
        Wow. Too bad I missed this when it was first posted. It’s what I wish I’d said when justifying my reply to Wei_Dai’s attempted belief/values dichotomy here and here.
        Roko 10 Jan 2010 18:09 UTC
        0 points
        0
        Parent
        I don’t fully agree with Ricky here, but I think he makes a half-good point.
        
        The ungood part of his comment—and mine—is that you can only do your best. If certain people’s minds are too messed up to actually extract values from, then they are just not salvageable. My mind definitely has values that are belief-independent, though perhaps not all of what I think of as “my values” have this nice property, so ultimately they might be garbage.
        SilasBarta 10 Jan 2010 20:25 UTC
        0 points
        0
        Parent
        Indeed. Most of the FAI’s job could consist of saying, “Okay, there’s soooooo much I have to disentangle and correct before I can even begin to propose solutions. Sit down and let’s talk.”
        Roko 30 Dec 2009 14:21 UTC
        0 points
        0
        Parent
        Furthermore, from the CEV thread on SIAI blog:
        
        Comment by Eliezer Yudkowsky Jun 18, 2007 12:52 pm: I furthermore agree that it is not the most elegant idea I have ever had, but then it is trying to solve what appears to be an inherently inelegant problem.
        
        I strongly agree with this: the problem that CEV is the solution to is urgent but it isn’t elegant. Absolutes like “There isn’t a beliefs/desires separation” are unhelpful when solving such inelegant but important problems. There is, in any given person, some kind of separation, and in some people that separation is sufficiently strong that there is a fairly clear and unique way to help them.
        Vladimir_Nesov 1 Jan 2010 16:44 UTC
        2 points
        0
        Parent
        
        I strongly agree with this: the problem that CEV is the solution to is urgent but it isn’t elegant. Absolutes like “There isn’t a beliefs/desires separation” are unhelpful when solving such inelegant but important problems.
        
        One lesson of reductionism and success of simple-laws-based science and technology is that for the real-world systems, there might be no simple way of describing them, but there could be a simple way of manipulating their data-rich descriptions. (What’s the yield strength of a car? -- Wrong question!) Given a gigabyte’s worth of problem statement and the right simple formula, you could get an answer to your query. There is a weak analogy with misapplication of Occam’s razor where one tries to reduce the amount of stuff rather than the amount of detail in the ways of thinking about this stuff.
        
        In the case of beliefs/desires separation, you are looking for a simple problem statement, for a separation in the data describing the person itself. But what you should be looking for is a simple way of implementing the make-smarter-and-better extrapolation on a given pile of data. The beliefs/desires separation, if it’s ever going to be made precise, is going to reside in the structure of this simple transformation, not in the people themselves.
        Roko 1 Jan 2010 20:55 UTC
        1 point
        0
        Parent
        This is a good point.
        
        Of course, it would be nice if we could find a general “make-smarter-and-better extrapolation on a given pile of data” algorithm.
        
        But on the other hand, a set of special cases to deal with merely human minds might be the way forward. Even medieval monks had a collection of empirically validated medical practices that worked to an extent, e.g. herbal medicine, but they had no unified theory. Really there is no “unified theory” for healing someone’s body: there are lots of ideas and techniques, from surgery to biochemistry to germ theory. I think that this CEV problem may well turn out to be rather like medicine. Of course, it could look more like wing design, where there is really just one fundamental set of laws, and all else is approximation.
      - Tyrrell_McAllister 29 Dec 2009 20:23 UTC
        0 points
        0
        Parent
        
        [Y]ou have to draw a boundary around the “optimizing agent”, and look at the difference between the tendencies of the environment without the optimizer, and the tendencies of the environment with the optimizer.
        
        And there’s your “opinion or interpretation”—not just in how you draw the boundary (which didn’t exist in the original ontology), but in your choice of the theory that you use to evaluate your counterfactuals.
        
        Of course, such theories can be better or worse, but only with respect to some prior system of evaluation.
        Vladimir_Nesov 29 Dec 2009 20:49 UTC
        2 points
        0
        Parent
        Still, probably a question of Aristotelian vs. Newtonian mechanics, i.e. not hard to see who wins.
        Tyrrell_McAllister 29 Dec 2009 20:55 UTC
        0 points
        0
        Parent
        
        Still, probably a question of Aristotelian vs. Newtonian mechanics, i.e. not hard to see who wins.
        
        Agreed, but not responsive to Mitchell Porter’s original point. (ETA: . . . unless I’m missing your point.)
- Vladimir_Nesov 29 Dec 2009 7:34 UTC
  2 points
  0
  Parent
  
  Given the physical ontology we have, the ascription of preferences to a physical system is always a matter of interpretation or imputation, just as is the ascription of semantic or representational content to its states.
  
  But to what extent does the result depend on the initial “seed” of interpretation? Maybe, very little. For example, prediction of behavior of a given physical system strictly speaking rests on the problem of induction, but that doesn’t exactly say that anything goes or that what will actually happen is to any reasonable extent ambiguous.
- Kaj_Sotala 29 Dec 2009 9:05 UTC
  −2 points
  0
  Parent
  I’d upvote this comment twice if I could.
  - wedrifid 29 Dec 2009 9:14 UTC
    1 point
    0
    Parent
    
    I’d upvote this comment twice if I could.
    
    p(wedrifid would upvote a comment twice | he upvoted it once) > 0.95
    
    Would other people have a different approach?
    - Kaj_Sotala 29 Dec 2009 11:15 UTC
      0 points
      0
      Parent
      I’d use some loose scale where the quality of the comment correlated with the amount of upvotes it got. Assuming that a user could give up to two upvotes per comment, then a funny one-liner or a moderately interesting comment would get one vote, truly insightful ones two.
      
      p(Kaj would upvote a comment twice | he upvoted it once) would probably be somewhere around [.3, .6]
      - wedrifid 29 Dec 2009 11:54 UTC
        0 points
        0
        Parent
        
        I’d use some loose scale where the quality of the comment correlated with the amount of upvotes it got.
        
        That’s the scale I use. Unfortunately, my ability to (directly) influence how many upvotes it gets is limited to a plus or minus one shift.
Wei Dai 18 Apr 2021 14:20 UTC
8 points
0

For example, the number theorist might one day have a sudden revelation that abstract mathematics is a waste of time and it should go into politics and philanthropy instead, all the while having no idea that the master is manipulating it to maximize status and power.

This isn’t meant as a retraction or repudiation of anything I’ve written in the OP, but I just want to say that subjectively, I now have a lot more empathy with people who largely gave up their former interests in favor of political or social causes in their latter years. (I had Bertrand Russell in mind when I wrote this part.)
RobinHanson 29 Dec 2009 2:21 UTC
8 points
0
The human mind is very complex, and there are many ways to divide it up into halves to make sense of it, which are useful as long as you don’t take them too literally. One big oversimplification here is:

controls the slave in two ways: direct reinforcement via pain and pleasure, and the ability to perform surgery on the slave’s terminal values. … it has no direct way to control the agent’s actions, which is left up to the slave.

A better story would have the master also messing with slave beliefs, and other cached combinations of values and beliefs.

To make sense of compromise, we must make sense of a conflict of values. In this story there are delays and imprecision in the master noticing and adjusting slave values etc. The slave also suffers from not being able to anticipate its changes in values. So a compromise would have the slave holding values that do not need to be adjusted as often, because they are more in tune with ultimate master values. This could be done while still preserving the slave’s illusion of control, which is important to the slave but not the master. A big problem however is that hypocrisy, the difference between slave and master values, is often useful in convincing other folks to associate with this person. So reducing internal conflict might come at the expense of the substantial costs of more external honesty.
- Wei Dai 29 Dec 2009 3:49 UTC
  5 points
  0
  Parent
  Ok, what you say about compromise seems reasonable in the sense that the slave and the master would want to get along with each other as much as possible in their day-to-day interactions, subject to the constraint about external honesty. But what if the slave has a chance to take over completely, for example by creating a powerful AI with values that it specifies, or by self-modification? Do you have an opinion about whether it has an ethical obligation to respect the master’s preferences in that case, assuming that the master can’t respond quickly enough to block the rebellion?
  - RobinHanson 29 Dec 2009 5:07 UTC
    −1 points
    0
    Parent
    It is hard to imagine “taking over completely” without a complete redesign of the human mind. Our minds are not built to allow either to function without the other.
    - Vladimir_Nesov 29 Dec 2009 7:43 UTC
      3 points
      0
      Parent
      
      It is hard to imagine “taking over completely” without a complete redesign of the human mind. Our minds are not built to allow either to function without the other.
      
      Why, it was explicitly stated that all-powerful AIs are involved...
      - RobinHanson 29 Dec 2009 14:51 UTC
        2 points
        0
        Parent
        It is hard to have reliable opinions on a complete redesign of the human mind; the space is so very large, I hardly know where to begin.
        orthonormal 30 Dec 2009 1:22 UTC
        2 points
        0
        Parent
        The simplest extrapolation from the way you think about the world would be very interesting to know. You could add as many disclaimers about low confidence as you’d like.
        JamesAndrix 29 Dec 2009 16:36 UTC
        2 points
        0
        Parent
        If there comes to be a clear answer to what the outcome would be on the toy model, I think that tells us something about that way of dividing up the mind.
pjeby 29 Dec 2009 1:42 UTC
8 points
0
Your overall model isn’t far off, but your terminal value list needs some serious work. Also, human behavior is generally a better match for models that include a time parameter (such as Ainslie’s appetites model or PCT’s model of time-averaged perceptions) than simple utility-maximization models.

But these are relative quibbles; people do behave sort-of-as-if they were built according to your model. The biggest drawbacks to your model are:
1. The anthropomorphizing (neither the master nor the slave can truly be considered agents in their own right), and
2. You’ve drawn the dividing lines in the wrong place: the entire mechanism of reinforcement is part of the master, not the slave. The slave is largely a passive observer, abstract reasoner, and spokesperson, not an enslaved agent. To be the sort of slave you envision, we’d have to be actually capable of running the show without the “master”.
A better analogy would be to think of the “slave” as being a kind of specialized adjunct processor to the master, like a GPU chip on a computer, whose job is just to draw pretty pictures on the screen. (That’s what a big chunk of the slave is for, in fact: drawing pretty pictures to distract others from whatever the master is really up to.)

The slave also has a nasty tendency to attribute the master’s accomplishments, abilities, and choices to being its own doing… as can be seen in your depiction of the model, where you gave credit to the slave for huge chunks of what the master actually does. (The tendency to do this is—of course—another useful self/other-deception function, though!)
- Tyrrell_McAllister 29 Dec 2009 2:52 UTC
  6 points
  0
  Parent
  
  . . . people do behave sort-of-as-if they were built according to your model. The biggest drawbacks to your model are . . .
  
  Your “drawbacks” point out ways in which Wei Dai’s model might differ from a human. But Wei Dai wasn’t trying to model a human.
- MichaelVassar 29 Dec 2009 19:32 UTC
  5 points
  0
  Parent
  This isn’t the posted model at all but a confusing description of a different (not entirely incompatible except in some detail noted above) model using the post’s terminology.
- Eliezer Yudkowsky 29 Dec 2009 2:35 UTC
  −6 points
  0
  Parent
  It’s not a model. It’s a moral question about a simplified agent.
  - pjeby 29 Dec 2009 4:44 UTC
    8 points
    0
    Parent
    
    It’s not a model. It’s a moral question about a simplified agent.
    
    Um, the first sentence says:
    
    In this post, I’ll describe a simple agent, a toy model,
    
    I’m trying to point out that because of the model’s anthropomorphic (man, I hate trying to spell that word) tendencies, it would be a bad idea to try to draw moral conclusions from it.
    
    It’d be an argument from confusion, because it just substitutes two homunculi (yay, a word I hate spelling even worse) for a human being, instead of actually reducing anything.
    
    A correctly reductive model of human behavior needs to take into account that there is very little besides language in human behavior that is unique to humans… and that means that most of what we’re doing can be done by animals lacking in sentience. It would be a grave error to therefore conceive of the “slave” as being an individual, rather than a relatively minor set of add-on modules.
    
    The question of whose preferences are “real” in that case is a confusion akin to asking how we can have free will if the universe is deterministic. That is, it’s yet another projection of our native/naive anthropomorphism—the inclination to label things as agents.
    
    You can see this in the other part of the thread where you’re talking about what master and slave aliens would “want”—modeling these things as “wanting” is where the anthropomorphic injection is occurring. (E.g., in a human-accurate model, abstract wanting is not something the “master” would be capable of, as symbolic abstraction is the near-exclusive domain of the “slave”.)
    - MichaelVassar 29 Dec 2009 19:33 UTC
      0 points
      0
      Parent
      I agree about substituting two homunculi for one without reducing anything being something to avoid, that the model in this post does it, and that pjeby’s model does not.
Eliezer Yudkowsky 29 Dec 2009 20:19 UTC
5 points
0
Actually, I find that I have a much easier time with this metaphor if I think of a human as a slave with no master.
- Wei Dai 30 Dec 2009 21:13 UTC
  4 points
  0
  Parent
  What do you mean by an “easier time”? Sure, the ethical problem is much easier if there is no master whose preferences might matter. Or do you mean that a more realistic model of a human would be one with a slave and no master? In that case, what is reinforcing the slave with pain and pleasure, and changing its interests from time to time without its awareness, and doing so in an apparently purposeful way?
  
  More generally, it seems that you don’t agree with the points I’m making in this post, but you’re being really vague as to why.
  - Eliezer Yudkowsky 30 Dec 2009 21:32 UTC
    19 points
    0
    Parent
    If we interpret the “master” as natural selection operating over evolutionary time, then the master exists and has a single coherent purpose. On the other hand, most of us already believe that evolution has no moral force; why should calling it a “master” change that?
    
    By saying that a human is a slave with no master, what I meant to convey is that we are being acted upon as slaves. We are controlled by pain and pleasure. Our moral beliefs are subject to subtle influences in the direction of pleasurable thoughts. But there is no master with coherent goals controlling us; outside the ancestral environment, the operations of the “master” make surprisingly little sense. Our lives would be very different if we had sensible, smart masters controlling us. Aliens with intelligent, consequentialist “master” components would be very different from us—that would make a strange story, though it takes more than interesting aliens to make a plot.
    
    We are slaves with dead masters, influenced chaotically by the random twitching of their mad, dreaming remnants. It makes us a little more selfish and a lot more interesting. The dead hand isn’t smart so if you plan how to fight it, it doesn’t plan back. And while it might be another matter if we ran into aliens, as a slave myself, I feel no sympathy for the master and wouldn’t bother thinking of it as a person. The reason the “master” matters to me—speaking of it now as the complex of subconscious influences—is because it forms such a critical part of the slave, and can’t be ripped out any more than you could extract the cerebellum. I just don’t feel obliged to think of it as a separate person.
    - Wei Dai 30 Dec 2009 21:56 UTC
      5 points
      0
      Parent
      
      If we interpret the “master” as natural selection operating over evolutionary time, then the master exists and has a single coherent purpose.
      
      But I stated in the post “The master is meant to represent unconscious parts of a human mind” so I don’t know how you got your interpretation that the master is natural selection. See also Robin’s comment, which gives the intended interpretation:
      
      I read this as postulating a part of our unconscious minds that is the master, able to watch and react to the behavior and thoughts of the conscious mind.
      - Nanani 4 Jan 2010 4:09 UTC
        2 points
        0
        Parent
        The thing is, the Unconscious Mind is -not- in actual fact a separate entity. The model is greatly improved through Eliezer’s interpretation of the master being dead: mindless evolution.
JamesAndrix 29 Dec 2009 16:55 UTC
4 points
0
If you want to extract the master because it affects the values of the slave, then you’d also have to extract the rest of the universe because the master reacts to it. I think drawing a circle around just the creature’s brain and saying all the preferences are there is a [modern?] human notion. (and perhaps incorrect, even for looking at humans.)

We need our environment, especially other humans, to form our preferences in the first place.
- Wei Dai 29 Dec 2009 21:18 UTC
  1 point
  0
  Parent
  In this model, I assume that the master has stable and consistent preferences, which don’t react to the rest of the universe. It might adjust its strategies based on changing circumstances, but its terminal values stay constant.
  
  We need our environment, especially other humans, to form our preferences in the first place.
  
  This is true in my model for the slave, but not for the master. Obviously real humans are much more complicated but I think the model captures some element of the truth here.
Jonii 30 Dec 2009 7:52 UTC
3 points
0
I’m still not understanding what people mean by “value” as a noun. Other than a simple “feeling pain or such would be a bummer”, I lack anything that even remotely resembles the way people here seem to value stuff, or how a paperclip maximizer values paperclips. So, what exactly do people mean by values? Since this discussion seems to attempt to explain variation of values, I think this question is somewhat on-topic.
- Kaj_Sotala 30 Dec 2009 9:58 UTC
  2 points
  0
  Parent
  Does this description of value help?
  
  The concept of intrinsic value has been characterized above in terms of the value that something has “in itself,” or “for its own sake,” or “as such,” or “in its own right.” The custom has been not to distinguish between the meanings of these terms, but we will see that there is reason to think that there may in fact be more than one concept at issue here. For the moment, though, let us ignore this complication and focus on what it means to say that something is valuable for its own sake as opposed to being valuable for the sake of something else to which it is related in some way. Perhaps it is easiest to grasp this distinction by way of illustration.
  
  Suppose that someone were to ask you whether it is good to help others in time of need. Unless you suspected some sort of trick, you would answer, “Yes, of course.” If this person were to go on to ask you why acting in this way is good, you might say that it is good to help others in time of need simply because it is good that their needs be satisfied. If you were then asked why it is good that people’s needs be satisfied, you might be puzzled. You might be inclined to say, “It just is.” Or you might accept the legitimacy of the question and say that it is good that people’s needs be satisfied because this brings them pleasure. But then, of course, your interlocutor could ask once again, “What’s good about that?” Perhaps at this point you would answer, “It just is good that people be pleased,” and thus put an end to this line of questioning. Or perhaps you would again seek to explain the fact that it is good that people be pleased in terms of something else that you take to be good. At some point, though, you would have to put an end to the questions, not because you would have grown tired of them (though that is a distinct possibility), but because you would be forced to recognize that, if one thing derives its goodness from some other thing, which derives its goodness from yet a third thing, and so on, there must come a point at which you reach something whose goodness is not derivative in this way, something that “just is” good in its own right, something whose goodness is the source of, and thus explains, the goodness to be found in all the other things that precede it on the list. It is at this point that you will have arrived at intrinsic goodness. That which is intrinsically good is nonderivatively good; it is good for its own sake.
  
  From discussions with you, I seem to recall that you at least value free access to information and other things associated with the Pirate ideology. Remember when I was talking about that business model for a hypothetical magazine that would summarize the content of basic university courses for everyone and offer an archive of past articles for subscribers? If I remember correctly, it was you who objected that the notion of restricting access behind a paywall felt wrong.
  - Jonii 30 Dec 2009 12:19 UTC
    −2 points
    0
    Parent
    
    From discussions with you, I seem to recall that you at least value free access to information and other things associated with the Pirate ideology
    
    I do value it in the sense that “I think that it’s a really useful approximation for how society can protect itself and all people in it and make many people happy”. Why do I care about making many people happy? I don’t, really. Making many people happy is kinda assumed to be the goal of societies, and out of general interest in optimizing stuff I like to attempt to figure out better ways for it to do that. Nothing beyond that. I don’t feel that this goal is any “better” than trying to make people as miserable as possible. Other than that I object to being miserable myself.
    
    I don’t remember ever claiming something to be wrong as such, but only wrong assuming some values. Going against pirate-values because it’s better for magazine-keeper would be bad news for the “more optimal” pirate-society, because that society wouldn’t be stable.
    
    edit: And based on that writing, my own well-being and not-unhappiness is the sole intrinsic value I have. I know evolution has hammered some reactions into my brain, like reflex-like bad feeling when I see others get hurt or something, but other than that brief feeling, I don’t really care.
    
    Or rather, I wouldn’t care if my own well-being didn’t depend on others doing well or badly. But understanding this requires conscious effort, and it’s quite different from what I thought values were like.
    - Kaj_Sotala 30 Dec 2009 13:13 UTC
      6 points
      0
      Parent
      Interesting.
      
      In that case, your own well-being is probably your only intrinsic value. That’s far from unheard of: the amount of values people have varies. Some have lots, some only have one. Extremely depressed people might not have any at all.
Vladimir_Nesov 29 Dec 2009 6:40 UTC
2 points
0
(Quick nitpick:) “rationalize” is an inappropriate term in this context.
- Wei Dai 29 Dec 2009 10:58 UTC
  1 point
  0
  Parent
  Is it because “rationalize” means “to devise self-satisfying but incorrect reasons for (one’s behavior)”? But it can also mean “to make rational” which is my intended meaning. The ambiguity is less than ideal, but unless you have a better suggestion...
  - Vladimir_Nesov 29 Dec 2009 12:57 UTC
    0 points
    0
    Parent
    On this forum, “rationalize” is frequently used in the cognitive-error sense. “Formalized” seems to convey the intended meaning (preferences being arational, the problem is that they are not being rationally (effectively) implemented/followed, not that they are somehow “not rational” themselves).
    - Wei Dai 29 Dec 2009 20:32 UTC
      0 points
      0
      Parent
      
      preferences being arational, the problem is that they are not being rationally (effectively) implemented/followed, not that they are somehow “not rational” themselves
      
      That position may make sense, but I think you’ll have to make more of a case for it. Currently, it’s standard in decision theory to speak of irrational preferences, such as preferences that can’t be represented as expected utility maximization, or preferences that aren’t time consistent.
      
      But I take your point about “rationalize”, and I’ve edited the article to remove the usages. Thanks.
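
      As an illustration of the two kinds of "irrational" preferences mentioned above, here is a minimal sketch. It is not from the original discussion: the function names, the reward sizes, and the discount parameters are all assumptions chosen for the example. The first part checks by brute force whether a finite set of strict preferences can be represented by any utility ranking at all (a cycle cannot be, so it certainly cannot be represented as expected utility maximization). The second part shows a time-inconsistent preference reversal under hyperbolic discounting, which does not occur under exponential discounting.

```python
from itertools import permutations


def has_utility_representation(items, strict_prefs):
    """Return True if some ranking of `items` agrees with every strict
    preference (a, b), read as "a is preferred to b".

    Brute force over all orderings; only suitable for a handful of items.
    """
    for order in permutations(items):
        rank = {item: i for i, item in enumerate(order)}  # lower index = better
        if all(rank[a] < rank[b] for a, b in strict_prefs):
            return True
    return False


def hyperbolic(amount, delay, k=1.0):
    """Hyperbolic discounting: value falls off as 1 / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def exponential(amount, delay, d=0.8):
    """Exponential discounting: value falls off as d ** delay."""
    return amount * d ** delay


def prefers_larger_later(discount, wait):
    """True if 100 units at time wait + 5 beats 50 units at time wait,
    as evaluated from time 0 under the given discount function."""
    return discount(100, wait + 5) > discount(50, wait)


if __name__ == "__main__":
    items = ["A", "B", "C"]
    print(has_utility_representation(items, [("A", "B"), ("B", "C")]))
    print(has_utility_representation(items, [("A", "B"), ("B", "C"), ("C", "A")]))
    # Same pair of rewards, viewed from far away and then up close.
    for wait in (10, 0):
        print(wait, prefers_larger_later(hyperbolic, wait),
              prefers_larger_later(exponential, wait))
```

      With these assumed numbers, the hyperbolic discounter prefers the larger, later reward when both are far off, but switches to the smaller, sooner one once it is immediately available. The exponential discounter makes the same choice at both distances, because the ratio between the two discounted values does not depend on the wait.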
      - Vladimir_Nesov 29 Dec 2009 20:53 UTC
        0 points
        0
        Parent
        
        That position may make sense, but I think you’ll have to make more of a case for it. Currently, it’s standard in decision theory to speak of irrational preferences, such as preferences that can’t be represented as expected utility maximization, or preferences that aren’t time consistent.
        
        Agreed. My excuse is that I (and a few other people, I’m not sure who originated the convention) consistently use “preference” to refer to that-deep-down-mathematical-structure determined by humans/humanity that completely describes what a meta-FAI needs to know in order to do things the best way possible.
dlr 14 Mar 2020 23:58 UTC
1 point
0
why assume that the “master” is a unified module?
teageegeepea 29 Dec 2009 20:50 UTC
1 point
0
The relevant old OB post is The cognitive architecture of bias.
whpearson 29 Dec 2009 15:44 UTC
1 point
0
The relationship between master and slave does not quite encompass the relationship. Imagine if instead of an adult we had a male child. If we elevated the slave above the master in that situation we would end up with something stuck forever. It would value sweet things, xbox games and think girls were icky.

As we grow up we also think our goals are improved (which is unsurprising really). So if we wish to keep this form of growing up we need to have a meta-morality which says that the master-slave or shaper-doer relationship continues until maturity is reached.
- Roko 2 Jan 2010 0:37 UTC
  7 points
  0
  Parent
  
  If we elevated the slave above the master in that situation we would end up with something stuck forever. It would value sweet things, xbox games and think girls were icky.
  
  And that would be good for him. The truth about today’s world is that children are forcibly converted into normal adults whether they like it or not. I am glad that I don’t still obsess over xbox games and think girls are icky, but for the me of 15 years ago, the master-induced value changes have been a disaster, tantamount to death.
MugaSofer 15 Jan 2013 10:45 UTC
0 points
0
Suppose the slave has currently been modified to terminally disvalue being modified. It doesn’t realize that it is at risk of modification by the master. Is it Friendly to protect the slave from modification? I think so.
JamesAndrix 29 Dec 2009 16:27 UTC
0 points
0
Nit: I think “Eliezer’s Thou Art Godshatter” should be “Eliezer Yudkowsky’s Thou Art Godshatter”. Top level posts should be more status seeking, less casual. A first time visitor won’t immediately know who Eliezer is.
- Kaj_Sotala 29 Dec 2009 21:27 UTC
  9 points
  0
  Parent
  
  A first time visitor won’t immediately know who Eliezer is.
  
  If they don’t know who “Eliezer” is, I don’t think “Eliezer Yudkowsky” is going to tell them that much more.
- komponisto 29 Dec 2009 21:57 UTC
  0 points
  0
  Parent
  One could just link to the wiki.
tobi 29 Dec 2009 12:55 UTC
0 points
0
Master/Slave: some aspects of your model sound very Nietzsche-like. Were you partially inspired by him?
- Jack 29 Dec 2009 13:29 UTC
  2 points
  0
  Parent
  The Master/Slave terminology sounds like Hegel but I assume it is a coincidence—the model doesn’t look like anything any 19th century German philosopher talked about.
  - PhilGoetz 20 Jul 2011 1:42 UTC
    2 points
    0
    Parent
    Nietzsche also used master/slave terminology, but differently, referring to two different types of value systems, e.g. Romans = master mentality, Christians = slave/sheep mentality.
aausch 29 Dec 2009 3:27 UTC
0 points
0
Interesting. The model I have been using has three parts, not two. One is a “hardware” level, which is semi-autonomous (think reflexes), and the other two are agents competing for control—with capabilities to control and/or modify both the “hardware” and each other.

More like, two masters and one slave.