“What simple tricks can help turn humans—haphazard evolutionary amalgams that we are—into coherent agents?”
One trick that we can always apply: disassemble the human and use his atoms to build a paperclip maximizer. The point is, we don’t just want to turn humans into coherent agents, we want to turn them into coherent agents who can be said to have the same preferences as the original humans. But given that we don’t have a theory of preferences for incoherent agents, how do we know that any given trick intended to improve coherence is preference-preserving? Right now we have little to guide us except intuition.
To borrow an example from Robin Hanson: we have both preferences that are consciously held and preferences that are unconsciously held, and many “rationality techniques” seem to emphasize the consciously held preferences at the expense of the unconsciously held ones. It’s not clear this is kosher.
I, the entity that is typing these words, do not approve of unconscious preferences when they conflict with conscious ones.
I think there are many important unsolved problems in the theoretical/philosophical parts of rationality, and this post seems to under-emphasize them.
What would a picture of rationality’s goal that correctly emphasized them look like?
I think it should at least mention prominently that there is a field that might be called “theory of rationality”, perhaps subdivided into “theory of ideal agents” and “theory of flawed agents”; that we still know very little about these subjects (the latter even less than the former); and that, as a result, we have little theoretical guidance for the practical work.
I’m tempted to further say that work on theory should take precedence over work on practice at this point, but that’s probably just my personal bias speaking. In any case people will mostly work on what they intuitively think is interesting or important, so I just want to make sure that people who are interested in “rationality” know that there are lots of theoretical problems that they might consider interesting or important.
The point is, we don’t just want to turn humans into coherent agents, we want to turn them into coherent agents who can be said to have the same preferences as the original humans. But given that we don’t have a theory of preferences for incoherent agents, how do we know that any given trick intended to improve coherence is preference-preserving? Right now we have little to guide us except intuition.
I absolutely agree. The actual question I had written on my sheet, as I tried to figure out what a more powerful “rationality” might include, was “… into coherent agents, with something like the goals ‘we’ wish to have?” Branch #8 above is exactly the art of not having the goals-one-acts-on be at odds with the goals-one-actually-cares-about (and includes much mention of the usefulness of theory).
My impression, though, is that some of the other branches of rationality in the post are very helpful for self-modifying in a manner you’re less likely to regret. Philosophy always holds dangers, but a person approaching the question of “What goals shall I choose?”, and encountering confusing information that may affect what he wants (e.g., encountering arguments in meta-ethics, or realizing his religion is false, or realizing he might be able to positively or negatively affect a disorienting number of lives) will be much better off if he already has good self-knowledge and has accepted that his current state is his current state (vs. if he wants desperately to maintain that, say, he doesn’t care about status and that only utilitarian expected-global-happiness-impacts affect his behavior—a surprisingly common nerd failure mode).
I don’t know how to extrapolate the preferences of myself or other people either, but my guess is that, while further theoretical work is critical, it’ll be easier to do this work in a non-insane fashion in the context of a larger, or more whole-personed, rationality. What are your thoughts here?
very helpful for self-modifying in a manner you’re less likely to regret.
I don’t think regret is the concern here… Your future self might be perfectly happy making paperclips. I almost think “not wanting your preferences changed” deserves a new term… Hmm, “pre-gret”?
Useful concept, bad example.
Upvoted for ‘pregret.’
I like to imagine another copy of my mind watching what I’m becoming, and being pleased. If I can do that, then I feel good about my direction.
You will find people who are willing to bite the “I won’t care when I’m dead” bullet, or at least claim to—it’s probably just the abstract rule-based part of them talking.
will be much better off if he already has good self-knowledge and has accepted that his current state is his current state
Everything here turns on the meaning of “accept”. Does it mean “acknowledge as a possibly fixable truth” or does it mean “consciously endorse”? I think you’re suggesting the latter but only defending the former, which is much more obviously true.
Those both sound like basically verbal/deliberate activities, which is probably not what Anna meant. I would say “not be averse to the thought of”.
he wants desperately to maintain that, say, he doesn’t care about status and that only utilitarian expected-global-happiness-impacts affect his behavior—a surprisingly common nerd failure mode
Is the disagreement here about what his brain does, or about what parts of his brain to label as himself? If the former, it’s not obviously common, if the latter, it’s not obviously a failure mode.
I don’t have much data here, but I guess none of us do. Personally, I haven’t found it terribly helpful to learn that I’m probably driven in large part by status seeking, and not just pure intellectual curiosity. I’m curious what data points you have.
That is interesting to me because finding out I am largely a status maximizer (and that others are as well) has been one of the most valuable bits of information I’ve learned from OB/LW. This was especially true at work, where I realized I needed to be maximizing my status explicitly as a goal and not feel bad about it, which allowed me to do so far more efficiently.
You, upon learning that you’re largely a status maximizer, decided to emphasize status seeking even more, by doing it on a conscious level. But that means other competing goals (I assume you must have some) have been de-emphasized, since the cognitive resources of your conscious mind are limited.
I, on the other hand, do not want to want to seek status. Knowing that I’m driven largely by status seeking makes me want to self-modify in a way that de-emphasizes status seeking as a goal (*). But I’m not really sure either of these responses is rational.
(*) Unfortunately I don’t know how to do so effectively. Before, I’d just spend all of my time thinking about a problem on the object level. Now I can’t help but periodically wonder whether I believe or argue for some position because it’s epistemically justified, or because it helps to maximize status. For me, this self-doubt seems to sap energy and motivation without reducing bias enough to be worth the cost.
This is the simple version of the explicit model I have in my head at work now: I have two currencies, Dollars and Status. Every decision I make likely has some impact both in terms of our company’s results (Dollars) and also in terms of how I and others will be perceived (Status). The cost in Status to make any given decision is a decreasing function of current Status. My long-term goal is to maximize Dollars. However, often the correct way to maximize Dollars in the long term is to sacrifice Dollars for Status, bank the Status, and use it to make better decisions later.
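A minimal sketch of that model in code, to make the trade-off concrete. The particular cost function, the rule by which banked Status amplifies later decisions, and all the numbers are illustrative assumptions rather than the actual dynamics:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    dollars: float  # direct impact on company results
    status: float   # direct impact on how the decision-maker is perceived

def status_cost(current_status: float, base_cost: float = 1.0) -> float:
    # Assumed form; any function that decreases as banked Status grows would do.
    return base_cost / (1.0 + max(current_status, 0.0))

def total_dollars(decisions: list[Decision]) -> float:
    """Total Dollars from a sequence of decisions, where banked Status
    both cheapens acting and amplifies each decision's Dollar impact."""
    dollars, status = 0.0, 0.0
    for d in decisions:
        # Banked Status amplifies what each decision achieves.
        dollars += d.dollars * (1.0 + max(status, 0.0))
        # Acting always spends some Status, less when Status is high.
        status += d.status - status_cost(status)
    return dollars

# Chasing Dollars directly vs. banking Status first:
greedy = [Decision(dollars=2.0, status=0.0)] * 4
banking = [Decision(dollars=0.0, status=2.0)] + [Decision(dollars=2.0, status=0.0)] * 3
print(total_dollars(greedy))   # 8.0
print(total_dollars(banking))  # 9.0
```

With these made-up numbers, the sequence that sacrifices Dollars on the first decision to bank Status ends up ahead, which is the “bank the Status and use it later” point in miniature.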
I think this type of thing should be common. Status is a resource that is used to acquire what you want, so in my mind there’s no shame in going after it.
How do time constraints play into this model?
Do you ever find yourself in situations where you would predict different things if you thought you were a pure-intellectual-curiosity-satisfier than if you think you’re in part a status-maximizer?
If so, is making more accurate predictions in such situations useful, or do accurate predictions not matter much?
I suspect that if I thought of myself as a pure-intellectual-curiosity-satisfier, I would be a lot more bewildered by my behavior and my choices than I am, and struggle with them a lot more than I do, and both of those would make me less happy.
If the way you seek status is ethical (“do good work” more than “market yourself as doing good work”) then you may not want to change anything once you discover your “true motivation”. And the alternative “don’t care about anything” hardly entices.
I think there are many important unsolved problems in the theoretical/philosophical parts of rationality, and this post seems to under-emphasize them.
Agreed to an extent, but most folk aren’t out to become Friendliness philosophers. One branch that went unmentioned in the post and would be useful for both philosophers and pragmatists includes the ability to construct (largely cross-domain / overarching) ontologies out of experience and abstract knowledge, the ability to maintain such ontologies (propagating beliefs across domains, noticing implications of belief structures and patterns, noticing incoherence), and the disposition of staying non-attached to familiar ontologies (e.g. naturalism/reductionism) and non-averse to unfamiliar/enemy ontologies (e.g. spiritualism/phenomenology). This is largely what distinguishes exemplary rationalists from merely good rationalists, and it’s barely talked about at all on Less Wrong.