I like the approach, but the problem this may run into on many occasions is that parts of the reasoning include things which are not easily communicable. Suppose you talk to AlphaGo and want it to explain why it prefers some move. It may relatively easily communicate part of its tree search (and I can easily integrate the non-overlapping part of the tree into my tree), but we will run into trouble communicating the “policy network”, which consists of weights obtained by training on a very large number of examples. *
From a human perspective, such parts of our cognitive infrastructure are often not really accessible, giving their results in the form of “intuition”. What’s worse, another part of our cognitive infrastructure is very good at making up fake stories explaining the intuitions, so if you are motivated enough, your mind may generate plausible but in fact fake models for your intuition. In the worst case, you discard the good intuition when someone shows you problems in the fake model.
Also… it seems to me that in practice it is difficult to communicate such parts of your reasoning, among rationalists in particular. I would expect communicating parts of the model in the form of “my intuition is …” to be intuitively frowned upon, down-voted, and pattern-matched to an argument from authority, or to an “argument” by someone who cannot think clearly. (An exception is the intuitions of high-status people.) In theory it would help if you described why your intuitive black box is producing something other than random noise, but my intuition tells me that may sound even more awkward (“Hey, I studied this for several years.”)
*Btw, in some cases it _may_ make sense to throw away part of your model and replace it with just the opinion of the expert. I’m not sure I can describe this clearly without drawing.
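The asymmetry between the two kinds of knowledge here can be sketched in a toy way (all the move names, tree shapes, and weight values below are invented for illustration):

```python
# A toy contrast between communicable and incommunicable knowledge.
# Explicit search trees are plain nested dicts of Go moves; the
# "policy network" is just a flat list of floats.

def merge_trees(mine, theirs):
    """Explicit search trees can be combined: take the union of explored
    moves, recursing wherever both players analysed the same move."""
    merged = dict(mine)
    for move, subtree in theirs.items():
        merged[move] = merge_trees(mine[move], subtree) if move in mine else subtree
    return merged

my_tree = {"Q16": {"D4": {}}, "C3": {}}
ag_tree = {"Q16": {"R4": {}}, "D16": {}}
combined = merge_trees(my_tree, ag_tree)  # Q16 -> {D4, R4}, plus C3 and D16

# No analogous merge exists for the policy network: each weight is
# meaningless outside the whole network, so there is no way to splice
# its non-overlapping "knowledge" into your own model piecewise.
ag_weights = [0.12, -0.87, 1.05]
```

The tree merge works because each node has a free-standing interpretation (“this move was explored”); the weights have no such interpretation in isolation, which is the communicability gap being described.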
(I found your comment really clear and helpful.)
It’s important to have a model of whether AlphaGo is trustworthy before you trust it. Knowing either (a) that it beat all the grandmasters or (b) its architecture and the amount of compute it used is necessary for me to take on its policies. (This is sort of the point of Inadequate Equilibria—you need to make models of the trustworthiness of experts.)
I think I’d say something like: it may become possible to download the policy network of AlphaGo—to learn what abstractions it’s using and what it pays attention to. And AlphaGo may not be able to tell you what experiences it had that led to the policy network (it’s hard to communicate tens of thousands of games’ worth of experience). Yet you should probably just replace your Go models with the policy network of AlphaGo if you’re offered the choice.
A thing I’m currently confused about here is how much one is able to further update after downloading the policy network of such an expert. How much evidence should persuade you to change your mind, as opposed to you expecting that info to already be built into the policy network you have?
You’re right, it does seem that a bunch of important stuff is in the subtle, hard-to-make-explicit reasoning that we do. I’m confused about how to deal with this, but I’ll say two things:
1. This is where the most important stuff is, so try to focus your cognitive work on making it explicit and trading these models. Oliver likes to call this ‘communicating taste’.
2. Practice not dismissing the intuitions in yourself—use bucket protections if need be.
I’m reminded of the feeling when I’m processing a new insight and someone asks me to explain it immediately: I try to quickly convey this simple insight, but the words I say end up making no sense (and not because I don’t understand it). If you can’t communicate something, one of the key skills here is acting on your models anyway, even if, were you to explain your reasoning, it might sound inconsistent or like you were making a bucket error.
And yeah, for the people whose policy networks you think it’s really valuable to download, it’s worth spending the time building intuitions that match theirs. I feel like I often look at a situation using my inner Hansonian lens and predict what he’ll say (sometimes successfully), even while I can’t explicitly state the principles Robin is using.
I don’t think the things I’ve just said constitute a clear solution to the problem you raised, and I think my original post is missing some key part that you correctly pointed to.
In retrospect, I think this comment of mine didn’t address Jan’s key point, which is that we often form intuitions/emotions by running a process analogous to aggregating data into a summary statistic and then throwing away the data. Now the evidence we saw is quite incommunicable—we no longer have the evidence ourselves.
Ray Arnold gave me a good example the other day of two people—one an individualist libertarian, the other a communitarian Christian. In the example these two people deeply disagree on how society should be set up, and this is entirely because they’re two identical RL systems trained on different training sets (one has repeatedly seen the costs of trying to trust others with your values, and the other has repeatedly seen it work out brilliantly). Their brains have compressed the data into a single emotion that they feel in, say, groups trying to coordinate. They might be able to introspect enough to communicate the causes of their beliefs, but they might not—they might just be stuck this way (until we reach the glorious transhumanist future, that is). Scott might expect them to say they just have fundamental value differences.
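That compression can be sketched as an online update that keeps only a summary statistic and discards every sample. The agent names and the outcome distributions below are invented; the point is just that the raw experiences are unrecoverable from the summary:

```python
import random

class IntuitionSummary:
    """Online aggregation: keep a running mean of outcomes, discard the
    raw data. Once the samples are gone, only the summary -- the
    'emotion' -- can be reported, not the evidence behind it."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def observe(self, outcome):
        # Welford-style incremental mean: fold the sample in, then forget it
        self.n += 1
        self.mean += (outcome - self.mean) / self.n

# Two agents with the identical update rule but different life histories
random.seed(0)
libertarian = IntuitionSummary()
communitarian = IntuitionSummary()
for _ in range(1000):
    libertarian.observe(random.gauss(-0.5, 1.0))   # trusting others went badly
    communitarian.observe(random.gauss(+0.5, 1.0))  # trusting others went well

# Each agent can now report only its compressed summary; neither retains
# the 1000 experiences that produced it.
```

They end with opposite-signed “feelings” about group coordination, and nothing in either agent’s state can reconstruct (or communicate) the training data that caused the gap.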
I agree that I have not in the OP given a full model of the different parts of the brain, how they do reasoning, and which parts are (or aren’t) in principle communicable or trustworthy. I at least claim that I’ve pointed to a vague mechanism that’s more true than the simple model where everyone just has the outputs of their beliefs. There are important gears that are hard-but-possible to communicate, and they’re generally worth focusing on over and above the credences they output. (Will write more on this in a future post about Aumann’s Agreement Theorem.)
I think the problem with being willing to throw away your model for an expert opinion is that, taken literally, it means you’d be willing to keep replacing your opinion as more experts (of differing opinions) talked to you, without understanding why. Right now, I view “respected expert disagrees with me” not as “they have higher status, they win” but as a sign that I should figure out whether I’m missing information.
I think most disagreements are about missing information.
What I had in mind would, in the case of AlphaGo for example, mean throwing away part of my opinion on “how valuable a position is” and replacing it with AlphaGo’s opinion. That does not mean throwing away your whole model. If done in the right way, you can integrate the opinions of experts and come up with a better overall picture than any of them has, even if you miss most of the information.
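One simple way to sketch this “replace part of your opinion” move is weighted pooling of probability estimates in log-odds space, where the weight on the expert controls how far you defer. The probabilities and weights below are made-up numbers, not anything AlphaGo outputs:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def pool(estimates, weights):
    """Combine probability estimates via a weighted average in log-odds
    space. A dominant weight on one expert approximates 'replace that
    part of my model with theirs'; balanced weights integrate both views."""
    z = sum(w * logit(p) for p, w in zip(estimates, weights)) / sum(weights)
    return 1 / (1 + math.exp(-z))

# My rough read of a Go position vs. an expert's value estimate (invented)
mine, expert = 0.40, 0.85

p_defer = pool([mine, expert], [1, 9])  # mostly replace my view with theirs
p_equal = pool([mine, expert], [1, 1])  # weigh both views equally
```

With a 9:1 weight the pooled estimate sits close to the expert’s, without being a literal wholesale replacement, so later evidence can still move it.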