Personally, I’m incredibly confused about what the comment reveals about what it’s like to be you. Your questions just feel to me like they’re coming from a completely different place from my experience, and it’s hard not to have an initial sense that you must be trolling.
I think humans are reinforcement learners in a very strong sense, much stronger than just a metaphor; it’s basically the best gears model I know of what humans (and non-human animals) are like. The history of this idea starts with classical conditioning and operant conditioning, and in its modern form it can get very sophisticated, for example with the modern understanding of dopamine as reward prediction error.
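To make “reward prediction error” concrete: the learner keeps a prediction V of how rewarding something is, and updates it by the gap δ = r − V between what it actually got and what it expected. A minimal sketch, with the learning rate and reward values assumed purely for illustration:

```python
def update_value(v, reward, alpha=0.1):
    """One Rescorla-Wagner / TD(0)-style update of a reward prediction."""
    delta = reward - v          # reward prediction error: the surprise
    return v + alpha * delta    # move the prediction toward what happened

v = 0.0                          # at first, no reward is expected
for _ in range(50):
    v = update_value(v, reward=1.0)   # the same good experience, repeated

# The prediction converges on the true reward, so the error (and, on this
# model, the dopamine response) fades once the outcome is fully expected.
print(round(v, 3))
```

Note that on this picture the signal tracks surprise rather than pleasure: a fully predicted reward generates no error, which is the usual gloss on the dopamine findings.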
But this is something you should be able to experience about yourself without needing to study the psychology or RL literature: have you ever tried a new food and found that you liked it a surprising amount and now you eat it all the time? That’s reinforcement learning.
CoZE can be used as a therapy to overcome phobias but it’s more general than that. There are things that you’re not afraid to do, but that you just never do because it never occurs to you, and CoZE can give you exposure to those things in such a way that they might occur to you to do later.
Commenting on this bit separately:

> CoZE can be used as a therapy to overcome phobias but it’s more general than that. There are things that you’re not afraid to do, but that you just never do because it never occurs to you, and CoZE can give you exposure to those things in such a way that they might occur to you to do later.

This seems to me to be an (indeed, the) interesting claim here, and is exactly what I’d like to see explored in some detail, in the treatment of CoZE that I should like to read. Indeed, an expansion of, and justification for, this claim, is precisely the sort of thing I was asking for in the first place.
Great, thanks, I think I understand your confusion better now (this was not at all apparent to me). There’s a section of Stephan Guyenet’s The Hungry Brain that you might be interested in reading; he talks about how the brain selects actions to take at a neurological level, and his explanation is very clear.

Thank you, I’ll check out Guyenet’s book.

Meanwhile, here’s a question: are Guyenet’s claims / models / etc. generally accepted in the field? That is—is this, basically, orthodoxy, or what?
> Personally, I’m incredibly confused about what the comment reveals about what it’s like to be you. Your questions just feel to me like they’re coming from a completely different place from my experience, and it’s hard not to have an initial sense that you must be trolling.
I don’t think I’ve ever posted a comment, either here or on the original Less Wrong, that I’d describe as ‘trolling’, so for whatever my word is worth, you can be sure that that’s not my intent.
I don’t think there’s anything terribly unusual about what it’s like to be me. I, conversely, often have this sort of reaction when I read about the experiences of folks in the rationalist community (especially those from the Bay Area). Greater mutual understanding is one of our goals here, right?
> [stuff about conditioning]
We’ve moved past behaviorism, though, haven’t we? Or is it now, again, the ‘in’ thing? I wasn’t aware of that; but then, I don’t keep up with all the latest developments in psychology. It’s a big field; and it seems strange to expect that all Less Wrong readers must be intimately familiar with the most current findings in neurobiology or what have you. (Perhaps a sequence of posts, describing (various aspects of) the current understanding of the mind, is in order?)
> But this is something you should be able to experience about yourself without needing to study the psychology or RL literature: have you ever tried a new food and found that you liked it a surprising amount and now you eat it all the time? That’s reinforcement learning.
This seems like a strange thing to say, in exactly the way I meant in my earlier comment. Why say “that’s reinforcement learning”, when you could just as easily say “that’s you finding out that you liked something, and then acting on that newfound knowledge”—or, conversely, “that’s the action of the electromagnetic forces in the atoms of your body”? You wouldn’t say the latter, would you? But why not? Well, because it’s a rather uninformative, far too low-level, description; the interesting stuff, the stuff that has predictive value and that we can reason about, is taking place on the higher levels… and so with the “reinforcement learning” comment, which also seems to me to speak of “implementation details” that don’t tell me anything particularly interesting or specific. (I am, of course, entirely taking your word that it’s even an accurate description in the first place.)
(This is all not to mention the fact that your scenario is certainly not outside the bounds of my experience, but on the other hand neither are many adjacent but distinct scenarios, such as: (a) I try a new food, don’t much like it, but then keep eating it occasionally and develop a taste; (b) I try a new food, don’t much like it, but then some time later try it again and do like it, and then eat it occasionally; (c) I try a new food, like it a lot, but then don’t really eat it all that often; (d) I try a new food, like it a lot, eat it a lot, but then stop liking it, and stop eating it; (e) etc., etc. Are all of these things also “reinforcement learning”? If yes, then everything is “reinforcement learning” and the label is uninformative, much like “that’s EM fields among atoms”; if not, then saying that I “am a reinforcement learner” is clearly inaccurate, at best.)
Thanks for writing this. I think part of what’s going on is that when I say “reinforcement learning” it connects to a lot of gears in my head (for example, Q-learning) and I think I’ve been typical minding on how much other people know about those gears. It would probably be well worth writing a top-level post on using reinforcement learning to understand human cognition and behavior broadly, but I’m not sure I want to commit to a project that looks so big.
Yes, all of those things could be an example of reinforcement learning, depending on contextual factors such as the extent to which the food contains nutrients you’re deficient in, your history of experience with similar foods, your sense of how others around you would react to seeing you eat the food, etc. I’m aware that you’ll find this an unsatisfying answer, but since humans do in fact exhibit a large and complicated array of behaviors (I hope we can agree on this at least) a theory into which it all comfortably fits needs to be flexible enough to produce such an array.
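For what the “gears” of something like Q-learning look like in miniature: the learner keeps a value estimate per option, nudges the chosen option’s estimate toward the reward actually received, and tends to repeat whatever currently looks best. A toy two-option sketch (all numbers are illustrative assumptions), in which scenario (d) above (like it, eat it a lot, then stop) falls out as soon as the reward itself drifts:

```python
# Value estimates for two options: the new food vs. a familiar standby.
Q = {"new_food": 0.0, "familiar": 0.5}
ALPHA = 0.2                                 # learning rate

def update(action, reward):
    # The same prediction-error rule, one estimate per option.
    Q[action] += ALPHA * (reward - Q[action])

def preferred():
    # Greedy policy: do whatever currently looks most rewarding.
    return max(Q, key=Q.get)

# Phase 1: the new food turns out to be surprisingly good (reward 1.0).
for _ in range(10):
    update("new_food", 1.0)
acquired = preferred()                      # "new_food": you now eat it a lot

# Phase 2: the reward itself drifts (satiation, a deficiency now corrected),
# and the learned preference follows it back down.
for _ in range(10):
    update("new_food", 0.1)
extinguished = preferred()                  # "familiar": you stop eating it

print(acquired, extinguished)
```

The flexibility just mentioned lives almost entirely in the reward term: hold it fixed and stable habits come out; let context (deficiency, company, habituation) move it around and the other scenarios come out of the same update rule.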
> I think part of what’s going on is that when I say “reinforcement learning” it connects to a lot of gears in my head (for example, Q-learning) and I think I’ve been typical minding on how much other people know about those gears.
Indeed; e.g., I didn’t know what Q-learning is until now—in fact I still don’t know what it is (for precisely the reason noted in the banner at the top of the Wikipedia page).
It’s entirely understandable that you are reluctant to take on a project to explain the entirety of what seems to be a large and somewhat abstruse scientific subfield. That said, the takeaway is that most readers here have no good reason to share those of your views that have, as a prerequisite, understanding of said field. (Making our way toward having a good overview of reinforcement learning would seem to be a good community goal for Less Wrong.)
> Yes, all of those things could be an example of reinforcement learning, depending on contextual factors … I’m aware that you’ll find this an unsatisfying answer, but since humans do in fact exhibit a large and complicated array of behaviors (I hope we can agree on this at least) a theory into which it all comfortably fits needs to be flexible enough to produce such an array.
Indeed, but the point here is that the same could be said of atomic theory, which, while perfectly true, tells us very little about what behaviors we should engage in, what strategies and approaches to life’s challenges are likely to be successful, etc. If “reinforcement learning” and “reinforcement learner” are such broad and general categories, then just pointing out that humans are reinforcement learners, in support of a specific claim or specific advice or a specific technique, etc., is not particularly convincing.
Maybe this is Bay Area bias, but the models that Qiaochu is relying on strike me as very natural and point to a lot of meaningful gears in my head, and my model is that at least a large chunk of the people on this site have a similar experience.
I feel that reinforcement-learning-based models were covered by a bunch of highly upvoted content on this site; let me quickly take 5 minutes to find the references I remember:

https://www.lesserwrong.com/posts/zThWT5Zvifo5qYaca/the-neuroscience-of-pleasure
https://www.lesserwrong.com/posts/EMJ3egz48BtZS8Pws/basics-of-animal-reinforcement
https://www.lesserwrong.com/posts/hN2aRnu798yas5b2k/a-crash-course-in-the-neuroscience-of-human-motivation

And a bunch more. This perspective of humans as reinforcement learners has been a core topic of a lot of LessWrong writing, and it seems reasonable for people to write things that build on top of that.
Thank you very much for the links!

As for the models—to me they seem oddly esoteric and specific to support such general claims (and, of course, I don’t have these “gears in my head” to point to).
However, perhaps I’ll change my mind after reading the posts you linked—which I will do at my earliest convenience!