Well, all the models in the frontal lobe get, let’s call it, reward-prediction points (see my comment here), which feels like positive vibes or something.
If the generative model “I eat a cookie” has lots of reward-prediction points (including the model itself and the downstream models that get activated by it in turn), we describe that as “I want to eat a cookie”.
Likewise, if the generative model “Michael Jackson” has lots of reward-prediction points, we describe that as “I like Michael Jackson. He’s a great guy.” (cf. “halo effect”)
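To make that bookkeeping concrete, here’s a toy sketch in Python. Everything in it (the model names, the point values, the idea of just summing points over a model and whatever it activates downstream) is my own made-up illustration of the paragraph above, not a claim about how the brain actually tallies things.

```python
# Toy sketch: generative models carry "reward-prediction points", and a
# model's overall appeal also counts the downstream models it activates.
# All names and numbers below are hypothetical, purely for illustration.
models = {
    "I eat a cookie":  {"points": 3.0, "activates": ["sweet taste", "feeling full"]},
    "sweet taste":     {"points": 5.0, "activates": []},
    "feeling full":    {"points": 1.0, "activates": []},
    "Michael Jackson": {"points": 6.0, "activates": ["great music"]},
    "great music":     {"points": 4.0, "activates": []},
}

def total_reward_prediction(name, seen=None):
    """Sum a model's points plus those of everything it activates in turn."""
    seen = set() if seen is None else seen
    if name in seen:
        return 0.0
    seen.add(name)
    m = models[name]
    return m["points"] + sum(total_reward_prediction(d, seen) for d in m["activates"])

# High totals get verbalized as "wanting" or "liking":
print(total_reward_prediction("I eat a cookie"))   # 9.0  -> "I want to eat a cookie"
print(total_reward_prediction("Michael Jackson"))  # 10.0 -> "I like Michael Jackson"
```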
If somebody says that justice is one of their values, I think it’s at least partly (and maybe primarily) up a level in meta-cognition. It’s not just that there’s a generative model “justice” and it has lots of reward-prediction points (“justice is good”), but there’s also a generative model of yourself valuing justice, and that has lots of reward-prediction points too. That feels like “When I think of myself as the kind of person who values justice, it’s a pleasing thought”, and “When I imagine other people saying that I’m a person who values justice, it’s a pleasing thought”.
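Again as a toy illustration only (made-up numbers, nothing more), the “values” case adds a meta level: the pleasingness of saying “justice is one of my values” draws on the first-order “justice” model and on models of yourself (and of other people seeing you) as someone who values justice.

```python
# Hypothetical point values, just to show the two levels involved.
points = {
    "justice": 4.0,                            # first-order: "justice is good"
    "I am a person who values justice": 5.0,   # pleasing self-image
    "others say I value justice": 3.0,         # pleasing imagined reputation
}

# Each meta-level thought also activates the first-order "justice" model.
meta_appeal = sum(
    points[meta] + points["justice"]
    for meta in ("I am a person who values justice", "others say I value justice")
)
print(meta_appeal)  # 16.0 -> "justice is one of my values" is a pleasing thing to think and say
```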
This isn’t really answering your question of what human values are or should be—this is me saying a little bit about what happens behind the scenes when you ask someone “What are your values?”. Maybe they’re related, or maybe not. This is a philosophy question. I don’t know.
If the cortical algorithm were replaced with GPT-N in some model of the human mind, would the whole system still work?
My belief (see post here) is that GPT-N is running a different kind of algorithm, but learning to imitate some steps of the brain algorithm in a deep but limited way. (That includes the neocortex and subcortex and the models that result from a lifetime of experience, and even hormones, body, etc.; after all, the next-token-prediction task reflects the whole input-output profile, not just the neocortex.) I can’t think of a way to do what you suggest, but who knows.