I agree that Legg’s focus was more on getting the model to understand our ethical principles, and he didn’t say much about how you make a system that follows any principles. My guess is that he’s thinking of something like a much-elaborated form of AutoGPT; that’s what I mean by a language model agent. You prompt it with “come up with a plan that follows these goals” and then have a review process that prompts with something like “now make sure this plan fulfills these goals”. But I’m just guessing that he has a similar system in mind. He might be deliberately vague to avoid sharing strategy with competitors, or maybe that’s just not what the interview focused on.
I think this is one reasonable interpretation of his comments. But the fact that he:
1. Didn’t say very much about a solution to the problem of making models want to follow our ethical principles, and
2. Mostly talked about model capabilities even when explicitly asked about that problem
makes me think it’s not something he spends much time thinking about, and is something he doesn’t think is especially important to focus on.