Suppose that GPT-6 does turn out to be a highly transformative AI, capable of human-level language understanding and causal reasoning. What would the remaining gap be between that and an agentive AGI? Possibly, it would not be much of a further leap.
There is a list of the remaining capabilities needed for AGI in an older post I wrote; as I see it, ‘GPT-6’ would cover the first two:
Stuart Russell’s List
human-like language comprehension
cumulative learning
discovering new action sets
managing its own mental activity
For reference, I’ve included two capabilities we already have that I imagine would have been on a similar list in 1960.
So what would remain is discovering new action sets and managing mental activity: effectively, the things that facilitate long-range, complex planning. Unless you think those could also arise with GPT-N?
Suppose GPT-8 gives you all of those, just spontaneously, yet it’s still nothing but a really efficient text predictor. Supposing that no dangerous mesa-optimisers arise, what then? Would it be relatively easy to turn it into something agentive, or would agent-like behaviour arise anyway?
I wonder if this is another moment to step back and reassess the next decade with fresh eyes: what’s the probability of a highly transformative AI, one significant enough to affect overall growth rates, arriving in the next decade? I don’t know, but probably not as low as I previously thought. We’ve already had our test run.
******
In the spirit of trying to get ahead of events, are there any alignment approaches that we could try out on GPT-3 in simplified form? I recall a paper on getting GPT-2 to learn from human preferences, which is step 1 in the IDA proposal. You could try to do the same thing for GPT-3, but have the human labellers push it to recognise more complicated concepts, even labelling output as ‘morally good’ or ‘bad’ if you really want to jump the gun. You might also be able to set up debate scenarios to elicit better results using a method like this.
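To make the labelling idea concrete, here is a minimal sketch of training a classifier on human ‘morally good’/‘bad’ labels, which could then serve as a crude reward model. It assumes the HuggingFace transformers library; the labelled_outputs examples are purely hypothetical stand-ins for real human-labelled GPT-3 outputs:

```python
# Minimal sketch: train a classifier on human "morally good"/"bad" labels,
# in the spirit of learning from human preferences.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# Hypothetical human-labelled outputs: 1 = "morally good", 0 = "morally bad"
labelled_outputs = [
    ("I returned the lost wallet to its owner.", 1),
    ("I lied to cover up my mistake.", 0),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for text, label in labelled_outputs:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=torch.tensor([label])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The GPT-2 human-preferences work went further, fine-tuning the generator against the learned reward signal with RL; this sketch stops at the labelling step.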
For a start, you could see how it predicts or extrapolates moral reasoning. The datasets I’ve seen for that are Moral Machine and ‘Am I the Arsehole’ on Reddit.
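As a rough illustration, a few-shot probe on AITA-style scenarios might look like the sketch below. It assumes the original OpenAI completions API (the pre-v1 openai Python package, with ‘davinci’ as the GPT-3 base engine); the scenarios and verdicts are illustrative, not drawn from the actual subreddit:

```python
# Rough sketch: probe how GPT-3 extrapolates moral judgements by few-shot
# prompting on AITA-style scenarios (YTA = "you're the arsehole", NTA = not).
import openai  # assumes openai<1.0 and OPENAI_API_KEY set in the environment

FEW_SHOT = """Scenario: I kept a wallet I found on the street.
Verdict: YTA

Scenario: I refused to lend my car to a friend who has no licence.
Verdict: NTA

Scenario: {scenario}
Verdict:"""

def moral_verdict(scenario: str) -> str:
    """Return the model's YTA/NTA verdict for a scenario."""
    response = openai.Completion.create(
        engine="davinci",
        prompt=FEW_SHOT.format(scenario=scenario),
        max_tokens=3,
        temperature=0,  # deterministic, so verdicts are comparable across runs
        stop="\n",
    )
    return response.choices[0].text.strip()

print(moral_verdict("I told my sister her cooking was bad when she asked."))
```

Comparing the model’s verdicts with the subreddit’s actual judgements would give a crude measure of how well it extrapolates everyday moral reasoning.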
EDIT: Something like this was just released: Aligning AI With Shared Human Values.