(still travelling; still not going to reply in a ton of depth; sorry. also, this is very off-the-cuff and unreflected-upon.)
Which of “being smart,” “being a good person,” and “still being a good person in a Chinese bureaucracy” do you think is hard (prior to having AI smart enough to be dangerous)?
For all that someone says “my image classifier is very good”, I do not expect it to be able to correctly classify “a screenshot of the code for an FAI” as distinct from everything else. There are some cognitive tasks that look so involved as to require smart-enough-to-be-dangerous capabilities. Some such cognitive tasks can be recast as “being smart”, just as they can be cast as “image classification”. Those ones will be hard without scary capabilities. Solutions to easier cognitive problems (whether cast as “image classification” or “being smart” or whatever) by non-scary systems don’t feel to me like they undermine this model.
“Being good” is one of those things where the fact that a non-scary AI checks a bunch of “it was being good” boxes before some consequent AI gets scary, does not give me much confidence that the consequent AI will also be good, much like how your chimps can check a bunch of “is having kids” boxes without ultimately being an IGF maximizer when they grow up.
My cached guess as to our disagreement vis-à-vis “being good in a Chinese bureaucracy” is whether or not some of the difficult cognitive challenges (such as understanding certain math problems well enough to have insights about them) decompose such that those cognitions can be split across a bunch of non-scary reasoners in a way that succeeds at the difficult cognition without the aggregate itself being scary. I continue to doubt that and don’t feel like we’ve seen much evidence either way yet (but perhaps you know things I do not).
(from the OP:) Yet it seems like GPT-3 already has a strong enough understanding of what humans care about that it could be used for this purpose.
To be clear, I agree that GPT-3 already has strong enough understanding to solve the sorts of problems Eliezer was talking about in the “get my grandma out of the burning house” argument. I read (perhaps ahistorically) the grandma-house argument as being about how specifying precisely what you want is real hard. I agree that AIs will be able to learn a pretty good concept of what we want without a ton of trouble. (Probably not so well that we can just select one of their concepts and have it optimize for that, in the fantasy-world where we can leaf through its concepts and have it optimize for one of them, because of how the empirically-learned concepts are more likely to be like “what we think we want” than “what we would want if we were more who we wished to be” etc. etc.)
Separately, in other contexts where I talk about AI systems understanding the consequences of their actions being a bottleneck, it’s understanding of consequences sufficient for things like fully-automated programming and engineering. Which look to me like they require a lot of understanding-of-consequences that GPT-3 does not yet possess. My “for the record” above was trying to make that clear, but wasn’t making the above point where I think we agree clear; sorry about that.
Does that correspond to some prediction about the kind of imitation task that will prove difficult for AI?
It would take a bunch of banging, but there’s probably some sort of “the human engineer can stare at the engineering puzzle and tell you the solution (by using thinking-about-consequences in the manner that seems to me to be tricky)” that I doubt an AI can replicate before being pretty close to being a good engineer. Or similar with, like, looking at a large amount of buggy code (where fixing the bug requires understanding some subtle behavior of the whole system) and then telling you the fix; I doubt an AI can do that before it’s close to being able to do the “core” cognitive work of computer programming.
It seems reasonable for you to say “language models aren’t like the kind of AI systems we are worried about,” but I feel like in that case each unit of progress in language modeling needs to be evidence against your view.
Maybe somewhat? My models are mostly like “I’m not sure how far language models can get, but I don’t think they can get to full-auto programming or engineering”, and when someone is like “well they got a little farther (although not as far as you say they can’t)!”, it does not feel to me like a big hit. My guess is it feels to you like it should be a bigger hit, because you’re modelling the skills that Copilot currently exhibits as being more on-a-continuum with the skills I don’t expect language models can pull off, and so any march along the continuum looks to you like it must be making me sweat?
If things like Copilot smoothly increase in “programming capability” to the point that they can do fully-automated programming of complex projects like Twitter, then I’d be surprised.
I still lose a few Bayes points each day to your models, which more narrowly predict that we’ll take each next small step, whereas my models are more uncertain and say “for all I know, today is the day that language models hit their wall”. I don’t see the ratios as very large, though.
or else some way of grounding out the objection in intuitions that do make some different prediction about something we actually observe (either in the interim or historically).
A man can dream. We may yet be able to find one, though historically when we’ve tried it looks to me like we are mostly reading the same history in different ways, which makes things tricky.